Conclusion

evidence from different randomised trials
There is a strong need to integrate the knowledge that has been acquired through different studies. Meta-analysis tries to provide this integration and has been applied quite successfully. It is primarily used to increase the precision of the main outcome measure of studies. Major problems with the method are the handling of differences between the studies and the fact that the primary outcome is not necessarily identical to the value of greatest interest for decision making. The joint analysis with Miscan as described in chapter 2 can be seen as an alternative to meta-analysis that can overcome some of these problems.
Meta-analysis can handle differences between studies by excluding studies that do not meet a more or less narrow definition, thus limiting the analysis to the most similar studies. This practice comes at the expense of the power of the analysis to increase precision. It also introduces the possibility of bias, because the definition of trials to be included may coincide with trials that have an on average higher or lower outcome than the excluded trials. Besides that, the main outcome of a trial on cancer screening is typically the relative risk of dying from the disease in question in the screened/invited arm versus the control arm during the course of the trial. For decision making, reduction of mortality from the cancer and life years gained, both in the target population instead of the trial population, are more interesting values.
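The trade-off between similarity of studies and precision can be illustrated with a minimal sketch of fixed-effect (inverse-variance) pooling of log relative risks. This is one common way of combining trial results, not necessarily the method used in the meta-analyses referred to above. The relative risks and standard errors below are hypothetical values chosen for illustration only; they are not results from any trial.

```python
import math


def pool_fixed_effect(log_rrs, std_errors):
    """Inverse-variance (fixed-effect) pooling of log relative risks.

    Returns the pooled log relative risk and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical trial results (illustration only, not real data).
trial_rrs = [0.80, 0.75, 0.90, 0.70]   # relative risk of dying from the disease
trial_ses = [0.15, 0.20, 0.25, 0.30]   # standard errors of the log relative risks
log_rrs = [math.log(rr) for rr in trial_rrs]

# Pooling all four trials.
pooled_all, se_all = pool_fixed_effect(log_rrs, trial_ses)

# Pooling only the first two trials, as if the others were excluded
# by a narrow inclusion definition.
pooled_two, se_two = pool_fixed_effect(log_rrs[:2], trial_ses[:2])

print("all trials: RR = %.2f, SE(log RR) = %.3f" % (math.exp(pooled_all), se_all))
print("two trials: RR = %.2f, SE(log RR) = %.3f" % (math.exp(pooled_two), se_two))
```

In this sketch the pooled standard error is smaller than that of any single trial, and it grows again when trials are excluded, which is the loss of power described above.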
The Miscan model for breast cancer has been applied to analyse the outcomes of the Swedish breast screening trials while accounting for several characteristics of the different trials. The aim was to estimate the model parameter 'improvement of prognosis', which represents the (tumour size dependent) probability of preventing breast cancer death through detection by screening. This parameter is then used to estimate mortality reduction and life years gained in the target population for which screening is being considered. Thus, the Miscan approach tries to solve both the problem of joining data gathered under different circumstances and the problem of extrapolating trial results to a situation of decision making. We currently plan to do a similar joint analysis of trials for colorectal cancer screening.
A clear disadvantage of the Miscan approach is that, relative to meta-analysis, the joint analysis lacks statistical validity. This results from the number of assumptions that are necessary to construct the model and whose uncertainty is not known well enough. The first attempt to estimate the uncertainty of a Miscan model did not resolve all problems. Given the nature of the unresolved problems, it cannot be expected that they will be resolved in a fully acceptable way in the foreseeable future (see also 'evidence and uncertainty' further in this chapter).
From the perspective of decision making there is the choice between extrapolating the statistically valid results from a meta-analysis to the situation concerning the decision to be made, either by applying a formal model or by a more informal method, and extrapolating from individual studies by a joint model analysis to the decision situation. In other words: statistical validity is inevitably lost due to the extrapolation, whether this extrapolation is preceded by a meta-analysis or whether the gathering of evidence is included into a joint model analysis.
While the problem of loss of statistical validity remains, there is a need both for meta-analysis, in order to meet the requirements of evidence based medicine, and for further assessment of the technology in question based on a joint model analysis.

case control studies into efficacy of screening
Observational studies can be used as an alternative to randomised trials on efficacy of screening. For this purpose, the case control design has been adapted for screening evaluation. However, a widely recognised problem is self-selection bias due to a likely risk difference between those who tend to undergo screening and those who do not. The case control design was originally used for reasons of efficiency, because it samples only a small fraction of the vast majority of individuals who do not have the disease. The adapted version for estimating screening efficacy tries to avoid the bias that arises from the fact that individuals who die from the disease have a period from diagnosis in which they are not screened. Chapter 3 shows that this bias is not sufficiently resolved by the case control methodology as it is being used. Further development of this methodology will probably be able to diminish this healthy screenee bias further, but it will not be able to prevent self-selection bias. This precludes the outcomes of case control studies on screening efficacy from forming a sufficient evidence base for deciding to start a screening programme. After efficacy of screening has been demonstrated by randomised trials, there may be a role for case control studies to support additional technology assessment, such as the evaluation of screening in a small age range within the potential target population.

estimating net survival
Besides an estimate of the efficacy of screening, decision support requires estimates of several other parameters that influence the effectiveness of a screening programme. Many of the relevant parameters can only be estimated by observational studies. Probably the most important of these values is net survival from the disease in question. There is no gold standard for estimating net survival; an evaluation of bias in estimating net survival is therefore limited to comparing different, but all potentially biased, methods. Chapter 4 shows that the most heavily debated issues in estimating net survival, such as the quality of registration of cause of death, are of limited consequence, at least in the examples of colorectal cancer and prostate cancer. Other issues, such as whether to limit the analysis to first cancers in a patient, are at least as important. In principle all estimated survival values used should be evaluated for bias.

evidence and uncertainty
Sensitivity analyses have been applied in several parts of this thesis. In sensitivity analysis, the effects of changing assumptions on the outcomes are studied, usually by changing one model parameter at a time and thus studying the effects of uncertainty of one parameter at a time. Sometimes more than one parameter is changed at the same time in order to study the joint effect of uncertainty arising from several parameters and their interactions.
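As an illustration, a one-parameter-at-a-time sensitivity analysis can be sketched as follows. The model, its parameter names and all values are hypothetical stand-ins chosen for this example; they are far simpler than Miscan and do not reproduce any analysis in this thesis.

```python
def toy_model(params):
    """Hypothetical outcome: life years gained per 1000 women invited."""
    return (params["deaths_per_1000"]
            * params["attendance"]
            * params["improvement_of_prognosis"]
            * params["life_years_per_death"])


def one_at_a_time(model, base, ranges):
    """Vary each parameter over (low, high) while keeping the others at base."""
    results = {}
    for name, (low, high) in ranges.items():
        outcomes = []
        for value in (low, high):
            varied = dict(base)      # copy, so the base case stays untouched
            varied[name] = value
            outcomes.append(model(varied))
        results[name] = tuple(outcomes)
    return results


# Hypothetical base case and ranges (illustration only).
base = {
    "deaths_per_1000": 10.0,
    "attendance": 0.7,
    "improvement_of_prognosis": 0.3,
    "life_years_per_death": 15.0,
}
ranges = {
    "attendance": (0.6, 0.8),
    "improvement_of_prognosis": (0.2, 0.4),
}

base_outcome = toy_model(base)
results = one_at_a_time(toy_model, base, ranges)

# Joint variation: both parameters at their high value at the same time.
joint_high = toy_model(dict(base, attendance=0.8, improvement_of_prognosis=0.4))

for name, (low_out, high_out) in results.items():
    print("%s: %.1f to %.1f (base %.1f)" % (name, low_out, high_out, base_outcome))
print("both high: %.1f" % joint_high)
```

Because the parameters multiply in this toy model, the joint change is larger than the sum of the two separate changes, which is the kind of interaction that one-at-a-time analysis cannot show.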
In complicated models such as the ones used in this thesis, it is not feasible to fully explore all interactions between variations in parameters. It is more feasible to perform an uncertainty analysis. In such an analysis, a probability distribution is assumed for each of the parameters that are to be subjected to sensitivity analysis. The probability distribution of a parameter represents uncertainty concerning that parameter. For simpler models it is possible to derive the probability distribution of an outcome measure from the distributions on assumptions. For more complicated models, the probability distribution of an outcome measure can be accurately estimated by sampling from the probability distributions of model parameters and evaluating the model for each sample. The rigour of the uncertainty analysis gives rise to several questions concerning uncertainty in modelling for decision support that are also applicable to any other form of sensitivity analysis.
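The sampling approach can be sketched as follows, under loudly stated assumptions: the model is a hypothetical multiplicative stand-in for a real simulation model, and the probability distributions and their parameters are invented for illustration. None of these choices are taken from the Miscan uncertainty analysis.

```python
import random


def toy_model(attendance, improvement_of_prognosis, deaths_per_1000, life_years_per_death):
    """Hypothetical outcome: life years gained per 1000 women invited."""
    return deaths_per_1000 * attendance * improvement_of_prognosis * life_years_per_death


def uncertainty_analysis(n_samples, seed=1):
    """Sample the parameters, evaluate the model for each sample, and
    summarise the resulting distribution of the outcome measure."""
    rng = random.Random(seed)    # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(n_samples):
        # Assumed distributions (illustration only).
        attendance = rng.betavariate(70, 30)         # mean 0.7
        improvement = rng.betavariate(15, 35)        # mean 0.3
        deaths = rng.gauss(10.0, 1.0)                # mean 10, sd 1
        life_years = rng.triangular(12.0, 18.0, 15.0)
        outcomes.append(toy_model(attendance, improvement, deaths, life_years))
    outcomes.sort()
    mean = sum(outcomes) / n_samples
    lower = outcomes[int(0.025 * n_samples)]   # approximate 2.5th percentile
    upper = outcomes[int(0.975 * n_samples)]   # approximate 97.5th percentile
    return mean, lower, upper


mean, lower, upper = uncertainty_analysis(5000)
print("mean %.1f, 95%% interval %.1f to %.1f" % (mean, lower, upper))
```

In this sketch the parameters are sampled independently. Correlated uncertainties would require a joint distribution, which is one of the places where the assumptions of an uncertainty analysis become hard to justify.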
There is a general recognition that decision makers should be provided with a notion of how uncertain the expected effects of a decision are, even if the decision maker is not readily interested in such information. However, it is not clear what the role of uncertainty in decision making should be. While on the one hand it can be argued that a decision should be based on just the expected effects, on the other hand there is the argument that uncertainty is to be avoided. If uncertainty is to be avoided, the question arises: what negative value is to be attributed to uncertainty? In other words: if there is a choice between an option with a certain cost-effectiveness and an alternative with a less certain cost-effectiveness, how much more favourable should the expected value of the less certain alternative be? Sometimes the desire to avoid uncertainty is described as avoiding risk. However, in decision support where results concern the balance between costs and effectiveness, the question arises what risk is to be avoided.
As long as it is not sufficiently clear what decision criterion will actually be used, it is also not clear what uncertainty should be presented: that of the cost-effectiveness of one policy, that of the marginal cost-effectiveness of one screening policy relative to a slightly less intensive policy, that of the policy being Pareto optimal relative to all other possible policies, or that of the policy being preferred at a particular threshold for (marginal) cost-effectiveness.
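The difference between these criteria can be made concrete with a small sketch. The policies, costs and effects below are hypothetical. The threshold rule used here (maximising threshold times effect minus cost) is one common formalisation and is an assumption of this example, not a criterion proposed in this thesis. Only simple (Pareto) dominance is checked; extended dominance is not handled, and no two policies are assumed to have identical cost and effect.

```python
def pareto_frontier(policies):
    """Names of policies not dominated by any other policy, ordered by cost.

    A policy is dominated if another policy costs no more and is at least
    as effective, and is strictly better on at least one of the two.
    """
    frontier = []
    for name, (cost, effect) in policies.items():
        dominated = any(
            c <= cost and e >= effect and (c < cost or e > effect)
            for other, (c, e) in policies.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: policies[n][0])


def marginal_ratios(policies, frontier):
    """Extra cost per extra unit of effect relative to the next cheaper frontier policy."""
    ratios = {}
    for prev, cur in zip(frontier, frontier[1:]):
        extra_cost = policies[cur][0] - policies[prev][0]
        extra_effect = policies[cur][1] - policies[prev][1]
        ratios[cur] = extra_cost / extra_effect
    return ratios


def preferred_at_threshold(policies, threshold):
    """Policy with the highest net benefit (threshold * effect - cost)."""
    return max(policies, key=lambda n: threshold * policies[n][1] - policies[n][0])


# Hypothetical policies: name -> (cost, effect in life years gained).
policies = {"A": (100.0, 10.0), "B": (150.0, 13.0), "C": (160.0, 12.0), "D": (250.0, 15.0)}

frontier = pareto_frontier(policies)           # C is dominated by B
ratios = marginal_ratios(policies, frontier)
print("frontier:", frontier)
print("marginal cost-effectiveness:", ratios)
print("preferred at threshold 20:", preferred_at_threshold(policies, 20.0))
print("preferred at threshold 60:", preferred_at_threshold(policies, 60.0))
```

Each of these quantities would have its own uncertainty once the costs and effects are themselves uncertain, so the choice of criterion determines which uncertainty is relevant.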
In uncertainty analysis, all uncertainty on assumptions is described in the same format of a probability distribution. These uncertainties can however be of quite different quality. For instance in screening evaluation, there is uncertainty due to limited numbers of observations, to recent developments that cannot be observed in a situation with widespread screening so that they can only be estimated with an uncertain model on screening influence, to future changes in the epidemiology of the disease, to actual screening behaviour (attendance, intervals, follow-up), and to outcomes of future negotiations on costs. It is unclear to what extent these different kinds of uncertainty should be treated as being equivalent.

balancing favourable and unfavourable effects at high ages
Chapter 8 shows that while in general the balance between favourable and unfavourable health effects of breast cancer screening is good, when screening is applied in higher age groups, the unfavourable health effects may easily outweigh the favourable health effects. We have not estimated this balance for screening for other cancers. If early detection of prostate cancer has an effect similar to that of breast cancer, then the balance of favourable and unfavourable effects probably changes at an earlier age. That is because the longer sojourn time of invasive prostate cancer, particularly at higher ages, results in detecting more cancers that would not have been diagnosed without screening, and in more life years with cancer. Cervical cancer and colorectal cancer have a relatively short sojourn time of invasive cancer, and a relatively long sojourn time of precursors of cancer by which invasive cancer can be prevented. This complicates the situation too much for extrapolation of our findings on breast cancer screening.

effectiveness and circumstances
Chapter 10 shows that specific local circumstances can influence cost-effectiveness of screening programmes. However, several Miscan evaluations of breast cancer screening in different countries did not result in widely different outcomes as to the preferred screening policy. For instance Spain has a much lower breast cancer risk than the Netherlands, warranting less intensive screening. Miscan modelling was applied in two regions of Spain, but these regions had a much higher breast cancer risk than the average for Spain. Apparently regions with the highest breast cancer risk tend to be the ones wanting to start a breast cancer screening programme so that preferred policies among those regions are very similar.
Among regions with a substantial screening effort for cervical cancer, the differences between screening policies are quite large and they are by no means justified by differences in local circumstances on cost-effectiveness. One may wonder if this unjustifiable divergence of screening policies is due to a lack of firmness of the evidence for the efficacy of cervical cancer screening. Moreover, the main regional differences in cervical cancer screening concern the higher risk in developing countries like India and Brazil compared to that in western countries. Particularly in high-risk regions hardly any screening takes place.

future developments
The effects of cancer screening depend on a rather complicated process of development of the cancer in question as well as on several circumstances in which the screening takes place. A quantitative assessment of cancer screening is therefore only possible with the aid of an integrative model. Miscan is an example of such a model. The statistical validity of estimates from such models is as yet not very satisfactory. However, the formal description of all assumptions and of the mechanism by which estimates are derived from the model is superior to alternative methods (Isaacs and Fitzgerald 1999), because in theory it opens the possibility to discuss and criticise all aspects of the assessment. In practice, the formal description of model estimates is apparently too intricate for appropriate critique, and thus the estimates are rather light-heartedly either firmly accepted or equally firmly rejected, largely depending on whether the estimate agrees with a personal preference based on informal estimates that are not open to detailed critique. This practice may improve because the number of research groups seriously working on this type of modelling is increasing, giving more possibilities for competition and mutual criticism. There are also signs of a stronger interaction between model development and establishing empirical evidence.
Our research group is currently developing a new model for evaluating breast cancer screening that is better able to explain improvement of prognosis due to early detection and that can better separate the natural course of the cancer, the behaviour of the woman with the cancer in response to signs of the disease, and the effects of a screening programme. This model will provide a better tool to study the influence of changes in the earliness of diagnosis outside screening on the effectiveness of screening, and the possible effects of delay of diagnosis due to a previous negative screening. In breast cancer screening there is also a need for more detailed analysis of the randomised trials, particularly with respect to the effect of screening women under age 50. Such analysis should of course include the Canadian NBSS trial.
In cervical screening the Miscan model can be used to estimate the natural history parameters of HPV infections that may cause cervical cancer. Though the causal relationship between HPV infection and cervical cancer has been firmly established by now, it is still not clear whether some form of HPV screening can be more efficient than the current pap test.
The Miscan model for prostate cancer screening will continue to be used as an aid to interpret data from the ongoing trials and population trends, and will be used to help estimate the balance of favourable and unfavourable health effects of screening.
The Miscan model for colorectal cancer screening will be used to help design a trial in the Netherlands. This trial design is complicated by the choice between rather different screening tests that each would perhaps call for different screening intervals and screening ages. We also intend to perform a joint analysis of trials that have already been performed in other countries, giving an opportunity to further develop this method of analysis for a disease other than breast cancer.
Recent developments in the detection of lung cancer and of infections that may cause stomach cancer may lead to the consideration of new screening programmes. The decision on such a new screening programme can be supported by models such as Miscan.
Models such as Miscan will probably become more important for achieving rational control over the complex decision situations that are inevitable where cancer screening is at stake and where the aim is not just to maximise the targeted effect, but to reach the best balance of favourable and unfavourable effects.





last update of this page: 29 July 2005