Introduction

Making decisions to introduce or change cancer screening on the basis of observed evidence is not straightforward. Even if evidence from randomised controlled trials clearly demonstrates a reduction in mortality from the cancer in question, several issues remain to be resolved before an evidence-based decision can be made on the introduction or change of a screening programme. Screening inevitably comes with unfavourable effects, such as extra cancer incidence with the burden of treatment of those cancers and its after-effects, and the burden from an increase in diagnostic procedures and from the screening procedure itself. This raises the question of whether the favourable effects of screening (mortality reduction, prevention of advanced disease) are sufficiently large in comparison to the unfavourable effects. Other questions are: How costly is screening? How are the observed effects to be extrapolated to a situation that is different with respect to demography, epidemiology and health care? There are also several optimisation questions. What is the optimal screening test? What is the optimal way of organising the screening that is to be offered to the public? What is the optimal screening schedule with respect to ages and intervals between screenings?
Each of these questions concerns a trade-off between favourable and unfavourable effects of screening that may turn out differently for the screening programme under consideration than for the situation in which the empirical evidence was observed. Thus there are several steps to be taken between the empirical investigation that produces observed evidence concerning cancer screening and the decision to introduce or change cancer screening. This thesis concerns neither empirical investigation nor decision-making, but the intermediate steps between the two. The steps addressed are the gathering of evidence, the uncertainty associated with present evidence, the balancing of favourable and unfavourable effects, and the influence of the particular circumstances in which cancer screening takes place. Finally, there is the possible need for side steps to go into compelling questions that arise along the way. The examples chosen to illustrate these steps are part of the work of the research group on screening evaluation of the Department of Public Health of Erasmus University Rotterdam (Beemsterboer 1999; de Koning 1993; Koopmanschap 1994; van Ballegooijen 1998b; van Oortmarssen 1995; Wildhagen 1999).
Most examples rely to a large extent on the application of Miscan simulation models (Habbema et al. 1985; Loeve et al. 1999). The Miscan program simulates a series of individual life histories, considering date of birth and of death, development of the disease in question before diagnosis in a situation without screening, diagnosis and survival, and the influence of a screening programme on the date and stage of detection and the date of death. A life history is simulated through individual realisations of specified probability distributions concerning demography, the course through a number of possible disease states, sojourn times in these disease states, behaviour with respect to the screening programme, and test characteristics and consequences of screening. The life histories are aggregated to a population in which incidence, life years with disease, life years lost from disease, numbers of screenings, stage distribution at diagnosis etc. are counted. A Miscan model can reproduce the circumstances under which empirical evidence has been gathered and compare observed and modelled results, so that one can check to what extent the model assumptions on natural history can explain what is observed. Likewise, a Miscan model can reproduce the circumstances of a future screening programme, for which the results can then be predicted in support of decision-making.
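The micro-simulation principle described above can be illustrated with a minimal sketch. This is not the Miscan program: the function names, the disease model (a single preclinical state), and every distribution and parameter value below are hypothetical and chosen only for illustration. Each life history is drawn from a per-individual random seed, so that the same individual has the same natural history with and without screening, and the screening programme can only change how and when a cancer is diagnosed. Death from the cancer is deliberately left out to keep the sketch short.

```python
import random


def simulate_life(seed, screen_ages=(), sensitivity=0.8):
    """Simulate one hypothetical life history and return how it ends."""
    rng = random.Random(seed)  # per-individual seed: same natural history in every scenario
    # Natural history draws (all distributions are illustrative assumptions).
    death_other = rng.gauss(80.0, 10.0)    # age at death from other causes
    onset = rng.uniform(40.0, 120.0)       # age at onset of preclinical disease
    sojourn = rng.expovariate(1.0 / 3.0)   # preclinical sojourn time, mean 3 years
    clinical_dx = onset + sojourn          # age at clinical diagnosis without screening

    dx_age, mode = clinical_dx, "clinical"
    for age in screen_ages:
        in_preclinical = onset <= age < clinical_dx
        # A screen detects the cancer if the person is alive, the disease is
        # preclinical at that age, and the test is positive.
        if in_preclinical and age < death_other and rng.random() < sensitivity:
            dx_age, mode = age, "screen"
            break

    if dx_age >= death_other:              # died of other causes before any diagnosis
        dx_age, mode = None, "none"
    return {"death_other": death_other, "dx_age": dx_age, "mode": mode}


def simulate_population(n, screen_ages=(), sensitivity=0.8):
    """Aggregate n life histories into counts by mode of diagnosis."""
    counts = {"clinical": 0, "screen": 0, "none": 0}
    for seed in range(n):
        counts[simulate_life(seed, screen_ages, sensitivity)["mode"]] += 1
    return counts


no_screen = simulate_population(10_000)
with_screen = simulate_population(10_000, screen_ages=range(50, 70, 2))
extra_diagnoses = no_screen["none"] - with_screen["none"]
print(no_screen, with_screen, extra_diagnoses)
```

In this toy setting, `extra_diagnoses` counts individuals who are diagnosed only because of screening, that is, cancers that would not have surfaced before death from other causes. A realistic model would add disease stages, survival after diagnosis and attendance behaviour.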

gathering evidence
Observed evidence on medical interventions is usually generated in circumstances that are rather different from the future daily application of that intervention. This also means that, in order to predict effects under those different circumstances, there is a need for evidence from different types of studies.
Randomised trials in cancer screening aim at testing whether screening results in a reduction in mortality from the cancer. In doing so, these trials also produce an estimate of the size of the mortality reduction. But since the size of a trial is usually chosen to be just sufficient to show a mortality reduction that is significantly different from 0, the confidence interval for the estimated size of the mortality reduction is wide. A more precise estimate can be derived by combining the results of more than one trial. For this purpose the concept of meta-analysis has been developed. However, a meta-analysis gives a more precise estimate at the expense of the intervention being defined less precisely (Blettner et al. 1999; Davey Smith and Egger 1998; Davey Smith et al. 1997). In an attempt to resolve this problem, a model can be used that is structured to explain the effects of screening. The direct estimate of mortality reduction is replaced by estimates of the crucial parameters that explain the amount of mortality reduction in a screened population. The model can apply the same assumptions on natural history and on the effects of early detection by screening to the settings of the different trials, and thus regain precision in the definition of the intervention. An example of such parameter estimation is presented and discussed in chapter 2 of this thesis.
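The gain in precision from combining trials can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling of log relative risks. This is one common way to carry out a meta-analysis and is an assumption here; the text does not specify a pooling method. The three trial results are hypothetical numbers, not data from any real trial, and the function names are illustrative.

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (log relative risk, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def ci95(log_rr, se):
    """95% confidence interval on the relative-risk scale."""
    return math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se)


# Hypothetical trials: relative risk of dying from the cancer, with the
# standard error of its logarithm.
trials = [(math.log(0.80), 0.12), (math.log(0.75), 0.15), (math.log(0.85), 0.10)]

pooled_log_rr, pooled_se = pool_fixed_effect(trials)
print("pooled relative risk:", math.exp(pooled_log_rr))
print("pooled 95% interval:", ci95(pooled_log_rr, pooled_se))
for log_rr, se in trials:
    print("single trial 95% interval:", ci95(log_rr, se))
```

The pooled standard error is smaller than that of any single trial, which is the gain in precision. The sketch also makes the cost visible: the pooled figure averages over trials that may differ in screening interval, age range and attendance, so it no longer describes one well-defined intervention.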
Randomised trials are sometimes not feasible in practice, and sometimes they are simply not the most appropriate method for acquiring knowledge. But observational studies are prone to bias. In real life it is usually not possible to investigate bias in observational studies. Or rather, as far as bias can be determined in observational studies, that bias can also be eliminated. The remaining bias therefore cannot be investigated in real studies. We applied micro-simulation models that were originally developed for the evaluation of screening programmes to investigate sources of bias in observational studies.
Case-control studies are used to estimate mortality reduction due to cancer screening. The general design of case-control studies is usually adjusted for estimating the efficacy of cancer screening. Within each set consisting of a case who dies from the cancer and the matched controls, exposure to screening among the controls that occurs later than the diagnosis of the disease in the case is disregarded (Cronin et al. 1998; Sasco et al. 1986). This is to compensate for bias due to the fact that after diagnosis one is no longer screened. Chapter 3 shows that this is an overcompensation that still leads to bias. It also shows that several other serious biases are possible if the particulars of the timing of screening in the population under study are not carefully considered. These biases occur in addition to the bias due to the association between the risk for the disease and the individual tendency to participate in screening.
Besides case-control studies, which can be seen as weak alternatives to randomised controlled trials, there are several other types of estimates that can be of value for evaluating cancer screening. An important mediator of cancer screening effects is net survival from the disease. Net survival shows the mortality effect among individuals with the disease that is attributable to that disease (Estève et al. 1994). Estimates of net survival can be biased in many ways. Chapter 4 investigates an alternative to the standard methods of net survival estimation. This retrospective survival selects the population used for the estimate from the people who have died in a certain, relatively short, period. In contrast, survival estimates usually select the population from newly diagnosed cases of the disease. Retrospective survival is shown to possibly result in large bias. Chapter 5 shows for the colorectal cancer and prostate cancer cases in the SEER program that different standard methods of survival estimation do not result in very different outcomes.

evidence and uncertainty
As mentioned earlier, empirical evidence on essential aspects of cancer screening, such as its quantitative influence on mortality from the disease, tends to be not very precise. Besides the question of how to establish more precision, there is also reason for explicit concern about the propagation of uncertainty on different aspects of cancer screening into conclusions for decision support. The most rigorous method to describe this propagation is an uncertainty analysis (Cox and Baybutt 1981; Morgan and Henrion 1990). For this type of analysis it is assumed that the uncertainty about the true model parameters is represented by a probability distribution that leads to a probability distribution in model outcomes, which in turn is assumed to represent the uncertainty in the model outcomes. Such an analysis is not part of this thesis (Chessa et al. submitted). But in several chapters sensitivity analyses are applied, in which model scenarios with different values for uncertain model assumptions are evaluated in order to study the possible effects of these values being overestimated or underestimated. Chapter 6 shows a sensitivity analysis concerning two questions. The first is about the explanation for the different apparent performance of the breast cancer screening programmes in North West England and the Netherlands. The second involves the robustness of the conclusions that two modifications of the U.K. breast screening programme, raising the upper age limit from 64 to 69 and shortening the screening interval from 3 to 2 years, are roughly equally cost-effective, and that the cost per life year gained of both extensions of the programme is not much higher than that of the ongoing programme. Chapter 7 compares the usual method of cervical cancer screening by Pap smears with unaided visual inspection. The latter is a cheaper screening test that requires less technological input, and it is therefore considered a more feasible alternative to Pap smears in developing countries. We show under what model assumptions unaided visual inspection of the cervix is more cost-effective than screening by Pap smears to prevent mortality from cervical cancer.
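The difference between a sensitivity analysis and an uncertainty analysis, as described in this section, can be made concrete with a minimal sketch. The outcome model, its parameter values and the Beta distribution for the mortality reduction are all hypothetical and serve only as an illustration; they are not taken from any chapter of this thesis.

```python
import random


def cost_per_life_year(mortality_reduction, cost_per_screen=50.0,
                       screens=100_000, deaths_without=400, ly_per_death=15.0):
    """Toy outcome model: screening cost divided by life years gained."""
    life_years_gained = mortality_reduction * deaths_without * ly_per_death
    return cost_per_screen * screens / life_years_gained


# Sensitivity analysis: evaluate a few scenarios with alternative values
# for one uncertain assumption (the mortality reduction).
scenarios = {label: cost_per_life_year(mr)
             for label, mr in [("low", 0.15), ("base", 0.25), ("high", 0.35)]}

# Uncertainty analysis: describe the uncertain parameter by a probability
# distribution (assumed Beta(25, 75), mean 0.25) and propagate it to a
# distribution of model outcomes.
rng = random.Random(1)
draws = sorted(cost_per_life_year(rng.betavariate(25, 75)) for _ in range(5000))
median = draws[len(draws) // 2]
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(scenarios)
print(lower, median, upper)
```

The scenario values show how far the outcome moves if the assumption is off in a given direction, while the interval from `lower` to `upper` attaches a probability statement to the outcome, at the price of having to specify a distribution for every uncertain parameter.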

balancing favourable and unfavourable effects
Cancer screening inevitably leads to unfavourable effects among those being screened. Undergoing screening as such is often an uncomfortable process, and undergoing diagnostics for cancer after a positive screening result is also unpleasant and causes grave anxiety in many of those who are affected. There is also the burden from diagnosing cancers earlier, or diagnosing cancers that would otherwise not have been diagnosed, which results in extra life years with cancer and in more unnecessary cancer therapy. In types of screening that cannot prevent cancer incidence, such as breast cancer screening and prostate cancer screening, the unfavourable effects of the screening itself and of extra diagnostic procedures, although they occur more frequently, are on balance small in comparison with the less frequent but stronger health effects of extra incidence and earlier detection of cancer. There are situations where cancer screening may significantly reduce mortality from the disease, but where the unfavourable effects are so substantial that screening is not prudent. An example of such a situation is screening for breast cancer at older ages. Even when assuming that the mortality reduction due to screening remains as high at older ages as in the randomised trials, the number of life years that can be gained still decreases from a certain age. Moreover, the probability of finding a breast cancer that would not have been diagnosed before the woman died from other causes increases steeply at higher ages. Chapter 8 presents estimates of this change in balance with increasing age. Chapter 9 shows that the amount of unfavourable effects relative to the favourable effects (the major reason for an upper age limit for a screening programme) will not be diminished by a longer screening interval.

effectiveness and circumstances
Chapter 10 gives a comprehensive overview of the different aspects that are to be taken into account in the evaluation of cancer screening. Most examples are from breast cancer screening. Even with this general framework for screening evaluation, compelling questions will still arise that do not fit well into it. Two examples of such questions are given in chapters 11 and 12. Chapter 11 analyses a question that came up in Denmark, where in the discussion about implementing breast cancer screening there was concern about the apparently higher number of cancers detected in the group of women invited for screening in the Malmö trial in comparison to the control group, which might be interpreted as a sign of overdiagnosis. Chapter 12 shows the striking similarity between the observed stage distributions at first and repeat breast cancer screenings and discusses different explanations for this unexpected similarity, since under plausible assumptions one would expect a substantially more favourable stage distribution at repeat screenings.

last update of this page: 29 July 2005