Ovarian cancer is a horrible disease. It is often asymptomatic until late in its course and then causes a lot of suffering. It has extraordinarily high morbidity and currently a 5-year survival of only 40%. So wouldn’t it be excellent if we could find this disease early in its course, intervene and cure it?
The UKCTOCS trial (UK Collaborative Trial of Ovarian Cancer Screening) is the largest ever trial of ovarian cancer screening, with more than 200,000 participants randomised and exceptionally well followed and documented. It was roughly twice the size of the ovarian component of the American PLCO trial published in JAMA in June 2011. The PLCO trial failed to find any benefit from screening for ovarian cancer, and it did find a significant burden from overdiagnosis, false positives and the downstream complications that followed.
So many people in the medical community have been awaiting the outcome of the UKCTOCS trial – and on the 17th of December it landed. So what was the outcome? Did they find a benefit? Will we be changing our practice based on this new data? Well, this is where it gets interesting. The UKCTOCS trial has been widely publicised as a “positive” result, i.e. a significant benefit for screening. However, when you read the paper things are really not so clear. In fact, they are quite unclear… muddy. So I am going to do a bit of a deep dive into the paper and try to make sense of it all. But first, let’s look at the coverage in the mainstream and medical media. You need to know what your patients are hearing, and you need to go beyond the sound bites in the medical media to get this one right.
In Australia the two biggest medical news rags reported the trial very differently:
“SCREENING based on an annual blood test may help reduce the number of women dying from ovarian cancer by around 20%.”
So if you are the sort of doctor who reads the headlines and skims the abstracts, you are probably getting some mixed messages. Unfortunately, neither article delved into the statistics underlying the claims the authors made in their discussion.
In the USA the mainstream media also reported this important trial with similar headlines.
CBS News was very positive; the headline was: “Blood test for ovarian cancer saves lives, study finds”. They interviewed a few medics, and if you read the article the benefits are clarified and it is suggested that we need to wait 3 more years to check whether it really works. But the expert interviewed, Dr Agus, is quoted as saying: “If this [screening] were implemented in the United States we would save about 3,500 lives per year.” Interesting… that is the sort of statement that tends to have patients knocking down our doors to get this new ‘super test’.
The New York Times was more circumspect. They interviewed Dr Menon, one of the lead authors. Their headline was: “Early Detection of Ovarian Cancer May Become Possible”. They published a series of sound bites from a range of expert medical professionals, ranging from positive through to sceptical, and did include a number of the key statistical points. The NY Times also noted potential conflicts of interest among the authors. This was probably the most rigorous coverage of the trial I have read.
So you need to know what the UKCTOCS trial did and did not find. Your patients will be asking you for advice and may request the “test”. So let’s break it down and look at the study with a super PICO analysis.
POPULATION: This was a huge trial! 202,638 women were randomised. They were between 50 and 74 years of age at randomisation and largely postmenopausal. The study was run through NHS Trusts across the UK, including Northern Ireland. More than 96% of the participants were “White”, so a very Anglo population. As you might expect from such a large cohort, the baseline characteristics were very evenly matched across the groups and a good representation of the sort of patients we all see and treat. Variables like age, menstrual history, HRT use, parity and co-morbid cancers were what you would expect.
More than 1.2 million women were invited to take part, drawn from the massive NHS databases. The idea was to minimise the “healthy volunteer effect” (HVE), which can skew results in screening trials: you are less likely to find as many cancers if all of your volunteers are clean-living, non-smoking, health-conscious folk. The UKCTOCS authors also published an analysis of the healthy volunteer effect in Trials in 2011. However, they concluded that their invitation strategy did not reduce the HVE. This kinda makes sense when you consider that only ~1 in 6 women accepted the invitation to participate; I imagine they represent the health-conscious upper sixth of the population! Because of the HVE, the women who participated in the trial died at a particularly low rate: overall only 37% of the predicted mortality. So these results are difficult to apply to the population as a whole; their external validity is limited.
Importantly, women at increased risk of ovarian cancer due to family history [defined as > 10% lifetime risk] were excluded from the trial. They were invited to join the UKFOCSS trial run by the same group instead. See the end of this post for a discussion of that trial.
Exclusion criteria. Eligibility was determined as follows:
Participants were known carriers of one of the OC predisposing genes (BRCA1, BRCA2, MLH1, MSH2, MSH6, PMS1, PMS2) or first-degree relatives (mother, sister, daughter) of an affected member of a high-risk family.
High-risk families were those fulfilling any of the following criteria:
The family contained two or more individuals with ovarian cancer (OC) who were first-degree relatives.
The family contained one individual with OC and one individual with breast cancer diagnosed at age < 50 years who were first-degree relatives.
The family contained one individual with OC and two individuals with breast cancer diagnosed at age < 60 years who were connected by first-degree relationships.
The family contained an affected individual with a mutation of one of the known OC predisposing genes (BRCA1, BRCA2, MLH1, MSH2, MSH6, PMS1, PMS2).
The family contained three individuals with colorectal cancer, at least one of whom was diagnosed at age < 50 years, as well as one individual with OC, and all of these individuals were connected by first-degree relationships.
The first three criteria could be modified where paternal transmission occurred (ie, families in which affected relatives were related by second degree through an unaffected intervening male relative and in which the proband had an affected sister were eligible).
In addition, when women did not fall within these inclusion criteria but the recruiting center felt that they had a lifetime risk of OC of ≥ 10%, the study clinical geneticist (J.M.) reviewed the pedigree and documentation of diagnoses to determine eligibility.
INTERVENTION: The women were randomised 1:1:2 into 3 groups, with 2 separate intervention groups each of more than 50,000 women. Obviously the women and clinicians could not be blinded to the intervention; however, the outcomes team was masked when analysing the clinical outcomes.
The original protocol was for 6 annual screens and 7 years of follow-up.
Compliance with the screening protocol was about 80%, roughly the same as in similar large population-based screening trials.
However, as noted above, the healthy volunteer effect messed with the calculations. The mortality rate of the trial participants was much lower than expected in the general population: the women in the trial died at 37% of the rate anticipated at the outset, and by the end of the 7 years at only about half the expected rate. This is an issue. Fewer deaths means less potential to detect a real difference between the groups; the efficacy of screening would be watered down by the relatively good health of the participants. However, the healthy volunteer effect also means there are likely to be fewer deaths from other causes, so it may bias the data to make screening look better than it is… but we don’t know for sure. See the discussion of “all-cause mortality” below.
The research team decided to extend the screening period by 3 years to improve their ability to detect a difference. In plain speak, they increased the ‘dose of screening’ to try to detect a benefit. Follow-up was also extended; median follow-up was about 11 years at the final curtain. With the extension, the power calculation was revised: the target power to detect a 30% mortality reduction was lowered from 90% to 80%.
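Why do fewer deaths and a lower power target matter? Survival comparisons are driven by the number of events (deaths), not the number of women enrolled. Below is a rough back-of-envelope sketch using Schoenfeld’s approximation for the number of deaths needed. This is my own choice of formula, not the trial’s published power calculation, and it assumes a hazard ratio of 0.7 to stand in for a “30% reduction”, a two-sided alpha of 0.05, and the roughly 1:2 split of MMS (about 50,000) versus control (about 100,000).

```python
import math
from statistics import NormalDist


def required_deaths(hazard_ratio: float, power: float,
                    alpha: float = 0.05, frac_in_arm: float = 1 / 3) -> int:
    """Schoenfeld's approximation for the number of deaths needed.

    frac_in_arm is the share of the compared women in one arm. For MMS
    (about 50,000) versus control (about 100,000) that is about 1/3.
    All inputs here are illustrative assumptions, not trial parameters.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance level
    z_beta = z.inv_cdf(power)             # desired power
    denom = frac_in_arm * (1 - frac_in_arm) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 / denom)


for target in (0.90, 0.80):
    print(f"power {target:.0%}: ~{required_deaths(0.7, target)} ovarian cancer deaths")
```

On these assumptions, 90% power needs roughly 370 ovarian cancer deaths and 80% needs roughly 280. When deaths accrue at only a fraction of the expected rate, reaching either total takes longer, which is one way to read the decision to extend screening and follow-up and to accept a lower power target.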
So here is how the screening tests went:
Ultrasound screening: women underwent annual trans-vaginal ultrasound and were followed up as follows:
Normals – ongoing annual screening scans
Unsatisfactory – had a repeat scan done in 3 months
Abnormal scan – these women had a repeat scan with a senior sonographer within 6 weeks.
Multimodal screening (MMS)
This screening strategy was a bit more complicated. It is based upon annual CA-125 blood levels. Women were tested and their results fed into an algorithm called ROCA [Risk of Ovarian Cancer Algorithm] to detect rises in CA-125 over baseline, rather than using a standardised lab cut-off.
The ROCA algorithm triaged the women into three groups:
Normal: continued annual screening
Intermediate: repeat CA-125 in 3 months
Elevated: repeat CA-125 and a trans-vaginal (TV) ultrasound within 6 weeks
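The key idea behind ROCA, comparing each woman to her own baseline rather than to a single lab cut-off, can be illustrated with a toy example. To be very clear: ROCA itself is a proprietary algorithm and this is NOT it. The function names, the use of the median of previous results as the baseline, and the 1.5 and 2.5 ratio thresholds are all hypothetical, invented only to show the concept and the three-way triage above. The 35 U/mL figure is the commonly used single CA-125 cut-off.

```python
from statistics import median

FIXED_CUTOFF = 35.0  # U/mL, the commonly used single CA-125 cut-off


def fixed_cutoff_flag(latest: float) -> bool:
    """Conventional approach: flag any single value above a population cut-off."""
    return latest > FIXED_CUTOFF


def change_from_baseline_triage(history: list[float], latest: float) -> str:
    """Hypothetical triage on the rise over a woman's own baseline.

    Baseline = median of her previous results. The ratio thresholds are
    invented for illustration only; they are NOT ROCA's.
    """
    baseline = median(history)
    ratio = latest / baseline
    if ratio >= 2.5:
        return "elevated"      # repeat CA-125 plus ultrasound within 6 weeks
    if ratio >= 1.5:
        return "intermediate"  # repeat CA-125 in 3 months
    return "normal"            # continue annual screening


# A woman whose CA-125 has hovered around 8 U/mL and now reads 22 U/mL
history = [7.5, 8.0, 8.5]
print(fixed_cutoff_flag(22.0))                     # False: well under 35
print(change_from_baseline_triage(history, 22.0))  # elevated: 22 / 8 = 2.75
```

The same reading of 22 U/mL is “normal” against a fixed cut-off but a large rise for this particular woman, which is the rationale for a change-from-baseline approach. Whether that translates into lives saved is exactly what the trial was testing.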
Important note: Prof. Jacobs, the lead author, declares in the COI statement that he holds the patent for the ROCA algorithm and has a financial interest in Abcodia, the company which sells this test.
Women with persistently abnormal results in either the ultrasound (USS) group or the ROCA/ultrasound (MMS) group were then assessed clinically by a trial investigator. They received whatever investigations or surgery the specialists thought were required.
The UKCTOCS team wrote a few papers in 2011 analysing the relative sensitivity and specificity of each of the screening modalities. Both were around the 80–85% mark. Of note, screening detected a little over half (59%) of the tumours counted in the total of ovarian cancer deaths. So at best, screening will find a bit over half of the nasty tumours.
CONTROL: About 100,000 women received no screening, although, as would be expected in such a large study, there was some contamination of the control group: based on a questionnaire at the end of the trial, 4.3% of women in the control group underwent some sort of screening.
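Compliance of about 80% in the screened arms and 4.3% contamination of the control arm both pull the two arms towards each other, shrinking the difference an intention-to-treat analysis can see. A minimal sketch, assuming a simple dilution model in which screening reduces ovarian cancer mortality by a fixed proportion in any woman actually screened and has no effect on anyone else. The 30% “true” effect is hypothetical, borrowed from the trial’s design target.

```python
def diluted_relative_reduction(true_reduction: float,
                               compliance: float,
                               contamination: float) -> float:
    """Intention-to-treat relative reduction under a simple dilution model.

    Assumes screening cuts mortality by `true_reduction` in any woman who is
    actually screened and has no effect on unscreened women.
    """
    screen_arm_risk = 1 - compliance * true_reduction      # relative to no screening at all
    control_arm_risk = 1 - contamination * true_reduction  # some controls got screened too
    return 1 - screen_arm_risk / control_arm_risk


# Hypothetical 30% true effect, ~80% compliance, 4.3% control contamination
print(f"{diluted_relative_reduction(0.30, 0.80, 0.043):.1%}")
```

Under these assumptions, a 30% effect in screened women would show up as only about a 23% reduction between the randomised arms.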
OUTCOMES: The primary outcome in this trial was death from ovarian cancer by Dec 31, 2014. Below is the key table of results from the paper:
Note: this is a disease-specific mortality outcome. The trial was powered to detect a 30% reduction in ovarian cancer mortality with screening, and the paper provides no data at all on “all-cause mortality”. This is a bit odd, as the best measure of the effect of a screening intervention is all-cause mortality. Disease-specific mortality is useful for telling us whether screening actually picks up cancers earlier and prevents death from ovarian malignancy; however… we, and our patients, want to know whether it will make them live longer. Only all-cause data can tell us this. If screening means we diagnose ovarian cancer early but then increase other mortality, e.g. more pulmonary emboli (PEs) or surgical deaths, then we are not doing right by our patients.
I did email the authors to ask whether there were any numbers on all-cause mortality. The response did not throw any light on this stat. I find it a little unusual that in such a large, well-conducted trial with great follow-up this data was not published. Even if it had been included only as a secondary outcome, we could at least look at it and get a feel for the overall benefit. So I remain a bit confused as to why it didn’t get included.
Ovarian cancer was defined as: “malignant neoplasms of the ovary, which include primary non-epithelial ovarian cancer, borderline epithelial ovarian cancer, and invasive epithelial ovarian cancer; malignant neoplasms of the fallopian tube; and undesignated malignancies of the ovaries, fallopian tube, or peritoneum.”
Specifically, primary peritoneal cancer was not part of the primary outcome, although the 2014 WHO reclassification of these cancers threw a bit of a spanner in the works. The paper therefore includes a “secondary analysis” that counts both primary ovarian cancer AND primary peritoneal cancer in the mortality numbers. Hmmm… not sure about this one! Beware the analysis of secondary outcomes.
The analysis also broke the mortality reduction into two time periods: 0–7 years AND 7–14 years. This is an interesting way to crunch the numbers. As with any long-term mortality study, more people die the longer you follow them, so a benefit is more likely to show up later in follow-up. If you are trying to break through the magical p < 0.05 line, this is one way to do it.
Now if you scan the table above you will notice a few things:
There are two different statistical techniques used to analyse the data: the Cox model and the Royston-Parmar model. Both are accepted ways of analysing survival data such as those in this trial. However, if you go back to the UKCTOCS trial protocol (available on the IWH website), the planned analysis was: “a Cox regression model will be used to model the difference in mortality rate between the control arm and each individual screen arm.” So the Royston-Parmar model was not originally planned. Is this a problem? Well, suppose we used 20 different models to analyse the mortality curves and then published only the one or two that showed a benefit. This is why we have trial registers: to ensure transparency of trial design and analysis. Note: Prof. Parmar was the head statistician on the UKCTOCS trial, so it is understandable that his method was used as an analysis tool.
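To put a number on the “20 different models” worry: if screening truly had no effect and each analysis were an independent test at p < 0.05, the chance that at least one of them comes up “significant” is 1 − 0.95ⁿ. This is a simplifying assumption; analyses of the same data are correlated, so the real inflation is smaller, but the direction is the same.

```python
def chance_of_false_positive(n_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent analyses gives
    p < alpha when screening truly has no effect."""
    return 1 - (1 - alpha) ** n_analyses


for n in (1, 2, 5, 20):
    print(f"{n:>2} analyses: {chance_of_false_positive(n):.0%}")
```

With 20 independent looks, the chance of at least one spurious “significant” result is about 64%. That is why a pre-specified analysis plan matters.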
None of the Cox models reached statistical significance; they all included “0” in the confidence interval. However, the R-P model did just squeak under p = 0.05 for a few of the estimates, namely those where “prevalent cases were excluded”, which brings me to the next point.
“Prevalent cases” were excluded from the analysis. Makes sense: we should not include women who already had ovarian cancer at the outset. But… hang on a minute. How did they know these women had a cancer before screening started? That is hard to answer. It appears they looked at the CA125 trend in women who were diagnosed with ovarian cancer and extrapolated backwards in time to decide which women likely had a tumour on day 1. Hmmm… so how were women in the “no screening” group given the same treatment if they never had a CA125? The investigators ran a post hoc assay on serum stored at enrolment and decided who probably already had a ‘prevalent cancer’ based on the CA125 level. I do not understand how we could generalise this to an external population of real-world women. In day-to-day GP practice we can never know who already has a tumour (if the women are asymptomatic), so excluding them from the analysis seems to reduce the external validity of this trial.
The analysis was extended to include both ovarian and “primary peritoneal cancer” as a composite secondary outcome. This also makes sense: primary peritoneal cancer is likely ovarian in origin, as per the WHO reclassification. However, if you look at the raw numbers you will see that including this secondary outcome does favour the MMS strategy: there were 16 peritoneal cancer deaths in the MMS group and 15 in the control group. Recall that the control group was twice the size of the MMS group. So although it is a reasonable thing to analyse, we need to beware of secondary and composite outcomes, as they are prone to bias.
In the screening groups, 1634 women [50 per 10,000 screens] had unnecessary surgery, i.e. surgery that yielded benign results. The rate of “false positive” surgery was described as a ratio of malignant to benign pathology (benign operations per cancer found): no screening 1 : 1.2, MMS 1 : 2.7 and USS 1 : 6.4. The surgical complication rate is quoted as 3.5%. Unfortunately, as far as I can tell from the paper, the actual harms of the surgeries were not documented. They are discussed in the 2-hour video produced at the trial launch, but bizarrely not in the paper itself. So it is hard to say what the true “harms of screening” are in this study. Below is the table you can find in the supplementary material, which describes the types and numbers of harm events in the screening groups. For the record, in the PLCO trial the surgical complication rate in screened women was ~15%, roughly four times higher than in the UK group… must be better surgeons in the NHS?
OTHER INTERESTING STUFF
There was a retrospective review of the American PLCO population data in which a “best guess” version of the ROCA algorithm was applied to the American cohort, titled “Potential effect of the risk of ovarian cancer algorithm (ROCA) on the mortality outcome of the Prostate, Lung, Colorectal and Ovarian (PLCO) trial” (International Journal of Cancer, 2012). In this exercise the authors asked what would have happened if the ROCA algorithm had been used instead of the absolute CA125 cut-off as the screening tool. Of course this has to be taken with a grain of salt, but they concluded that ROCA would not have shown any additional survival benefit in the PLCO cohort. One of the authors, Dr Skates, is also a co-inventor of the ROCA algorithm and a co-author of the UKCTOCS trial.
Having to have subsequent testing after an initial positive screening test did increase anxiety
Undergoing surgery increased anxiety
Being diagnosed with ovarian cancer had a large effect on anxiety – as one might expect.
The same research group also offered screening to “high risk” patients in a separate trial mentioned above – UK FOCSS. This was a prospective observational study (no control group) looking at the performance of annual screening using CA125 and TV ultrasound in 3500 high risk women. The screening was found to be 80% sensitive. And in this group there was not a significant “stage shift” in cancers detected. That is – screening did not move women from an advanced stage of disease at diagnosis to earlier, more treatable disease. There were a few reasons discussed for this finding and the trial is now in Phase II – in which women are screened at 4-monthly intervals and faster follow-up surgery etc are planned. So in summary – even in a high risk cohort we have not yet seen a benefit from annual screening in terms of finding earlier disease. That is the core goal of any screening program – to find earlier and treatable disease.
I am “just a GP”. I am not a guru in biostats so I may be completely wrong here. However, I am just a GP who wants to know what to tell my patients when it comes to screening for a nasty disease. I would love to be able to do something to prevent my patients from getting diagnosed with late-stage ovarian cancer. Here are my summative thoughts:
The UKCTOCS study was large and well conducted. The results, in my reading, do not show a significant benefit to screening.
There are a number of statistical and methodological quirks that do raise questions about the reliability of the results and their external validity
The authors’ conclusions are optimistic, yet at the basic scientific level we cannot reject the null hypothesis based on these numbers.
In time we may get more follow-up data which may change the situation. However, as of December 2015 I do not think we should be changing our practice.
Screening for ovarian cancer remains unproven and the harms remain largely unknown.
I would really like to see the all-cause mortality data presented clearly so that we can all look at it and draw our own conclusions.
Based upon my reading of this paper and the surrounding data I do not think I will be recommending screening with any tool for ovarian cancer to my patients.
I am somewhat concerned that this paper and the media hype around it may represent the “thin end of the wedge”: a foot in the door for screening that remains unproven. We have all seen and struggled through the perils of over-diagnosis in prostate cancer and mammography screening. At best guess, the “number needed to screen” is somewhere between 2000 and infinity. This is a very weak effect, if there is one at all. For now, we should be investing our time, money and patients’ goodwill in other health pursuits.
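Where does a range like “2000 to infinity” come from? The number needed to screen (NNS) is 1 divided by the absolute risk reduction (ARR), and when the confidence interval for the ARR reaches zero, the worst-case NNS is infinite: screening may help nobody. A minimal sketch; the ARR interval of −1 to 5 deaths prevented per 10,000 women is hypothetical, chosen only to reproduce the shape of that statement, and is not taken from the trial.

```python
import math


def nns_range(arr_lower: float, arr_upper: float) -> tuple[float, float]:
    """Range of the number needed to screen (NNS = 1 / ARR) implied by a
    confidence interval for the absolute risk reduction (ARR).

    If the interval reaches zero or below, screening may not help at all,
    so the worst-case NNS is infinite.
    """
    if arr_upper <= 0:
        return (math.inf, math.inf)  # no plausible benefit anywhere in the CI
    best = 1 / arr_upper             # most optimistic NNS
    worst = math.inf if arr_lower <= 0 else 1 / arr_lower
    return (best, worst)


# Hypothetical ARR 95% CI: -1 to 5 deaths prevented per 10,000 women screened
best, worst = nns_range(-1 / 10_000, 5 / 10_000)
print(f"NNS between {best:.0f} and {worst}")
```

The best case here is 2000 women screened to prevent one ovarian cancer death; because the interval includes zero, the worst case is that no amount of screening prevents one.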
I am a GP working in Broome, NW of Western Australia. I work as a hospital DMO (District Medical Officer) doing Emergency, Anaesthetics, some Obstetrics and a lot of miscellaneous primary care. Also on the web as @broomedocs and + Casey Parker.