Orthopaedics is really one of the last frontiers of evidence-based medicine. In this field, there are few high-quality, randomised trials to tell us what works, what harms and what is just a waste of time. In recent years we have seen some procedures put under the arthroscope and examined by the numbers. Notably, the very common arthroscopic knee surgery has been compared to a sham procedure (NEJM, 2002) and… not a winner. More recently, knee arthroscopy has been examined in a systematic review (BMJ 2015)… no clear benefit to outweigh the small harms. So there we are – time to rethink our approach to knee surgery.
So today I am reviewing another new RCT that looks at arthroscopic subacromial decompression surgery for the shoulder. The CSAW [Can Shoulder Arthroscopy Work?] trial was published in the Lancet in November 2017. There was also an accompanying editorial commentary in the same issue. If you read the headlines and the commentary on this paper then the conclusion seems to be pretty clear – there is NO demonstrable short-term benefit to subacromial decompression when compared to placebo surgery or no intervention. So… is shoulder scoping just as useful as knee scopes? Well in order to answer that question we need to look closely at the trial and see what it really can tell us. Here we go.
The final trial included 313 middle-aged patients recruited from 32 hospitals in the NHS system of the United Kingdom. Patients had to have:
- subacromial pain of at least 3 months’ duration
- intact rotator cuff tendons (patients with partial tears were included)
- previously completed a non-operative management programme that included both exercise therapy and at least one steroid injection
- a diagnosis confirmed by a consultant shoulder surgeon
I think this is a reasonable group to examine and reflects the people we see in primary care. In the Australian context, 3 months would be a relatively short period of time to wait before considering surgery. However, most patients in this trial then waited a further few months before undergoing the intervention.
The baseline stats were similar across the groups… nothing out of the ordinary to see here.
The patients were randomised into 3 groups 1:1:1 including 2 surgical groups.
The first operative group underwent arthroscopic subacromial decompression under GA. This included all the usual Orthopaedic scraping and debriding of bits of anatomy that most of us long forgot in Medical School. This surgery required 2 arthroscopic ports.
The second group served as a “sham” surgery group. These patients underwent an investigational arthroscopy under GA. A camera port was inserted to look at and record the pathology, the joint was irrigated and a second sham port incision was made to keep the blinding intact.
Importantly both of the “surgical groups” then received between 1 and 4 postoperative physiotherapy sessions. This may, therefore, be seen as part of the intervention. It was not purely a trial of surgery, but of Physio too. Would it not have been reasonably easy to get some Physio sessions for the control group too? That might have created a more precise intervention-versus-control setup.
The third group, the control group, underwent no treatment – they received no prescribed physiotherapy, steroid injections or other active therapy. They attended a follow-up appointment with a shoulder surgeon at 3 months post-randomisation.
The primary outcome was the Oxford Shoulder Score – a 12-item questionnaire, scored from 0 to 48, looking at shoulder pain and functional outcomes. This was recorded at 6 months after trial entry and was repeated at 12 months as a secondary outcome. They also recorded a series of other pain and functional alphabet scores – 6 of them! [a set-up for p-hacking?] They also recorded any adverse outcomes. Now, it is hard to say exactly what a clinically significant Oxford score shift would look like. It is a patient-oriented outcome as it includes measures of pain and function. I am a little wary of these patient questionnaire outcomes – when you think about it they are actually a composite of a dozen individual outcomes and that can be hard to interpret – however, it is what we tend to measure in such trials. Maybe it would be better to ask patients a simple binary question, e.g. “Are you able to do what you need to do?”
POWER and other NERDY STATS
Using a two-sided t-test with a 5% level of significance, 90% power to detect a difference in the Oxford Shoulder Score of 4·5 (SD 9·0) required a sample size of 85 participants in each group. They aimed to recruit 100 patients per group, anticipating a 15% drop-out rate…
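For anyone who wants to check the arithmetic, here is a minimal sketch of that power calculation. It assumes the standard normal-approximation formula for comparing two means, which is not necessarily the exact method the trial statisticians used (an exact t-based calculation can differ by a participant or so). The function names are my own, for illustration only.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Sample size per group for a two-sided comparison of two means,
    using the normal approximation (an assumption, not the trial's code)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def inflate_for_dropout(n, dropout_percent):
    """Recruitment target so that n participants remain after drop-out."""
    return math.ceil(n * 100 / (100 - dropout_percent))


required = n_per_group(delta=4.5, sd=9.0)
target = inflate_for_dropout(required, dropout_percent=15)
print(required, target)  # 85 100
```

Under those assumptions the numbers quoted from the paper fall straight out: 85 per group for analysis, 100 per group to recruit.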
… and that is pretty much what happened! The groups included 106, 103 and 104 at randomisation. At 6 months only 90, 94 and 90 had data for analysis. But, there is always a but…
- only 76 of those randomised to active surgery had received it.
- only 60 in the “sham surgery” group had received the stated intervention
- and of the 104 in the control group, 11 ended up with some form of surgical decompression by 6 months…
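So if you are following the numbers here – the number of patients in the surgical groups who actually received their allocated treatment is slipping well below the 85 per group needed for power, and the control group is not exactly a true control group. So anything we say after this point requires a pinch of salt thrown over the [sore] shoulder.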
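To put those adherence numbers side by side, here is a small sketch that turns the counts quoted above into percentages. The group labels and variable names are mine; the counts are the ones reported in this post, nothing more.

```python
# Counts as quoted in this post (randomised, with 6-month data, treated as allocated).
randomised = {"active surgery": 106, "sham surgery": 103, "control": 104}
analysed_6m = {"active surgery": 90, "sham surgery": 94, "control": 90}
received_allocated = {"active surgery": 76, "sham surgery": 60}
control_had_surgery = 11  # control patients who had decompression by 6 months


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for group, n in randomised.items():
    print(f"{group}: {pct(analysed_6m[group], n)}% had 6-month data")

for group, n in received_allocated.items():
    print(f"{group}: {pct(n, randomised[group])}% received the allocated operation")

crossover = pct(control_had_surgery, randomised["control"])
print(f"control: {crossover}% crossed over to surgery")
```

So roughly 72% of the active surgery group and 58% of the sham group got the operation they were allocated, while about 11% of the control group crossed over to surgery.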
Summary – there was not much difference between the 2 surgical groups. If anything, the “scope only” / sham surgery group were slightly better off at the six-month mark! When compared to the control group, the surgical groups showed a small improvement in the Oxford Shoulder Scores that was not clinically significant.
Probably the most useful and scientifically robust data to take away from all this is that the patients tended to get better no matter what the doctors did!! There was an average 12-point improvement across the board which was much larger than differences observed between groups.
You could argue that the gap between the green line of “no treatment” and the “surgical” groups is opening up at 12 months… but recall that this trial was not powered for this as a primary outcome and that the drop out rate at 12 months was increasing.
Also at the 12-month mark, the “no treatment group” included 24 patients who had some surgery.
On the harms side of the data – there were few adverse outcomes. There were 2 “frozen shoulders” in each group… small numbers, meh!
My TAKEHOME POINTS
It is great that we are seeing more randomised evidence in Orthopaedics, especially for the bread ‘n butter clinical problems like knee and shoulder pain. This trial has a reasonable design and the use of the sham surgery gives us a chance to detect important differences between placebo and actual surgery.
Unfortunately, the CSAW trial suffered as a result of non-adherence to protocol and delays to treatment. The headlines state that there was no difference between groups; however, this study lacked the statistical power to make any hard claims about the differences.
So, what can we say with this data?
Option 1: Agree with the headlines and commentary. There is no clinically significant difference between the three groups. Shoulder decompression is not worth it… use a tincture of time, maybe some physiotherapy and the patient will improve either way.
Option 2: Disagree. Consider that this is flawed data and that it is giving arthroscopic shoulder decompression an unfair trial. I frequently lambast trials which claim a benefit for various drugs based on flawed data or analysis. So to be fair, on the flipside, we cannot draw conclusions about non-superiority from this small, flawed data set.
There are issues around the trial’s internal validity – high drop out and contamination of groups. If anything this would tend to dilute the potential effects of surgery. So that means we need to draw breath before accepting the null hypothesis here.
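A back-of-the-envelope sketch shows how big that dilution could be. It rests on loud simplifying assumptions that are mine, not the trial's: surgery only helps those who actually have it, the benefit is the same for everyone who has it, and the "true" effect is set hypothetically to the 4.5 points the trial was powered to detect. It is an illustration, not a re-analysis.

```python
def diluted_effect(true_effect, treated_in_surgery_arm, treated_in_control_arm):
    """Expected intention-to-treat difference under a simple dilution model:
    the true effect scaled by the gap in surgery uptake between the arms."""
    uptake_gap = treated_in_surgery_arm - treated_in_control_arm
    return true_effect * uptake_gap


surgery_uptake = 76 / 106        # active surgery group who had the operation
control_uptake = 11 / 104        # control group who ended up with surgery
hypothetical_true_effect = 4.5   # assumption: the powered-for difference

expected_itt = diluted_effect(hypothetical_true_effect, surgery_uptake, control_uptake)
print(f"uptake gap: {surgery_uptake - control_uptake:.2f}")       # 0.61
print(f"expected ITT difference: {expected_itt:.2f} points")      # 2.75
```

In other words, under these assumptions a real 4.5-point benefit would show up as something closer to 2.75 points in an intention-to-treat comparison, which is exactly the kind of shrinkage that makes a "no difference" headline hard to trust.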
If we accept that the data is weak, then we are left to Bayesian pondering. In order to integrate this new data into our worldview (aka current practice) we need to know the prior likelihood that the surgery was beneficial… and this is where we come unstuck. There just isn’t much quality data out there to form the prior. Hence it comes down to where we think the onus of proof lies. Do our Orthopaedic colleagues need to prove benefit OR is this surgery an established “standard of care” that needs to be disproved before we change tack? ‘Tis a tricky ethical question.
Option 3: Pragmatism. The CSAW trial population had on average less than 6 months of pain/dysfunction and received 2 injections prior to entering the trial. Maybe if the trial population had more severe or longer duration of symptoms then a larger benefit may have been seen. When I discussed this with my local team they all agreed that they would not usually refer a patient for surgery after only 3 or 6 months. They would usually opt for a prolonged period of physical therapy and injections or other analgesia options. A Facebook poll of the GPDU doctors averaged out at around 9 months before referral to a surgeon… of course, it depends on your system as to how long the wait might take to actually see a surgeon and then receive intervention!
If we opted to offer surgery to only those with prolonged, refractory or severe symptoms I imagine we would be more likely to see a benefit and would certainly save a lot of resources. As is the case with many conditions – the symptoms will decrease over time for many patients, so selecting only those with significant and persistent symptoms might result in more bang for the arthroscopic buck.
Thanks to Dr Michael Tam (evidencebasedmedicine blog) for providing pre-publication peer review and insights around the nitty gritty of stats in this article.