Weight and worth
Systematic reviews of randomised controlled trials provide the highest level of evidence of efficacy of treatments - though in other circumstances, like adverse events, randomised trials may not always provide the best evidence. But systematic reviews are not always available, and we often have to do the best we can with single studies. How to get a grip on the weight and quality of evidence is dealt with here as well.
Most important, though, is ensuring that the choice of outcome used is appropriate. For migraine, completely pain free at two hours now seems to be the most appropriate outcome, but this is not always available. In examining any trial or review, the watchwords are caveat lector! It is up to the reader to impose their own and their patients' values.
Help and harm
One other strength of NNTs is that they can be used for adverse as well as beneficial effects, when they become numbers-needed-to-harm (NNH). A little thought is needed here too, because most adverse effects are mild and reversible. Others, often rarer, are more serious. An example is the risk of wound infection versus intra-abdominal abscess after laparoscopic appendectomy (Bandolier 58).
Output from systematic reviews
Bandolier prefers evidence from systematic reviews because we know that we should have most, if not all, of the information available. The way in which results are given in systematic reviews can take various forms, though many now include NNTs. All too often, though, reviewers stick with rather sterile statistical outputs - an odds ratio, relative risk, hazard ratio or effect size. These may show statistical superiority of one treatment over another, or over no treatment, but they are hopeless when we try and relate them to clinical practice.
Too often the statistical problem being addressed is whether a treatment works. That's OK when we are looking at small effects in large populations. But most of medicine is concerned with treatments that are known to work, when the question is different - how well does the treatment work? Bandolier has favoured the number-needed-to-treat [1] as a useful way of looking at results of reviews or trials because it more usefully expresses the therapeutic effort that is needed to get a therapeutic result. Increasingly we have choices of treatments, and the NNTs and NNHs should help us make the choice that is right for an individual patient.
An NNT is easy to calculate on the back of an envelope, and especially, we hope, on the Bandolier NNT worksheet. You don't need a supercomputer, but you do need a pencil and calculator, a few neurones in active mode, and a pinch of salt.
Black bag evidence
An NNT can help us to make decisions between treatment options. If the NNT for treatment A is lower (better) than treatment B, then, other things being equal, choosing A over B makes sense. Here the choice is what to put in the black bag. A would go into the black bag, B would not. The other way to use an NNT is to make choices for an individual patient, perhaps whether to treat or not. The choice here is whether or not to take A out of the black bag and use it.
There are, of course, many nuances to all this. Bandolier recommends the book from David Sackett & colleagues - Evidence-based Medicine: how to practice and teach EBM - as a cheap and worthwhile acquisition for any thinking doctor, nurse, scientist or manager in the NHS [2].
Calculating NNTs
The NNT calculation is given below. An example calculating the NNT for oral sumatriptan from the data given on page 2 of the main issue is done on the worksheet. Methods for calculating NNTs from odds ratios and relative risk reduction were given in Bandolier 36, but we find that we don't use these much.
We need to distinguish between treatments, such as aspirin as an analgesic, and preventative measures, such as aspirin preventing further cardiac problems after myocardial infarction. How you use the numbers from systematic reviews depends on which you are looking at. The distinction is between treatment and prophylaxis. For prophylaxis, where fewer events occur in the treated group, the calculation shown will produce negative NNTs. You can use those (the number will be correct), or you can switch the active and control groups around to provide NNTs with a positive sign.
The NNT for treatment is given by the equation 1/(proportion benefiting from experimental intervention minus the proportion benefiting from control intervention). For prophylaxis, where the outcome counted is a bad event, the groups are switched: 1/(proportion with the event in the control group minus the proportion with the event in the experimental group).
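As a rough illustration of the two equations, here is a minimal Python sketch. The function names and the trial counts are hypothetical, chosen only to show the arithmetic; they are not taken from any review. For prophylaxis the counts are assumed to be bad events, so fewer events with the experimental intervention gives a positive NNT.

```python
def nnt_treatment(benefit_exp, n_exp, benefit_ctl, n_ctl):
    """NNT for treatment: 1 / (proportion benefiting with experimental
    minus proportion benefiting with control)."""
    difference = benefit_exp / n_exp - benefit_ctl / n_ctl
    if difference == 0:
        return float("inf")  # no difference between the groups
    return 1 / difference


def nnt_prophylaxis(events_exp, n_exp, events_ctl, n_ctl):
    """NNT for prophylaxis: groups switched, so fewer bad events with
    the experimental intervention gives a positive NNT."""
    difference = events_ctl / n_ctl - events_exp / n_exp
    if difference == 0:
        return float("inf")  # no difference between the groups
    return 1 / difference


# Hypothetical treatment trial: 60 of 100 improve with active, 20 of 100 with placebo.
print(round(nnt_treatment(60, 100, 20, 100), 1))   # 2.5

# Hypothetical prophylaxis trial: 8 of 100 have the event with active, 10 of 100 with control.
print(round(nnt_prophylaxis(8, 100, 10, 100), 1))  # 50.0
```

A negative result from either function means the comparison went the other way, that is, the control group did better.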
NNTs for treatment should be small. We expect large effects in small numbers of people. Because few treatments are 100% effective and because few controls - even placebo or no treatment - are without some effect, NNTs for effective treatments are usually in the range of 2 to 4. Exceptions might be antibiotics. The NNT for Helicobacter pylori eradication with triple or dual therapy, for instance, is 1.2 (Bandolier 12).
NNTs for prophylaxis will be larger, because few patients are affected in large populations. So the difference between treatment and control will be small, giving large NNTs. For instance, use of aspirin to prevent one death at five weeks after myocardial infarction had an NNT of 40 (Bandolier 17).
Using absolute risk reduction
The absolute risk reduction (ARR) is the difference between the event rate in the experimental group and the event rate in the control group. It is the denominator in the NNT calculation. Many reviews and trials provide this information, so if you have it and convert it into a proportion, then you can get the NNT by dividing 1 by the ARR: NNT = 1/ARR
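Reviews often report the ARR as a percentage rather than a proportion. A small Python sketch of the conversion follows; the function name and the `as_percent` flag are illustrative assumptions, not part of the Bandolier worksheet.

```python
def nnt_from_arr(arr, as_percent=False):
    """Return NNT = 1/ARR, first converting a percentage ARR to a proportion."""
    proportion = arr / 100 if as_percent else arr
    if proportion == 0:
        raise ValueError("ARR of zero: no difference between groups, NNT is undefined")
    return 1 / proportion


print(nnt_from_arr(25, as_percent=True))  # 4.0
print(nnt_from_arr(0.25))                 # 4.0
```

By the same arithmetic, an ARR of 2.5 percentage points corresponds to an NNT of about 40.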
Confidence Intervals
The 95% confidence intervals of the NNT are an indication that 19 times out of 20 the 'true' value will be in the specified range. An NNT with an infinite confidence interval is then but a point estimate; it includes the possibility of no benefit or harm. It may still have clinical importance as a benchmark until further data permits finite confidence intervals, but decisions must take this into account. A method for calculating confidence intervals was given in Bandolier 18.
L'Abbé plots
A paper [3] by Kristen L'Abbé and colleagues written over ten years ago is regarded by Bandolier as one of the most sensible and understandable ever written on systematic reviews. The authors suggest a simple graphical representation of the information from trials. Each point on a L'Abbé scatter plot is one trial in the review. The proportion of patients achieving the outcome with the experimental intervention is plotted against the event rate in controls. Even if a review does not show the data in this way, you can do it yourself if the information is in the review, and that's why it's part of Bandolier's worksheet.
For treatment, trials in which the experimental intervention was better than the control will be in the upper left of the plot, between the Y axis and the line of equality. If experimental was no better than control then the point will fall on the line of equality, and if control was better than experimental then the point will be in the lower right of the plot, between the X axis and the line of equality.
For prophylaxis this pattern will be reversed. Because prophylaxis reduces the number of bad events - such as death after myocardial infarction by the use of aspirin - we expect a smaller proportion harmed with treatment than with control. So if experimental is better than control the trial results cloud should be between the X axis and the line of equality.
These plots give a quick indication of the level of agreement among trials. If the points are in a consistent cloud, that gives some confidence that what we are seeing is a homogeneous effect. But if points are spread all over the graph, and especially if they cross the line of equality, then that should make us concerned about the intervention, or the patients being treated and their condition. This can also be called heterogeneity.
The important point about a L'Abbé plot is that it shows all of the extant data on one piece of paper. When combined with numbers in the trial, and a summary measure like NNT, it is a neat way to summarise lots of information.
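The reading of a L'Abbé plot described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the trial percentages are hypothetical, and the plot is taken to have the experimental intervention on the Y axis and control on the X axis, for a treatment (not prophylaxis) outcome.

```python
def labbe_position(experimental_pct, control_pct):
    """Say where one trial falls relative to the line of equality."""
    if experimental_pct > control_pct:
        return "upper left"       # experimental better than control
    if experimental_pct < control_pct:
        return "lower right"      # control better than experimental
    return "on line of equality"  # no difference


def crosses_line_of_equality(trials):
    """True if the trials do not all fall on the same side of the line."""
    positions = {labbe_position(exp, ctl) for exp, ctl in trials}
    return len(positions) > 1


# Hypothetical trials as (experimental %, control %) pairs.
consistent = [(60, 20), (55, 25), (70, 30)]
scattered = [(60, 20), (30, 35), (40, 40)]

for trial in scattered:
    print(trial, labbe_position(*trial))

print(crosses_line_of_equality(consistent))  # False
print(crosses_line_of_equality(scattered))   # True
```

This is a crude check of which side of the line each trial falls on. It takes no account of trial size and is no substitute for looking at the plot itself.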
Variation in treatment and control
One of the things that using systematic reviews in this way teaches you is just how variable are the effects of both treatment and control in randomised trials. It is legitimate to be surprised, but after a short time it seems that this is the norm. The reasons for the variability are probably complex, but much will be just random chance. In many circumstances patients can have wide patterns of response to a treatment, but trial size is often relatively small. Gathering data together in systematic review and meta-analysis gives much more power than the single trial in almost all circumstances, and especially for reviews of treatments. Seeing such variability also teaches caution when faced with a single trial.
Size is everything
Take a moment to think about what you want to know about a treatment. You probably want some assurance that it works, but you really want to know how well it works. What do you mean by that? Using NNT terminology, you might want to know that the NNT is within certain limits.
Take ibuprofen. The NNT to obtain at least 50% pain relief in patients with moderate to severe pain over 4-6 hours is about 3 for 400 mg ibuprofen compared with placebo. How close do you want the estimate to be? You probably wouldn't be happy with a 95% confidence interval which went from 1 (perfect) to 10 (rotten). Would you be happy with 2 to 4, or happier still with 2.5 to 3.5? The answer should be the last of these, but the narrower the confidence interval (the more correct you want the answer to be), the more patients you need to have studied.
A mathematical but practical study [4] says that for the confidence interval to be 2.5 to 3.5 we need 500 patients taking ibuprofen and 500 taking placebo. The confidence interval with a single trial of the standard (in pain) of 40 patients per group is 1 to 10.
The lesson is to beware the single trial reflex, changing practice on the basis of a single, small, trial. It's quite likely to be wrong. Random chance is in play, and has quite a big effect in small trials, which explains the scatter sometimes seen in L'Abbé plots, and is yet another reason for choosing evidence from systematic reviews.
Worthy or what?
Calculating NNTs is relatively straightforward compared with the greater complexity of deciding whether a trial is credible, or worthy only of the dustbin. It is impossible to be dogmatic, as every subject has its own complexities. Here are some suggestions for your personal checklist:
Using the toolbox
These are all tools, not rules. We hope you find them useful to look at evidence that comes across your desk, especially when new treatments are being proposed. We would particularly like feedback on the NNT worksheet to know what you liked or disliked.
References:
Bandolier's DIY NNT example sheet
A number needed to treat (NNT) is defined by a number of characteristics. This worksheet is designed as an aide-mémoire for working out NNTs from papers and systematic reviews. First fill in the answers to the questions, then, where appropriate, graph the data on the L'Abbé plot, and finally do the NNT calculation.
Now plot the percentages from F and J for the trial on the graph. This can be done for different outcomes of a trial, or individual trials in a systematic review or meta-analysis.
Now calculate the NNT using the proportions from F and J.