
Number needed to treat (NNT)


Why can't we produce wallpaper with NNTs and ways to calculate them? Not a bad idea, so this supplement reprises how to calculate NNTs (Bandolier 36), and provides a simple-person's guide to DIY NNTs. We have produced a single sheet which allows anyone with pencil and calculator to do the job themselves. An example is given in the DIY NNT example sheet.


Weight and worth


Systematic reviews of randomised controlled trials provide the highest level of evidence of efficacy of treatments - though in other circumstances, like adverse events, randomised trials may not always provide the best evidence. But systematic reviews are not always available, and we often have to do the best we can with single studies. How to get a grip on the weight and quality of evidence is dealt with here as well.

Most important, though, is ensuring that the choice of outcome used is appropriate. For migraine, completely pain free at two hours now seems to be the most appropriate outcome, but this is not always available. In examining any trial or review, the watchwords are caveat lector! It is up to the reader to impose their own and their patients' values.


Help and harm


One other strength of NNTs is that they can be used for adverse as well as beneficial effects, when they become numbers-needed-to-harm (NNH). A little thought is needed here too, because most adverse effects are mild and reversible. Others, often rarer, are more serious. An example is the risk of wound infection versus that of intra-abdominal abscess after laparoscopic appendectomy (Bandolier 58).


Output from systematic reviews


Bandolier prefers evidence from systematic reviews because we know that we should have most, if not all, of the information available. The way in which results are given in systematic reviews can take various forms, though many now include NNTs. All too often, though, reviewers stick with rather sterile statistical outputs - an odds ratio, relative risk, hazard ratio or effect size. These may show statistical superiority of one treatment over another, or over no treatment, but they are hopeless when we try to relate them to clinical practice.

Too often the statistical problem being addressed is whether a treatment works. That's OK when we are looking at small effects in large populations. But most of medicine is concerned with treatments that are known to work, when the question is different - how well does the treatment work? Bandolier has favoured the number-needed-to-treat [1] as a useful way of looking at results of reviews or trials because it more usefully expresses the therapeutic effort that is needed to get a therapeutic result. Increasingly we have choices of treatments, and the NNTs and NNHs should help us make the choice that is right for an individual patient.

An NNT is easy to calculate on the back of an envelope, and especially, we hope, on the Bandolier NNT worksheet. You don't need a supercomputer, but you do need a pencil and calculator, a few neurones in active mode, and a pinch of salt.


Black bag evidence


An NNT can help us to make decisions between treatment options. If the NNT for treatment A is lower (better) than treatment B, then, other things being equal, choosing A over B makes sense. Here the choice is what to put in the black bag. A would go into the black bag, B would not. The other way to use an NNT is to make choices for an individual patient, perhaps whether to treat or not. The choice here is whether or not to take A out of the black bag and use it.

There are, of course, many nuances to all this. Bandolier recommends the book from David Sackett & colleagues - Evidence-based Medicine: how to practice and teach EBM - as a cheap and worthwhile acquisition for any thinking doctor, nurse, scientist or manager in the NHS [2].


Calculating NNTs


The NNT calculation is given below. An example calculating the NNT for oral sumatriptan from the data given on page 2 of the main issue is done on the worksheet. Methods for calculating NNTs from odds ratios and relative risk reduction were given in Bandolier 36, but we find that we don't use these much.



We need to distinguish between treatments, such as aspirin as an analgesic, and preventative measures, such as aspirin preventing further cardiac problems after myocardial infarction. How the numbers from systematic reviews are used depends on which you are looking at. The distinction is between treatment and prophylaxis. For prophylaxis, where fewer events occur in the treated group, the treatment calculation (experimental minus control) will produce negative NNTs. You can use those (the number will be correct), or you can switch the active and control groups around to provide NNTs with a positive sign.

The NNT for treatment is given by the equation 1/(proportion benefiting from experimental intervention minus the proportion benefiting from control intervention). For prophylaxis, where the proportions are those suffering the event being prevented, it is given by 1/(proportion with the event in the control group minus the proportion with the event in the experimental group).
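The two equations can be sketched in a few lines of Python. This is a minimal illustration, not the Bandolier worksheet itself: the function names are hypothetical, and the trial counts in the example are made up purely to show the arithmetic.

```python
def nnt_treatment(benefit_exp, n_exp, benefit_ctrl, n_ctrl):
    """NNT for treatment: 1 / (proportion benefiting with experimental
    minus proportion benefiting with control)."""
    diff = benefit_exp / n_exp - benefit_ctrl / n_ctrl
    if diff == 0:
        return float("inf")  # no difference between the groups
    return 1 / diff


def nnt_prophylaxis(events_exp, n_exp, events_ctrl, n_ctrl):
    """NNT for prophylaxis: 1 / (proportion with the event on control
    minus proportion with the event on experimental)."""
    diff = events_ctrl / n_ctrl - events_exp / n_exp
    if diff == 0:
        return float("inf")  # no difference between the groups
    return 1 / diff


# Made-up treatment trial: 60 of 100 benefit on active, 20 of 100 on placebo.
print(round(nnt_treatment(60, 100, 20, 100), 1))

# Made-up prophylaxis trial: 75 of 1000 events on active, 100 of 1000 on control.
print(round(nnt_prophylaxis(75, 1000, 100, 1000), 1))
```

A negative result means the control group did better than the experimental group for the outcome as entered.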

NNTs for treatment should be small. We expect large effects in small numbers of people. Because few treatments are 100% effective and because few controls - even placebo or no treatment - are without some effect, NNTs for effective treatments are usually in the range of 2 to 4. Exceptions might be antibiotics. The NNT for Helicobacter pylori eradication with triple or dual therapy, for instance, is 1.2 (Bandolier 12).

NNTs for prophylaxis will be larger, because few patients are affected in large populations. The difference between treatment and control will therefore be small, giving large NNTs. For instance, use of aspirin to prevent one death at five weeks after myocardial infarction had an NNT of 40 (Bandolier 17).


Using absolute risk reduction


The absolute risk reduction (ARR) is the difference between the event rate in the experimental group and the event rate in the control group. It is the denominator in the NNT calculation. Many reviews and trials provide this information, so if you have it and convert it into a proportion, then you can get the NNT by dividing 1 by the ARR: NNT = 1/ARR
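As a small sketch of that conversion, assuming the review reports the ARR as a percentage (the function name and the `as_percentage` flag are illustrative, not part of any worksheet):

```python
def nnt_from_arr(arr, as_percentage=False):
    """Return NNT = 1/ARR, first converting a percentage to a proportion."""
    if as_percentage:
        arr = arr / 100
    if arr == 0:
        return float("inf")  # no difference between the groups
    return 1 / arr


# An ARR of 2.5% is a proportion of 0.025, which gives an NNT of 40.
print(round(nnt_from_arr(2.5, as_percentage=True), 1))
```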


Confidence Intervals


The 95% confidence intervals of the NNT are an indication that 19 times out of 20 the 'true' value will be in the specified range. An NNT with an infinite confidence interval is then but a point estimate; it includes the possibility of no benefit or harm. It may still have clinical importance as a benchmark until further data permit finite confidence intervals, but decisions must take this into account. A method for calculating confidence intervals was given in Bandolier 18.
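One common way to obtain such an interval is to calculate a confidence interval for the absolute risk reduction and take reciprocals of its limits. The sketch below uses that approach with a simple normal approximation. It is an assumption for illustration and not necessarily the method given in Bandolier 18; the function name and the trial counts are made up.

```python
import math


def nnt_with_ci(benefit_exp, n_exp, benefit_ctrl, n_ctrl, z=1.96):
    """Return (NNT, (lower, upper)) using a normal approximation for the ARR.

    The interval is None when the ARR interval includes zero, because the
    NNT interval then runs out to infinity (no benefit, or harm, is possible).
    """
    p_exp = benefit_exp / n_exp
    p_ctrl = benefit_ctrl / n_ctrl
    arr = p_exp - p_ctrl
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_ctrl * (1 - p_ctrl) / n_ctrl)
    arr_low, arr_high = arr - z * se, arr + z * se
    nnt = 1 / arr if arr != 0 else math.inf
    if arr_low <= 0 <= arr_high:
        return nnt, None
    # Reciprocals reverse the order: the upper ARR limit gives the lower NNT limit.
    return nnt, (1 / arr_high, 1 / arr_low)


# Made-up larger trial: 60/100 benefit on active versus 30/100 on placebo.
print(nnt_with_ci(60, 100, 30, 100))

# Made-up small trial: 12/20 versus 9/20; the interval is not finite.
print(nnt_with_ci(12, 20, 9, 20))
```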


L'Abbé plots


A paper [3] by Kristen L'Abbé and colleagues written over ten years ago is regarded by Bandolier as one of the most sensible and understandable ever written on systematic reviews. The authors suggest a simple graphical representation of the information from trials. Each point on a L'Abbé scatter plot is one trial in the review. The proportion of patients achieving the outcome with the experimental intervention is plotted against the event rate in controls. Even if a review does not show the data in this way, you can do it yourself if the information is in the review, and that's why it's part of Bandolier's worksheet.

For treatment, trials in which the experimental intervention was better than the control will be in the upper left of the plot, between the Y axis and the line of equality. If experimental was no better than control then the point will fall on the line of equality, and if control was better than experimental then the point will be in the lower right of the plot, between the X axis and the line of equality.

For prophylaxis this pattern will be reversed. Because prophylaxis reduces the number of bad events - such as death after myocardial infarction by the use of aspirin - we expect a smaller proportion harmed with treatment than with control. So if experimental is better than control the trial results cloud should be between the X axis and the line of equality.
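The reading of the plot described above can be sketched without drawing anything, by checking which side of the line of equality each trial falls on. The function names and the three trials below are hypothetical and only illustrate the rule.

```python
def labbe_position(experimental_pct, control_pct):
    """Locate a trial on a L'Abbé plot (Y = experimental, X = control)."""
    if experimental_pct > control_pct:
        return "upper left, between the Y axis and the line of equality"
    if experimental_pct < control_pct:
        return "lower right, between the X axis and the line of equality"
    return "on the line of equality"


def favours_experimental(experimental_pct, control_pct, prophylaxis=False):
    """For treatment, more patients with the outcome is better;
    for prophylaxis, fewer patients with the (bad) event is better."""
    if prophylaxis:
        return experimental_pct < control_pct
    return experimental_pct > control_pct


# Made-up treatment trials as (experimental %, control %).
trials = [(60, 20), (45, 45), (30, 40)]
for exp_pct, ctrl_pct in trials:
    print(exp_pct, ctrl_pct, labbe_position(exp_pct, ctrl_pct),
          favours_experimental(exp_pct, ctrl_pct))
```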

These plots give a quick indication of the level of agreement among trials. If the points are in a consistent cloud, that gives some confidence that what we are seeing is a homogeneous effect. But if points are spread all over the graph, and especially if they cross the line of equality, then that should make us concerned about the intervention, or the patients being treated and their condition. This can also be called heterogeneity.

The important point about a L'Abbé plot is that it shows all of the extant data on one piece of paper. When combined with numbers in the trial, and a summary measure like NNT, it is a neat way to summarise lots of information.



Variation in treatment and control


One of the things that using systematic reviews in this way teaches you is just how variable are the effects of both treatment and control in randomised trials. It is legitimate to be surprised, but after a short time it seems that this is the norm.

The reasons for the variability are probably complex, but much will be just random chance. In many circumstances patients can have wide patterns of response to a treatment, but trial size is often relatively small. Gathering data together in systematic review and meta-analysis gives much more power than the single trial in almost all circumstances, and especially for reviews of treatments. Seeing such variability also teaches caution when faced with a single trial.

Size is everything


Take a moment to think about what you want to know about a treatment. You probably want some assurance that it works, but you really want to know how well it works. What do you mean by that? Using NNT terminology, you might want to know that the NNT is within certain limits.

Take ibuprofen. The NNT to obtain at least 50% pain relief in patients with moderate to severe pain over 4-6 hours is about 3 for 400 mg ibuprofen compared with placebo. How close do you want the estimate to be? You probably wouldn't be happy with a 95% confidence interval which went from 1 (perfect) to 10 (rotten). Would you be happy with 2 to 4, or happier still with 2.5 to 3.5?

The answer should be the last of these, but the narrower the confidence interval (the more precise you want the answer to be), the more patients you need to have studied. A mathematical but practical study [4] says that for the confidence interval to be 2.5 to 3.5 we need 500 patients taking ibuprofen and 500 taking placebo. With a single trial of 40 patients per group, the standard size in pain trials, the confidence interval is 1 to 10. The lesson is to beware the single trial reflex: changing practice on the basis of a single, small trial. It's quite likely to be wrong. Random chance is in play, and has quite a big effect in small trials, which explains the scatter sometimes seen in L'Abbé plots, and is yet another reason for choosing evidence from systematic reviews.
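The effect of trial size on random scatter can be illustrated with a small simulation. This is only a sketch: the response rates of 55% with active treatment and 22% with placebo are assumptions chosen to give an NNT of about 3, the function name is made up, and the output is not expected to reproduce the figures in reference [4].

```python
import random


def simulated_arr_range(n_per_group, p_exp, p_ctrl, n_trials=1000, seed=1):
    """Simulate many trials of one size and return the central 95% range
    of the observed difference in response rates (the ARR)."""
    rng = random.Random(seed)
    arrs = []
    for _ in range(n_trials):
        exp_responders = sum(rng.random() < p_exp for _ in range(n_per_group))
        ctrl_responders = sum(rng.random() < p_ctrl for _ in range(n_per_group))
        arrs.append((exp_responders - ctrl_responders) / n_per_group)
    arrs.sort()
    return arrs[int(0.025 * n_trials)], arrs[int(0.975 * n_trials) - 1]


# Compare the scatter from trials of 40 per group with trials of 500 per group.
for size in (40, 500):
    low, high = simulated_arr_range(size, p_exp=0.55, p_ctrl=0.22)
    print(size, round(low, 3), round(high, 3))
```

The range of observed differences is much wider for the small trials, so the NNT from any one of them can be a long way from the true value.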

Worthy or what?


Calculating NNTs is relatively straightforward compared with the greater complexity of deciding whether a trial is credible, or worthy only of the dustbin. It is impossible to be dogmatic, as every subject has its own complexities. Here are some suggestions for your personal checklist:

  • Randomisation: non-randomised trials over-estimate the effect of treatment. Unless there is a compelling reason, you should not believe or read non-randomised trials of treatments.
  • Blinding: unblinded studies over-estimate the effect of treatment. Blinding may be difficult sometimes, so you should treat unblinded studies with extra caution.
  • Withdrawals: studies with large numbers of dropouts should probably be treated with circumspection, unless it makes good sense to you.
  • Size: tiny studies aren't worth your time. Large studies which are well done should carry particular weight.
  • Statistics: does the study do good statistical testing, like analysis of variance? If yes, then that's good, but any study where the authors choose a single positive statistic out of many which are negative should go straight in the bin.
  • Statistical significance: p<0.05 isn't that clever. It's only 1 in 20, and you can roll two sixes with a couple of dice quite often. Weight p<0.001 much more highly.
  • Credible patient enrolment: just check out whether the patients at entry could demonstrate a change in whatever was being measured.
  • Outcomes: were the outcomes being measured at all valuable to doctors or patients, or were they just unsubstantiated surrogate measures?
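The arithmetic behind the points on statistics and statistical significance can be sketched as follows. The calculation assumes the tests are independent, which is a simplification, and the function name is illustrative.

```python
def chance_of_spurious_result(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests reaches p < alpha
    when there is no real effect at all."""
    return 1 - (1 - alpha) ** n_tests


# Two sixes with a pair of dice: 1 in 36, which is rarer than 1 in 20.
print(round(1 / 36, 3))

# With 20 independent comparisons, a "significant" result somewhere is likely.
for n in (1, 5, 20):
    print(n, round(chance_of_spurious_result(n), 2))
```

This is why a single positive statistic picked out of many negative ones deserves so little weight.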

Using the toolbox


These are all tools, not rules. We hope you find them useful to look at evidence that comes across your desk, especially when new treatments are being proposed. We would particularly like feedback on the NNT worksheet to know what you liked or disliked.


References:

  1. Cook RJ, Sackett DL. The number needed to treat: a clinically useful measure of treatment effect. British Medical Journal 1995 310:452-4.
  2. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based Medicine: how to practice & teach EBM. Churchill Livingstone. ISBN 0-443-05686-2.
  3. L'Abbé KA, Detsky AS, O'Rourke K. Meta-analysis in clinical research. Ann Intern Med 1987 107:224-33.
  4. Moore RA, Gavaghan D, Tramèr MR, Collins SL, McQuay HJ. Size is everything - large amounts of information are needed to overcome random effects in estimating direction and magnitude of treatment effects. Pain 1998 78:209-16.

Bandolier's DIY NNT example sheet



A number needed to treat (NNT) is defined by a number of characteristics. This worksheet is designed as an aide-mémoire for working out NNTs from papers and systematic reviews. First fill in the answers to the questions; then, where appropriate, graph the data on the L'Abbé plot; and finally do the NNT calculation.
Now plot the percentages from F and J for the trial on the graph. This can be done for different outcomes of a trial, or individual trials in a systematic review or meta-analysis.
Now calculate the NNT using the proportions from F and J.
To download a blank version of this worksheet to print and use click here


