Describing results of trials and reviews

Most of the outputs that we use for reporting trials and reviews have their origins in epidemiology, the world where we look for small effects in large populations - things like aspirin after a heart attack, or reducing cholesterol. Conversely, most of the activity of medicine is about large effects in small populations, like hip replacements for osteoarthritic joints, or anaesthesia, or pain relief, or antibiotics for infection. So our example is one most of us will be familiar with.

Table 1 is a hypothetical trial of ibuprofen in acute pain. Not worrying too much at this stage about any other features, or even the result itself, we will use this trial to present some of the more common definitions for presentation of results where information is available in dichotomous form. Dichotomous means the patient had the outcome or did not, and we have the numbers for each. In this trial, for instance, 22 of 40 patients given ibuprofen had adequate pain relief compared with only 7 of 40 given placebo. The term experimental event rate (EER) is used to describe the rate that good events occur with ibuprofen (22/40, or 55%) and control event rate (CER) to describe the rate that good events occur with placebo (7/40, or 18%).


Odds ratios

Table 1 shows first how to compute odds. Odds refers to the ratio of the number of people having the good event to the number not having the good event, so the experimental event odds are 22/18, or 1.2. The odds ratio is the ratio of the odds with experimental treatment to the odds with control, or here 1.2/0.21 = 5.7. There are lots of different ways of computing odds ratios that give slightly different answers in different circumstances. Values greater than 1 show that experimental is better than control, and if a 95% confidence interval is calculated, statistical significance is assumed if the interval does not include 1.
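The arithmetic can be sketched in a few lines of Python. The confidence interval here uses Woolf's logit method, one common choice among the several methods mentioned above that give slightly different answers; note also that working from the exact fractions gives 5.8 rather than the 5.7 obtained from the rounded odds in Table 1.

```python
from math import exp, log, sqrt

# 2x2 table from Table 1 (hypothetical ibuprofen trial)
a, b = 22, 18   # ibuprofen: relieved, not relieved
c, d = 7, 33    # placebo:   relieved, not relieved

exp_odds = a / b                    # 22/18 ~ 1.2
ctl_odds = c / d                    # 7/33  ~ 0.21
odds_ratio = exp_odds / ctl_odds    # ~5.8 exactly; 5.7 from rounded odds

# Woolf's logit method for an approximate 95% confidence interval
se = sqrt(1/a + 1/b + 1/c + 1/d)
lo = exp(log(odds_ratio) - 1.96 * se)
hi = exp(log(odds_ratio) + 1.96 * se)
print(f"OR = {odds_ratio:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

The interval does not include 1, so by the rule above the result counts as statistically significant.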

Some would change this around and compute the odds ratios from the point of view of the patients not having adequate pain relief. The experimental event odds would be 18/22 or 0.82, and the control event odds would be 33/7 or 4.7. The odds ratio then would be 0.82/4.7 = 0.17.

For ibuprofen versus placebo the odds ratio is 5.7 or 0.17. Pick the bones out of that. How would you use that, other than knowing that an odds ratio far from 1 means that ibuprofen was better than placebo?

Table 1: Results of hypothetical randomised trial

Treatment          Total number of     Number who achieved at    Number who did not achieve
                   patients treated    least 50% pain relief     at least 50% pain relief
Ibuprofen 400 mg   40                  22                        18
Placebo            40                  7                         33

Calculations made from these results

Experimental event rate (EER, event rate with ibuprofen):   22/40 = 0.55, or 55%
Control event rate (CER, event rate with placebo):          7/40 = 0.18, or 18%
Experimental event odds:                                    22/18 = 1.2
Control event odds:                                         7/33 = 0.21
Odds ratio:                                                 1.2/0.21 = 5.7
Relative risk (EER/CER):                                    0.55/0.18 = 3.1
Relative risk increase (100(EER-CER)/CER), as a percentage: 100((0.55-0.18)/0.18) = 206%
Absolute risk increase or reduction (EER-CER):              0.55 - 0.18 = 0.37, or 37%
NNT (1/(EER-CER)):                                          1/(0.55 - 0.18) = 2.7

Relative risk or benefit

Relative risk is a bit easier on the brain. It is simply the ratio of EER to CER, here 0.55/0.18 (or 55/18 for percentages), and is 3.1. Again values greater than 1 show that experimental is better than control, and if a 95% confidence interval is calculated, statistical significance is assumed if the interval does not include 1. Odds ratios and relative risks give much the same numerical value when event rates are low, but diverge when event rates are high, as they are here. There is disagreement between eminent statisticians about which of these is best. We use relative risk, but wouldn't pick a fight with someone who preferred odds ratios.
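A quick sketch in Python shows how the two measures relate. The rare-event trial in the second call is invented purely for comparison: it keeps the same ratio of risks but at low event rates, where relative risk and odds ratio nearly coincide.

```python
def rr_and_or(a, n1, c, n2):
    """Relative risk and odds ratio for a/n1 events (treatment) vs c/n2 (control)."""
    rr = (a / n1) / (c / n2)
    odds_ratio = (a / (n1 - a)) / (c / (n2 - c))
    return rr, odds_ratio

# Table 1: event rates are high (55% vs 18%), so RR and OR diverge
print(rr_and_or(22, 40, 7, 40))     # RR ~3.1, OR ~5.8

# Invented rare-event trial (5.5% vs 1.75%): the two are nearly equal
print(rr_and_or(22, 400, 7, 400))   # RR ~3.1, OR ~3.3
```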

Again, knowing that the relative risk is 3.1 is not intuitively useful. Both relative risk and odds ratio are important ways of checking that our result is statistically significant. Unless there is statistical significance, we should not be using a treatment except in exceptional circumstances. So whatever else we do in the way of data manipulation, one or other of these tests has primacy for giving us the right to move on.

Relative risk reduction or increase

The relative risk increase is the difference between the EER and CER (EER-CER) divided by the CER, and usually expressed as a percentage. In Table 1 the relative risk increase is 206%. If events occur less often with treatment than with control, then the relative risk reduction is calculated the other way round, by subtracting the EER from the CER in the equation ((CER-EER)/CER).

Absolute risk increase or reduction

If we subtract the CER from the EER (EER-CER) then we have the absolute risk increase (ARI), the effect due solely to ibuprofen, and nothing else. The language here doesn't quite work because it was originally taken from the world of epidemiology where reducing risk (cholesterol lowering etc) is all. The absolute risk reduction (ARR) is CER-EER, when events occur more often with control than they do with treatment.
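Both calculations can be sketched in Python. Working from the exact fractions gives slightly different numbers from Table 1, which uses the rates rounded to two places (0.55 and 0.18): the exact ARI is 0.375 and the exact relative risk increase about 214%, against the table's 0.37 and 206%.

```python
# Event rates from Table 1, kept as exact fractions
EER = 22 / 40   # 0.55
CER = 7 / 40    # 0.175 (rounds to 0.18 in the table)

ari = EER - CER                  # absolute risk increase: 0.375, or 37.5%
rri = 100 * (EER - CER) / CER    # relative risk increase: ~214% exactly
print(f"ARI = {ari:.3f}, RRI = {rri:.0f}%")
```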

Number needed to treat (NNT)

For every 100 patients with acute pain treated with ibuprofen, 37 (55 - 18) will have adequate pain relief because of the ibuprofen we have given them. Clearly then, we have to treat 100/37, or 2.7, patients with ibuprofen for one to benefit because of the ibuprofen they have been given. That's what NNT is (Table 1). This has immediate clinical relevance, because we know at once what clinical and other effort is being made to produce one result with a particular intervention.
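The calculation is a one-liner, sketched here from the Table 1 fractions:

```python
# Event rates from Table 1
EER = 22 / 40
CER = 7 / 40

# Number needed to treat: reciprocal of the absolute risk increase
nnt = 1 / (EER - CER)   # ~2.7: treat about 3 patients for one extra responder
print(f"NNT = {nnt:.1f}")
```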

The best NNT would be 1, where everyone got better with treatment and nobody got better with control, and NNTs close to 1 can be found with antibiotic treatments for susceptible organisms, for instance. Higher NNTs represent less good treatment, and the NNT is a useful tool for comparing two similar treatments. When doing so the NNT must always specify the comparator (e.g., placebo, no treatment, or some other treatment), the therapeutic outcome, and the duration of treatment necessary to achieve that outcome. If these are different, you probably should not be comparing NNTs. It is also worth mentioning that prophylactic interventions that produce small effects in large numbers of patients will have high NNTs, perhaps 20-100. Just because an NNT is large does not mean it will not be a useful treatment.

We can use the same methods for adverse events, when numbers needed to treat become numbers needed to harm (NNH). Here small numbers are bad (more frequent harm) and larger numbers good. When making comparisons between treatments, the same provisos apply as for NNT, especially that for definition.

For both NNT and NNH we should recognise that we are working with an unusual scale, which runs from 1 (everyone has the outcome with treatment and no-one with control) to -1 (no-one has the outcome with treatment and everyone has it with control), with infinity as the mid point, where we divide by zero because EER equals CER. Once NNTs or NNHs are much above 10, the upper confidence limit gets closer to infinity and the upper and lower limits look unbalanced.
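One way to see this imbalance is to put a confidence interval on the absolute risk difference and then take reciprocals of its limits; the sketch below uses a simple normal approximation for the risk difference, one common choice among several.

```python
from math import sqrt

# Table 1: 40 patients per group
EER, CER, n = 22 / 40, 7 / 40, 40
arr = EER - CER   # absolute risk difference, 0.375

# Approximate 95% CI for the risk difference (normal approximation),
# then invert: the NNT interval runs from 1/upper to 1/lower
se = sqrt(EER * (1 - EER) / n + CER * (1 - CER) / n)
lo, hi = arr - 1.96 * se, arr + 1.96 * se
print(f"NNT = {1/arr:.1f}, 95% CI {1/hi:.1f} to {1/lo:.1f}")
# The interval is lopsided about the point estimate: as the risk
# difference approaches zero, the upper NNT limit heads towards infinity
```

Here the interval runs from roughly 1.8 to 5.5 around an NNT of 2.7: already unbalanced even with this large treatment effect, and much more so when NNTs are above 10.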

Other outputs

There are masses of other outputs that people use for trials and epidemiological studies. These include effect size, relative risk reductions and so on. We don't find these useful, but there will always be circumstances in which they are the appropriate outputs.