Class (noun): any set of people or things grouped together or differentiated from
others. A question asked increasingly often is whether a set of drugs forms a
class, and whether there is a 'class effect'. Class effect is usually taken to mean
similar therapeutic effects and similar adverse effects, both in nature and extent.
If such a 'class effect' exists, then it makes decision-making easy: you choose the
cheapest.
Criteria for drugs to be grouped together as a class involve some or all of the
following:
- Drugs with similar chemical structure
- Drugs with similar mechanism of action
- Drugs with similar pharmacological effects
Declaring a class effect requires a bit of thought, though. How much thought, and of
what type, has been considered in one of the brilliant JAMA series of Users' Guides
to the Medical Literature [1]. No one should declare a class effect and choose the
cheapest without reference to the rules of evidence set out in this paper.
Levels of evidence for efficacy
These are shown in Table 1, though if it comes down to level 3 and 4 evidence for
efficacy, the ground is pretty shaky. Level 1 evidence is what we always want and
almost never get: the large randomised head-to-head comparison. By the time
there are enough compounds around to form a class, there is almost no organisation
interested in funding expensive new trials to test whether A is truly better than
B.
Table 1: Levels of evidence for efficacy for class effect
| Level | Comparison | Patients | Outcomes | Criteria for validity |
|---|---|---|---|---|
| 1 | RCT direct comparison | Identical | Clinically important | Randomisation concealment; complete follow up; double-blinding; outcome assessment must be sound |
| 2 | RCT direct comparison | Identical | Valid surrogate | Level 1 plus validity of surrogate outcome |
| 2 | Indirect comparison with placebo from RCTs | Similar or different in disease severity or risk | Clinically important or valid surrogate | Level 1 plus differences in methodological quality; end points; compliance; baseline risk |
| 3 | Subgroup analyses from indirect comparisons of RCTs with placebo | Similar or different in disease severity or risk | Clinically important or valid surrogate | Level 1 plus multiple comparisons, post hoc data dredging; underpowered subgroups; misclassification into subgroups |
| 3 | Indirect comparison with placebo from RCTs | Similar or different in disease severity or risk | Unvalidated surrogate | Surrogate outcomes may not capture all good or bad effects of treatment |
| 4 | Indirect comparison of nonrandomised studies | Similar or different in disease severity or risk | Clinically important | Confounding by indication, compliance, or time; unknown or unmeasured confounders; measurement error; limited database, or coding systems not suitable for research |
Most of the time we will be dealing with randomised trials of A versus placebo
or standard treatment and B versus placebo or standard treatment. This will be
level 2 evidence based on clinically important outcomes (a healing event) or
validated surrogate outcomes (reduction of cholesterol with a statin). So
establishing a class effect will likely involve a good-quality systematic review
or meta-analysis of good-quality randomised trials.
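To make the indirect route concrete, here is a minimal sketch of one common way of comparing A with B through their separate placebo-controlled trials (an adjusted indirect comparison on the log relative risk scale). The method is an illustration chosen here, not one prescribed by the paper, and the trial counts are hypothetical. The point it shows is that the uncertainties of the two trials add, so the indirect estimate is less precise than either trial alone.

```python
import math
from statistics import NormalDist


def log_relative_risk(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log relative risk, standard error) for one trial versus placebo."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return math.log(rr), se


def indirect_comparison(trial_a, trial_b, level=0.95):
    """Indirect relative risk of A versus B, with a confidence interval.

    Each trial is (events on drug, patients on drug,
                   events on placebo, patients on placebo).
    """
    log_rr_a, se_a = log_relative_risk(*trial_a)
    log_rr_b, se_b = log_relative_risk(*trial_b)
    diff = log_rr_a - log_rr_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # variances of the two trials add
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)


# Hypothetical counts, for illustration only.
trial_a = (80, 1000, 100, 1000)  # drug A versus placebo
trial_b = (85, 1000, 100, 1000)  # drug B versus placebo

rr, low, high = indirect_comparison(trial_a, trial_b)
print(f"Indirect RR, A versus B: {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With counts like these the interval spans 1 comfortably, which is why an indirect comparison of this size cannot by itself establish that A and B are the same.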
What constitutes quality in general is captured in Table 1, though there will be
some situation-dependent factors. One thing missing from Table 1 is size. There
probably needs to be some prior estimate of how many patients or events
constitute a reasonable number for analysis.
Levels of evidence for safety
These are shown in Table 2. There are always going to be problems concerning
rare, but serious, adverse events. The inverse rule of three tells us that if we
have seen no serious adverse events in 1500 exposed patients, then we can be 95%
sure that they do not occur more frequently than 1 in 500 patients.
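The arithmetic behind that statement can be checked in a few lines. This is a minimal sketch: the function names are ours, and the exact bound assumes independent patients with a constant event probability.

```python
def rule_of_three_upper_bound(n_patients: int) -> float:
    """Approximate 95% upper bound on the event rate after zero events in n patients."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 3.0 / n_patients


def exact_upper_bound(n_patients: int, confidence: float = 0.95) -> float:
    """Exact bound: the rate p at which (1 - p) ** n equals 1 - confidence."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_patients)


approx = rule_of_three_upper_bound(1500)  # 3 / 1500 = 0.002, i.e. 1 in 500
exact = exact_upper_bound(1500)
print(f"rule of three: 1 in {1 / approx:.0f}")
print(f"exact bound:   1 in {1 / exact:.0f}")
```

The two bounds agree closely at this sample size, so the rule of three is a safe shortcut for reading safety data.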
Table 2: Levels of evidence for safety for class effect
| Level | Type of study | Advantages | Criteria for validity |
|---|---|---|---|
| 1 | RCT | Only design that permits detection of adverse effects when the adverse effect is similar to the event the treatment is trying to prevent | Underpowered for detecting adverse events unless specifically designed to do so |
| 2 | Cohort | Prospective data collection, defined cohort | Critically depends on follow up, classification and measurement accuracy |
| 3 | Case-control | Cheap and usually fast to perform | Selection and recall bias may provide problems, and temporal relationships may not be clear |
| 4 | Phase 4 studies | Can detect rare but serious adverse events if large enough | No control or unmatched control; critically depends on follow up, classification and measurement accuracy |
| 5 | Case series | Cheap and usually fast | Often small sample size, selection bias may be a problem, no control group |
| 6 | Case report(s) | Cheap and usually fast | Often small sample size, selection bias may be a problem, no control group |
Randomised trials of efficacy will usually be underpowered to detect rare,
serious adverse events, and we will usually have to use other study designs. In
practice the difficulty will be that soon after new treatments are introduced
there will be a paucity of data for these other types of study. Only rarely will
randomised trials powered to detect rare adverse events be conducted.
Most new treatments are introduced after being tested on perhaps a few thousand
patients in controlled trials. Caution is needed with treatments for chronic
conditions, especially when trials are only short-term and when patients are
likely to have other diseases and to be taking other treatments.
Compliance
A difficult issue this, with a fragmented literature. But we do know that while
compliance is usually high in clinical trials it may be lower in practice.
Treatment schedules that are likely to improve compliance (once a day, for
instance) might be important.
Cost
Economic studies are complicated beasts, and we need to treat this evidence with
caution. A class effect is usually assumed in order to justify choosing the
cheapest drug in terms of acquisition (prescribing) costs. Terrific if this means
that the costs of achieving the same ends are minimised. It may not be like that,
and the health economics of class effects needs to be carefully thought through.
Comment
This paper uses statins as an example, with a clinician and a policymaker
deciding between older, more expensive statins and newer, cheaper statins.
Tactfully, one chooses the cheaper statin with less information, and the
other the older and more expensive statin with masses of patient experience. Can
you guess who chose what?
Bandolier 47 examined the evidence for some of the older statins, with up to
27,000 patient-years of experience, and made the point that weight of evidence
should be as important as acquisition cost. Having this paper to hand at the time
would have been a great help.
Equivalence
McAlister & Sackett extend their thoughts on class effects to the particular
example of equivalence trials, and provide some useful guides about what features
of equivalence trials are important in determining their validity [2]. The
intellectual problem with equivalence (A versus B) trials is that the same result
is consistent with three conclusions:
- Both A and B are equally effective
- Both A and B are equally ineffective
- Trials inadequate to detect differences between A and B
To combat the problems posed by the latter two conclusions, McAlister &
Sackett suggest several criteria in addition to those used for superiority trials
(A and/or B versus placebo). These are shown in Table 3.
Table 3: Evidence quality for superiority and active-control equivalence
trials
| Superiority trials | Active-control equivalence trials |
|---|---|
| Randomised allocation | Randomised allocation |
| Randomisation concealed | Randomisation concealed |
| All patients randomised accounted for | All patients randomised accounted for |
| Intention to treat analysis | Intention to treat analysis and on-treatment analysis |
| Clinicians and patients blinded to treatment received | Clinicians and patients blinded to treatment received |
| Groups treated equally | Groups treated equally |
| Groups identical at baseline | Groups identical at baseline |
| Clinically important outcomes | Clinically important outcomes |
| | Active control previously shown to be effective |
| | Patients and outcomes similar to trials previously showing efficacy |
| | Both regimens applied in an optimal fashion |
| | Appropriate null hypothesis tested |
| | Equivalence margin pre-specified |
| Trial of sufficient size | Trial of sufficient size |
Control shown previously to be effective?
Ideally documented in a systematic review of placebo controlled trials with
benefits on active drug exceeding a clinically important effect. Without this
information both may be equally ineffective.
Patients and outcomes similar to original trials?
Obvious, this one. If they are not, then any conclusion about equivalence is
doomed. Beware, though, trials designed to show equivalent efficacy being used to
demonstrate differences in harm or toxicity, for which they were not powered.
Regimens applied in identical fashion?
The most common example is that of choosing the best dose of A versus an
ineffective dose of B (no names, no pack drill, but no prizes for picking out
numerous examples especially from pharmaceutical company sponsored trials showing
'our drug is better than yours'). Should be OK if licensed doses are chosen.
Other pitfalls to look out for are low compliance or frequent treatment changes,
incomplete follow up, disproportionate use of cointerventions and lack of
blinding.
Appropriate statistical analysis?
Equivalence trials are designed to rule out meaningful differences between two
treatments. Often one-sided tests of difference are used. Lack of significant
superiority is not necessarily the same as defining an appropriate level of
equivalence and testing for it.
Intention to treat analysis confers the risk of making a false-negative
conclusion that treatments have the same efficacy when they do not. In
equivalence trials the conservative approach may be to compare patients actually
on treatment. Both analyses should probably be used.
Prespecified equivalence margin?
How different is different? Equivalence trials should have a prior definition of
how big a difference is a difference, and justify it. Even more than that, they
have to convince you that the lack of that difference means that treatments
would, in fact, be equivalent.
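The difference between 'no significant difference' and 'equivalent' can be sketched with a confidence interval for the difference in event rates, judged against a pre-specified margin. This is a simple illustration and not the analysis used in any particular trial: the interval is a normal-approximation (Wald) interval, and the margin of 5 percentage points and all event counts are hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Confidence interval for the difference in event rates (A minus B)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


def within_margin(ci, margin):
    """Equivalence is supported only if the whole interval lies inside +/- margin."""
    low, high = ci
    return -margin < low and high < margin


MARGIN = 0.05  # hypothetical pre-specified margin: 5 percentage points

small_trial = risk_difference_ci(10, 100, 12, 100)        # 100 patients per arm
large_trial = risk_difference_ci(1000, 10000, 1020, 10000)  # 10,000 per arm

for name, ci in (("small", small_trial), ("large", large_trial)):
    low, high = ci
    verdict = "within margin" if within_margin(ci, MARGIN) else "equivalence not shown"
    print(f"{name} trial: difference CI {low:+.3f} to {high:+.3f} ({verdict})")
```

In the small trial the interval includes zero, so there is no significant difference, yet it also extends beyond the margin, so equivalence has not been shown. Only the large trial has an interval narrow enough to sit wholly inside the margin.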
Size?
Most equivalence trials do not have enough power to detect even a 50% difference
between treatments, and a 1994 review [3] found that 84% were too small to detect
a 25% difference. Size is everything when we want to show no difference, and the
smaller the difference that is important, the larger the trial has to be.
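How quickly the required size grows can be seen from a standard normal-approximation formula for comparing two proportions. The sketch below rests on assumptions that are ours and not from the review: a control event rate of 10%, a two-sided significance level of 5%, and 80% power.

```python
import math
from statistics import NormalDist


def patients_per_group(control_rate, relative_difference, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a given relative difference."""
    treated_rate = control_rate * (1 - relative_difference)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    n = (z_alpha + z_beta) ** 2 * variance / (control_rate - treated_rate) ** 2
    return math.ceil(n)


CONTROL_RATE = 0.10  # hypothetical event rate on the comparator

for relative_difference in (0.50, 0.25, 0.10):
    n = patients_per_group(CONTROL_RATE, relative_difference)
    print(f"{relative_difference:.0%} relative difference: about {n} patients per group")
```

The required number rises roughly with the inverse square of the difference, so halving the difference of interest about quadruples the trial.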
Comment
McAlister & Sackett apply their methodological criteria to four large
equivalence trials in hypertension. All had failings, and none could detect a 10%
difference between treatments. Readers of equivalence trials should beware.
Designating a class effect for a group of drugs, and judging them to be
equivalent on inadequate evidence, is something most of us do at some time or
another. Because prescribing costs often drive decisions, 'cheapest is best'
thinking often applies. Much of the time we will make incorrect decisions, but
fortunately won't have the evidence to know that we are wrong. This is important
and tricky territory that needs more work.
References:
1. FA McAlister et al. Users' guides to the medical literature: XIX. Applying
clinical trial results. B. Guidelines for determining whether a drug is exerting
(more than) a class effect. JAMA 1999 282: 1371-1377.
2. FA McAlister & DL Sackett. Active-control equivalence trials and
antihypertensive agents. American Journal of Medicine 2001 111: 553-558.
3. D Moher et al. Statistical power, sample size, and their reporting in
randomized controlled trials. JAMA 1994 272: 122-124.