Astrology, illness, and chance
We depend a lot on statistical testing to tell us what to think about a result, and how much weight to put on it, but often forget that statistics (and chance) are themselves subject to rules. For instance, by setting a 95% confidence limit on “normal” values, we automatically define 5% of the results to be “abnormal”. In another example, Bandolier 105 examined the DICE studies. One showed that when dice were rolled to simulate trials, about 1 in 20 of the simulated trials were statistically significant at the 5% level (statistical significance set at a p value of 0.05). This is just what chance would produce, even though there was no real difference. Another showed that subgroup analysis of homogeneous data produced results with spuriously high statistical significance.
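The dice result is easy to reproduce. Below is a minimal Python sketch, assuming each simulated “trial” has two arms of 100 people, counts a six on a fair die as an event, and compares the arms with a pooled two-proportion z-test. The function names, arm size, and choice of test are illustrative assumptions, not the DICE design. Because both arms share the same true event rate, every significant result is a false positive.

```python
import math
import random

def two_proportion_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (events_a / n_a - events_b / n_b) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_null_trials(n_trials=2000, arm_size=100, event_rate=1 / 6,
                         alpha=0.05, seed=1):
    """Fraction of simulated trials declared significant when no real difference exists."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # Rolling a six counts as an event; both arms use the same fair die
        a = sum(rng.random() < event_rate for _ in range(arm_size))
        b = sum(rng.random() < event_rate for _ in range(arm_size))
        if two_proportion_p(a, arm_size, b, arm_size) < alpha:
            significant += 1
    return significant / n_trials

print(simulate_null_trials())
```

With these settings the printed fraction should sit close to 0.05, varying a little with the seed and with the discreteness of small counts.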
The perils of multiple statistical testing might have been drummed into us during our education, but as researchers we often forget them in the search for “results”, especially when such testing confirms our pre-existing biases. A large and thorough examination of multiple statistical tests [1] underscores the problems this can pose.
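The same arithmetic explains why multiple testing is perilous. Assuming m independent tests, each at significance level α, and no real effects anywhere, the chance that at least one test comes out significant is:

```latex
P(\text{at least one } p < \alpha) = 1 - (1 - \alpha)^{m},
\qquad \alpha = 0.05,\; m = 20:\quad 1 - 0.95^{20} \approx 1 - 0.358 = 0.64
```

So with 20 tests and no real effects, a significant result somewhere is more likely than not. Real tests are rarely fully independent, so this is an approximation rather than an exact figure.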
Study
This population-based retrospective cohort study used linked administrative databases that covered 10.7 million residents of Ontario aged 18-100 years who were alive and had a birthday in the year 2000. Before any analyses, the database was split in two to provide derivation and validation cohorts of about 5.3 million persons each, so that associations found in one cohort could be checked in the other.
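A split of this kind can be sketched in a few lines of Python. This is a hypothetical illustration: the function `split_cohorts`, the random 50:50 assignment, and the seed are assumptions, since the text says only that the database was split in two before any analyses.

```python
import random

def split_cohorts(person_ids, seed=2000):
    """Hypothetical helper: shuffle IDs, then cut them into two halves."""
    ids = list(person_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # (derivation, validation)

derivation, validation = split_cohorts(range(10))
print(len(derivation), len(validation))  # 5 5
```

Splitting before any analysis is the important design choice: the validation half plays no part in choosing which associations to test, so it can serve as an independent check.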
All admissions to Ontario hospitals classified as urgent (not elective or planned) were used, with diagnoses classified using DSM criteria and ranked by frequency. These data were used to determine which persons were admitted within the 365 days following their birthday in 2000, and the proportion admitted under each astrological sign. Working down the ranked list of diagnoses, the astrological sign with the highest admission rate for each diagnosis was then tested statistically against the rate for all 11 other signs combined, using a significance level of 0.05. This continued until two statistically significant diagnoses had been identified for each astrological sign.
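The search procedure described above can be sketched in Python as follows. This is a hypothetical reconstruction from the description here, not the authors' code: the input layout (diagnoses as `(name, {sign: admissions})` pairs ordered from most to least frequent, plus a `{sign: people}` population table), the function names, and the use of a pooled two-proportion z-test are all assumptions. The sketch records every significant highest-rate sign and stops once each sign has at least two.

```python
import math

SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]

def two_proportion_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def find_associations(diagnoses, population, alpha=0.05, per_sign=2):
    """Hypothetical sketch: test the highest-rate sign for each diagnosis.

    diagnoses: list of (name, {sign: admissions}), most frequent first.
    population: {sign: number of people born under that sign}.
    Returns {sign: [(diagnosis, relative_risk, p_value), ...]} in ranked order.
    """
    hits = {s: [] for s in SIGNS}
    total_people = sum(population[s] for s in SIGNS)
    for name, admissions in diagnoses:
        # Stop once every sign has at least per_sign significant diagnoses
        if all(len(h) >= per_sign for h in hits.values()):
            break
        rates = {s: admissions[s] / population[s] for s in SIGNS}
        top = max(SIGNS, key=lambda s: rates[s])
        # Compare the top sign with the other 11 signs combined
        rest_events = sum(admissions[s] for s in SIGNS if s != top)
        rest_people = total_people - population[top]
        p = two_proportion_p(admissions[top], population[top], rest_events, rest_people)
        if p < alpha:
            rest_rate = rest_events / rest_people
            rr = rates[top] / rest_rate if rest_rate > 0 else math.inf
            hits[top].append((name, rr, p))
    return hits

# Tiny synthetic example: only the first diagnosis has an excess (for Leo)
people = {s: 1000 for s in SIGNS}
excess = {s: 20 for s in SIGNS}
excess["Leo"] = 60
flat = {s: 20 for s in SIGNS}
print(find_associations([("diag_a", excess), ("diag_b", flat)], people)["Leo"])
```

Because the input is ranked by frequency, the first two hits for each sign are its two most frequent significant diagnoses, which for 12 signs gives 24 associations.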
Results
In all, 223 diagnoses (accounting for 92% of all urgent admissions) were examined to find two statistically significant results for each astrological sign. Of these 223, 72 (32%) were statistically significant for at least one sign compared with all the others combined. The extremes were Scorpio, with two significant results, and Taurus, with 10; significance levels ranged from 0.0003 to 0.048.
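One reason so many diagnoses came out significant is that the sign tested was chosen after looking at the data, as the one with the highest rate. The Python simulation below is a sketch under stated assumptions (12 groups of 300 people, a single true admission rate of 5% shared by every sign, a pooled two-proportion z-test, hypothetical function names). It estimates how often the highest-rate sign appears significant when no sign truly differs.

```python
import math
import random

def two_proportion_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def max_sign_false_positive_rate(n_sims=400, people_per_sign=300, rate=0.05,
                                 alpha=0.05, n_signs=12, seed=7):
    """Fraction of null 'diagnoses' in which the highest-rate sign tests significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Every sign shares the same true admission rate: no real association exists
        counts = [sum(rng.random() < rate for _ in range(people_per_sign))
                  for _ in range(n_signs)]
        top = max(range(n_signs), key=lambda i: counts[i])
        rest_events = sum(counts) - counts[top]
        rest_people = people_per_sign * (n_signs - 1)
        if two_proportion_p(counts[top], people_per_sign, rest_events, rest_people) < alpha:
            significant += 1
    return significant / n_sims

print(max_sign_false_positive_rate())
```

Under these assumptions the false-positive rate comes out several times the nominal 5%, because the largest of 12 noisy rates is selected precisely for being extreme.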
The two most frequent significant diagnoses for each sign were selected, giving 24 associations in the derivation cohort. These included, for instance, intestinal obstruction and anaemia for people born under Cancer, and head and neck symptoms and fracture of the humerus for Sagittarius. Levels of statistical significance ranged from 0.0006 to 0.048, and relative risks from 1.1 to 1.8 (Figure 1), most of them modest.
Figure 1: Relative risk of associations between astrological sign and illness for the 24 chosen associations, using a significance level of 0.05, uncorrected for multiple comparisons
Protection against spurious statistical significance from multiple comparisons was tested in several ways.
- When the 24 associations were tested in the validation cohort, only two remained significant: gastrointestinal haemorrhage for Leo (relative risk 1.2) and fractured humerus for Sagittarius (relative risk 1.4).
- Preserving an overall error rate of 5% across the 24 comparisons meant using a significance level of about 0.002 (roughly 0.05/24). This would have left 9 of the 24 associations significant in the derivation cohort, but none significant in both the derivation and validation cohorts.
- Correcting for the 14,718 comparisons made in the derivation cohort would have meant using a significance level of 0.000003, and no comparison would have been significant.
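The corrections in the list above amount to Bonferroni-style arithmetic: divide the overall error rate by the number of comparisons. A minimal Python sketch follows; the function names and the example p-values are hypothetical, with the p-values merely chosen to fall within the 0.0006 to 0.048 range reported above. The figure of 14,718 also happens to equal 223 × 66, the number of pairs among 12 signs multiplied by 223 diagnoses, which would fit a count of all pairwise sign comparisons; the text here does not say so, so treat that reading as an assumption.

```python
import math

def bonferroni_threshold(alpha, m):
    """Per-comparison level keeping the family-wise error rate at or below alpha."""
    return alpha / m

def survivors(p_values, alpha, m):
    """P-values still significant after a Bonferroni correction for m comparisons."""
    threshold = bonferroni_threshold(alpha, m)
    return [p for p in p_values if p < threshold]

print(round(bonferroni_threshold(0.05, 24), 5))   # 0.00208
print(bonferroni_threshold(0.05, 14_718))         # roughly 3.4e-06
print(math.comb(12, 2) * 223)                     # 14718

# Hypothetical p-values spanning the reported range
example_p = [0.0006, 0.0015, 0.01, 0.048]
print(survivors(example_p, 0.05, 24))             # [0.0006, 0.0015]
print(survivors(example_p, 0.05, 14_718))         # []
```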
Comment
This study is a sobering reminder that statistical significance can mislead when we don't use statistics properly: don't blame statistics or statisticians, blame our use of them. There is no biological plausibility for a relationship between astrological sign and illness, yet many such associations could be found in this huge data set when standard levels of statistical significance were used without thinking about the problem of multiple comparisons. Even splitting the data into derivation and validation sets did not offer complete protection against spurious results in such an enormous data set.
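A back-of-envelope calculation suggests why validation alone is not a complete safeguard. The Python sketch below assumes, for illustration only, that all 24 derivation-cohort associations were pure chance, that their validation tests were independent, and that any p < 0.05 in the validation cohort counts as a replication (requiring the same direction of effect would roughly halve that per-test probability). The helper `prob_at_least` is a hypothetical name.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_associations = 24   # associations carried into the validation cohort
alpha = 0.05          # assumed chance that a null association 'replicates'

print(round(n_associations * alpha, 2))                    # 1.2 expected chance replications
print(round(prob_at_least(1, n_associations, alpha), 2))   # 0.71
print(round(prob_at_least(2, n_associations, alpha), 2))   # 0.34
```

Under those assumptions, one or two chance replications out of 24 would not be surprising.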
Multiple subgroup analyses are common in articles published in our journals, usually without any adjustment for multiple testing. The authors examined 131 randomised trials published in top journals over six months in 2004; on average these had 5 subgroup analyses and 27 significance tests for efficacy and safety. The danger is that we may act on results whose statistical significance is spurious, especially when the effect is modest.
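The scale of the problem in those trials can be estimated in a few lines of Python, under strong simplifying assumptions: all 27 tests in a trial are treated as independent, each at the 0.05 level, and none of the tested effects is real. These are illustrative assumptions, not findings of the survey.

```python
tests_per_trial = 27
n_trials = 131
alpha = 0.05

# Chance that a single trial reports at least one false-positive test
p_any_false_positive = 1 - (1 - alpha) ** tests_per_trial
# Expected number of false-positive tests summed over all 131 trials
expected_false_positives = tests_per_trial * alpha * n_trials

print(round(p_any_false_positive, 2))    # 0.75
print(round(expected_false_positives))   # 177
```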
Reference:
- [1] PC Austin et al. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. Journal of Clinical Epidemiology 2006; 59:964-969.