Diagnostic testing emerging from the gloom?
The problem is that there is little evidence to be found at all, and little of that is good news. A succession of stories saying that tests are useless loses impact. Without empirical evidence of bias in study architecture, we are rudderless in the midst of a tidal surge.
Nil desperandum. Help is at hand. Two recent publications have begun to lay a little more foundation and to provide a sea-anchor in this turbulent area.
CARE essay
In Bandolier 66 we featured the CARE project (Clinical Assessment of the Reliability of the Examination), a collaborative study of the accuracy and precision of the clinical examination. The Internet address is http://www.carestudy.com/.
The main plotters behind CARE, Finlay McAlister, Sharon Straus and David Sackett, have written a terrific essay on the need for large prospective studies of the clinical examination [1]. This is an important, perhaps seminal paper. More than any other Bandolier has read, it explains why new research, indeed new thinking, is required. It's beautifully written and easy to follow, and is essential reading.
Their prime example is chronic obstructive airways disease (COAD). A systematic review sought physical signs for differentiating patients with COAD from those with normal pulmonary function. There were many, but no one sign was found in more than a third of studies.
For each of the four most commonly used physical signs the range of diagnostic accuracy from the literature was huge. Positive likelihood ratios spanned the range from about 1 to over 10: from useless to highly predictive.
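As a rough illustration of what that range means in practice, the sketch below applies Bayes' theorem in odds form to turn a pretest probability into a post-test probability. The 30% pretest probability and the function names are assumptions chosen for illustration; they are not taken from the essay.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = probability of a positive sign in disease / in non-disease."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical pretest probability of 30% (an assumption, not a source figure).
pretest = 0.30
for lr in (1.0, 10.0):
    print(f"LR+ {lr:>4}: post-test probability {post_test_probability(pretest, lr):.2f}")
```

Under these assumptions a sign with a likelihood ratio of 1 leaves the probability at 30%, while a ratio of 10 raises it to about 81%. The same sign could therefore be worthless or decisive depending on which study is believed.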
They also examined the quantity and quality of evidence from systematic reviews for a variety of signs for different conditions. There were few high-quality studies, and those that existed were small.
The bottom line is that at best we have hand-me-down evidence, and experience. We have little or no objective proof of the quality of diagnostic accuracy of clinical examinations.
Levels of evidence
One commonly used description of levels of evidence is shown below. The keys to good quality are independence, masked comparison with a reference standard, and consecutive patients from an appropriate population. Lower quality comes from inappropriate populations and from comparisons that are not masked or that use different reference standards. Other standards have been applied to diagnostic tests, as reported in Bandolier 26.
Levels of evidence for studies of diagnostic methods
| Level | Criteria |
| 1 | An independent, masked comparison with reference standard among an appropriate population of consecutive patients. |
| 2 | An independent, masked comparison with reference standard among non-consecutive patients or confined to a narrow population of study patients. |
| 3 | An independent, masked comparison with an appropriate population of patients, but reference standard not applied to all study patients. |
| 4 | Reference standard not applied independently or masked. |
| 5 | Expert opinion with no explicit critical appraisal, based on physiology, bench research, or first principles. |

| Study characteristic | Relative diagnostic odds ratio (95% CI) | Description |
| Case-control | 3.0 (2.0 to 4.5) | A group of patients already known to have the disease compared with a separate group of normal patients |
| Different reference tests | 2.2 (1.5 to 3.3) | Different reference tests used for patients with and without the disease |
| Not blinded | 1.3 (1.0 to 1.9) | Interpretation of test and reference is not blinded to outcomes |
| No description of test | 1.7 (1.1 to 1.7) | Test not properly described |
| No description of population | 1.4 (1.1 to 1.7) | Population under investigation not properly described |
| No description of reference | 0.7 (0.6 to 0.9) | Reference standard not properly described |
The relative diagnostic odds ratio indicates the diagnostic performance of a test in studies failing to satisfy the methodological criterion relative to its performance in studies with the corresponding feature.
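For readers who want the arithmetic, the diagnostic odds ratio of a single study can be computed from its two-by-two table, and a relative diagnostic odds ratio is the ratio of two such values. The sketch below uses the standard definition of the diagnostic odds ratio; the counts are hypothetical and the function names are illustrative. The figures in the table come from a pooled analysis of many studies, which this simple two-study ratio does not reproduce.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Odds of a positive test in the diseased divided by the odds of a
    positive test in the non-diseased: (tp / fn) / (fp / tn).
    Raises ZeroDivisionError if fp or fn is zero."""
    return (tp * tn) / (fp * fn)


def relative_dor(dor_with_flaw, dor_without_flaw):
    """How much larger the diagnostic odds ratio is in studies with a
    design flaw than in studies without it."""
    return dor_with_flaw / dor_without_flaw


# Hypothetical counts (assumptions for illustration, not source data).
dor_sound = diagnostic_odds_ratio(tp=80, fp=10, fn=20, tn=90)   # 36.0
dor_flawed = diagnostic_odds_ratio(tp=90, fp=10, fn=10, tn=90)  # 81.0
print(relative_dor(dor_flawed, dor_sound))  # 2.25
```

A ratio above 1, as in most rows of the table, means that studies with the flaw report better test performance than studies without it.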