
In defense of meta-analysis

  • S.I.: Evidence Amalgamation in the Sciences
  • Published in Synthese

Abstract

Arguments that medical decision making should rely on a variety of evidence often begin from the claim that meta-analysis has been shown to be problematic. In this paper, I first examine Stegenga’s (Stud Hist Philos Sci Part C Stud Hist Philos Biol Biomed Sci 42:497–507, 2011) argument that meta-analysis requires multiple decisions and thus fails to provide an objective ground for medical decision making. Next, I examine three arguments from social epistemologists who contend that meta-analyses are systematically biased in ways not appreciated by standard epistemology. In most cases, I show that critiques of meta-analysis fail to account for the full range of meta-analytic procedures. In the remaining cases, I argue that the critiques identify problems that do not uniquely cut against meta-analysis. I close by suggesting one reason why it may be pragmatically rational to violate the principle of total evidence and by outlining the criteria for a successful argument against meta-analysis, a set of criteria that I contend remain unmet.

Fig. 1
Fig. 2


Notes

  1. What these arguments share is the claim that RCTs fail to provide the theoretical knowledge needed to extrapolate from RCT evidence to a proposed intervention; for discussion of internal differences in these views, see Dragulinescu (2012).

  2. Presumably, if a principled reason can be given for one choice over the other, then constraint is not the problem. Trivially, I could “choose” to miscalculate the effect size and reach a wildly different conclusion than others, but no one would think that such a choice would threaten the reliability of meta-analysis. A method lacks constraint only if its proper application fails to yield reasonably similar results.

  3. Active placebos are substances that cause side-effects similar to a drug, but which are not effective treatments.

  4. In other words, the presence or absence of side-effects alters a patient’s beliefs about whether they are receiving treatment. The third hypothesis proposed that active placebos and “antidepressants” are more effective than inert placebos in causing patients to believe they are receiving treatment, amplifying the placebo effect. That is, it claims that what we consider to be antidepressants are themselves nothing more than active placebos.

  5. I am grateful to an anonymous reviewer for pushing me on this point.

  6. In order for a result that is consistent with a theory to provide evidential support, it has to arise from a test of that theory (Laudan 1990, pp. 61–65). Even if the data are equally consistent with both the theories put forward by Fountoulakis and Möller (2011) and by Kirsch et al. (2008), the data do not provide evidential support for Fountoulakis and Möller because their theory was created to be consistent with the data; whereas Kirsch et al.’s theory was tested by the data and thus derives evidential support from this and other previous tests.

  7. While this may be true of theory choice, we may nevertheless have non-epistemic reasons to eliminate certain treatment options from ethical medical practice. At some point further human experimentation becomes ethically prohibited because the expectation of benefit becomes too small to justify the expected risks.

  8. The adequacy of inert placebos has previously been brought to bear upon the philosophical disputes over the nature of a placebo (Holman 2015; Howick 2017). Though I have suggested a statistical procedure as an alternative to active placebos, the point here is that no further trials are likely to be conducted that use active placebos as a control.

  9. I use the terms “proponents” and “critics” somewhat cautiously; however, both teams of researchers had previously published papers either supporting or questioning the use of antidepressants. Moreover, a survey of articles that cite the studies suggests that there may be disciplinary considerations at play. Psychiatrists tend to cite the Quitkin et al. (2000) study in support of the use of antidepressants, while psychologists, whose form of therapy can be seen as a market competitor to pharmaceutical treatment, tend to cite Moncrieff et al. (2004). As such, it seems that disciplinary allegiance is a fairly good predictor of which study is seen as definitive.

  10. Though unstated, their coding decisions can be reconstructed on a case-by-case basis by comparing the data provided with the original articles. For the quantitative measures, it seems that they have used 50% improvement from baseline. However, there does not seem to be a uniform principle guiding the decisions on the various clinical impression scales. In some cases they count the top two possible ratings as a “responder,” while in others they count only the top rating. Another possible scheme for coding is the qualitative descriptions in the studies, but these too are applied inconsistently. For example, in one study a participant whom a clinician rated as “moderately improved” was coded as a responder, while in another study such a rating was coded as a non-responder.

  11. A number of similar questionable decisions were made; in particular, compare Quitkin et al. with the studies from Uhlenhuth and Park (1964), Wilson et al. (1963), and Friedman et al. (1966).

  12. In personal communication, Jukola notes that while tools like PRISMA may resolve some disputes, if quality assessment tools require intractable and arbitrary choices, then there would still be a violation of procedural objectivity. I agree that this is a possibility, but leave it as an open question as to whether such choices are required, and if so, whether they regularly lead to divergent conclusions. I revisit this point in Sect. 3.

  13. Formally, it can be shown from the central limit theorem that the standard error of the estimate decreases as the sample size increases. Thus, small trials will show a large range of effect sizes.
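The statistical point in this note can be illustrated with a small simulation. The outcome model below (normally distributed outcomes, a true effect of 0.3, unit standard deviation, and the trial sizes 20 and 500) is my own hypothetical illustration, not drawn from the paper:

```python
import random
import statistics

def simulated_effect_estimate(n, true_effect=0.3, sd=1.0, seed=None):
    """Mean outcome of one simulated trial with n participants.

    The outcome model (normal, true effect 0.3, sd 1.0) is a
    hypothetical illustration, not taken from the paper.
    """
    rng = random.Random(seed)
    return statistics.mean(rng.gauss(true_effect, sd) for _ in range(n))

# Replicate many small (n = 20) and many large (n = 500) trials.
small = [simulated_effect_estimate(20, seed=i) for i in range(500)]
large = [simulated_effect_estimate(500, seed=i) for i in range(500)]

# Per the central limit theorem, the spread of the estimates shrinks
# like 1/sqrt(n): the small trials scatter roughly sqrt(500/20) = 5
# times more widely around the true effect than the large ones.
print(statistics.stdev(small), statistics.stdev(large))
```

The wide scatter of the small trials is exactly what produces the funnel-shaped spread of effect sizes discussed in the text.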

  14. The graphs were created by extracting data from online supplementary material (Turner et al. 2008). The sample sizes are taken from Appendix Table A and the effect sizes from Appendix Table C. For studies containing multiple doses, the sample size is the sum of all the sub-groups, and the effect size is a weighted average of the sub-group effects. Note that Table 2 is not simply Table 1 with the unpublished studies removed. The published studies also differed in other ways, such as which outcome measures were reported (or suppressed). Nevertheless, the effect of publication bias is still plain in the reported data.
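The pooling step described in this note can be sketched as follows. Weighting sub-group effects by their sample sizes is an assumption on my part, since the note does not state the exact weights, and the numbers in the usage example are hypothetical:

```python
def pool_subgroups(subgroups):
    """Collapse a multi-dose study into a single (n, effect) record.

    `subgroups` is a list of (sample_size, effect_size) pairs. As in
    the note above, the study's sample size is the sum of the
    sub-group sizes and its effect size is a weighted average of the
    sub-group effects; weighting by sample size is an assumption
    here, since the note does not specify the weights.
    """
    total_n = sum(n for n, _ in subgroups)
    pooled = sum(n * d for n, d in subgroups) / total_n
    return total_n, pooled

# Hypothetical three-dose trial: (participants, effect size) per arm.
n, d = pool_subgroups([(40, 0.20), (60, 0.35), (50, 0.30)])
print(n, round(d, 3))  # 150 0.293
```

Sample-size weighting keeps each participant's contribution equal across arms, which is why larger dose groups pull the pooled effect toward their own estimate.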

  15. This is not invariably so; in the next section I discuss a threat to the reliability of meta-analysis that cannot be statistically corrected for.

  16. The agents in our model conduct a series of experiments testing the efficacy of a treatment and update their beliefs on the basis of both their experiments and the results of their peers. Functionally speaking, the agents in our model are Bayesian agents, but the manner in which the agents update approaches the result of a frequentist meta-analysis quite quickly, given the weak initial strength of the agents’ priors. Though our concern is primarily with the ability of industry funding to promote biased measurements, a straightforward corollary of our results is that meta-analyses will yield biased estimates in some circumstances. As noted below, though they develop the concern in less detail, both Jukola (2017) and Stegenga (2015) explicitly cite problems with measurements in developing their arguments.

  17. An example of a retrospective analysis is provided by DES (diethylstilbestrol), one of the largest drug disasters of the late 1970s. Though it was widely prescribed between 1950 and 1970 to prevent miscarriage, reliance on meta-analytic evidence would have prevented its use after 1954 (Bamigboye and Morris 2003). For additional cases used explicitly in an argument for meta-analysis, see Chalmers (2005). It might be objected that though such case series begin to establish a track record, what is strictly demanded is a comprehensive analysis of all treatments; otherwise, it is possible that supporting instances have been cherry-picked. This enormous undertaking is exactly what was provided, sub-discipline by sub-discipline, in The Oxford Database of Perinatal Trials, Effective Care in Pregnancy and Childbirth, A Guide to Effective Care in Pregnancy and Childbirth, and Effective Care of the Newborn Infant. Others would subsequently follow (see Chalmers et al. (1997) for important milestones).

References

  • Antonuccio, D., Burns, D., & Danton, W. (2002). Antidepressants: A triumph of marketing over science? Prevention and Treatment, 5, 25.


  • Antonuccio, D., Danton, W., DeNelsky, G., Greenberg, R., & Gordon, J. (1999). Raising questions about antidepressants. Psychotherapy and Psychosomatics, 68, 3–14.


  • Bamigboye, A. A., & Morris, J. (2003). Oestrogen supplementation, mainly diethylstilbestrol, for preventing miscarriages and other adverse pregnancy outcomes. Cochrane Database of Systematic Reviews, 2003(3), CD004353.

  • Biddle, J. (2013). State of the field: Transient underdetermination and values in science. Studies in History and Philosophy of Science Part A, 44, 124–133.


  • Broadbent, A. (2011). Inferring causation in epidemiology: Mechanisms, black boxes, and contrasts. In P. Illari McKay, F. Russo, & J. Williamson (Eds.), Causality in the sciences (pp. 45–69). Oxford: Oxford University Press.

  • Brown, W. A. (2002). Are antidepressants as ineffective as they look? Prevention and Treatment, 5, 24c.


  • Cartwright, N. (2009). What is this thing called “efficacy”? In C. Mantzavinos (Ed.), Philosophy of the social sciences: Philosophical theory and scientific practice (pp. 185–206). Cambridge: Cambridge University Press.


  • Cartwright, N. (2011). A philosopher’s view of the long road from RCTs to effectiveness. The Lancet, 377, 1400–01.


  • Chalmers, I. (1991). Electronic publication of continuously updated overviews (meta-analyses) of controlled trials. International Society of Drug Bulletins Review, 1, 15–18.


  • Chalmers, I. (2005). The scandalous failure of scientists to cumulate scientifically. In Abstract to paper presented at: Ninth World Congress on Health Information and Libraries (pp. 20–23).

  • Chalmers, I., Sackett, D., & Silagy, C. (1997). In A. Maynard & I. Chalmers (Eds.), Non-random reflections on health services research: On the 25th anniversary of Archie Cochrane’s Effectiveness and Efficiency (pp. 231–249). London: BMJ Publishing.

  • Clarke, B., Gillies, D., Illari, P., Russo, F., & Williamson, J. (2014). Mechanisms and the evidence hierarchy. Topoi, 33, 339–360.


  • Clarke, M., Hopewell, S., & Chalmers, I. (2007). Reports of clinical trials should begin and end with up-to-date systematic reviews of other relevant evidence: A status report. Journal of the Royal Society of Medicine, 100, 187–190.


  • Cochrane, A. L. (1972). Effectiveness and efficiency: Random reflections on health services. Oxford: Oxford University Press.


  • Dragulinescu, S. (2012). On ‘stabilising’ medical mechanisms, truth-makers and epistemic causality: A critique to Williamson and Russo’s approach. Synthese, 187, 785–800.


  • Elias, M. (2002) Study: Antidepressant barely better than placebo. USA Today. https://usatoday30.usatoday.com/news/health/drugs/2002-07-08-antidepressants.htm.

  • Fergusson, D., Glass, K. C., Hutton, B., & Shapiro, S. (2005). Randomized controlled trials of aprotinin in cardiac surgery: Could clinical equipoise have stopped the bleeding? Clinical Trials, 2, 218–232.


  • Fountoulakis, K. N., & Möller, H. J. (2011). Efficacy of antidepressants: A re-analysis and re-interpretation of the Kirsch data. International Journal of Neuropsychopharmacology, 14, 405–412.


  • Friedman, A., Granick, S., Cohen, H., & Cowitz, B. (1966). Imipramine (Tofranil) vs. placebo in hospitalised psychotic depressives. Journal of Psychiatric Research, 4, 13–36.


  • Furberg, C. D. (1983). Effect of antiarrhythmic drugs on mortality after myocardial infarction. The American Journal of Cardiology, 52(6), C32–C36.


  • Gilbert, R., Salanti, G., Harden, M., & See, S. (2005). Infant sleeping position and the sudden infant death syndrome: Systematic review of observational studies and historical review of recommendations from 1940 to 2002. International Journal of Epidemiology, 34, 874–887.


  • Goldman, A. (1999). Knowledge in a social world. New York, NY: Oxford University Press.


  • Grim, P., Rosenberger, R., Rosenfeld, A., Anderson, B., & Eason, R. E. (2013). How simulations fail. Synthese, 190, 2367–2390.


  • Healy, D. (2012). Pharmageddon. Berkeley: University of California Press.


  • Hergovich, A., Schott, R., & Burger, C. (2010). Biased evaluation of abstracts depending on topic and conclusion: Further evidence of a confirmation bias within scientific psychology. Current Psychology, 29, 188–209.


  • Hine, L. K., Laird, N., Hewitt, P., & Chalmers, T. C. (1989). Meta-analytic evidence against prophylactic use of lidocaine in acute myocardial infarction. Archives of Internal Medicine, 149, 2694–2698.


  • Hollister, L., Overall, J., Johnson, M., Pennington, V., Katz, G., & Shelton, J. (1964). Controlled comparison of Imipramine, Amitriptyline and placebo in hospitalised depressed patients. Journal of Nervous and Mental Disease, 139, 370–375.


  • Holman, B. (2015). Why most sugar pills are not placebos. Philosophy of Science, 82, 1330–1343.


  • Holman, B. (2017). Philosophers on drugs. Synthese. https://doi.org/10.1007/s11229-017-1642-2.

  • Holman, B., & Bruner, J. (2017). Experimentation by industrial selection. Philosophy of Science, 84, 1008–1019.


  • Horwitz, R. I., & Feinstein, A. R. (1981). Improved observational method for studying therapeutic efficacy: Suggestive evidence that lidocaine prophylaxis prevents death in acute myocardial infarction. JAMA, 246, 2455–2459.


  • Howick, J. (2012). The philosophy of evidence-based medicine. West Sussex: British Medical Journal Books.


  • Howick, J. (2017). The relativity of ‘placebos’: Defending a modified version of Grünbaum’s definition. Synthese, 194, 1363–1396.


  • Jukola, S. (2015). Meta-analysis, ideals of objectivity, and the reliability of medical knowledge. Science and Technology Studies, 28, 101–120.


  • Jukola, S. (2017). On ideals of objectivity, judgment, and bias in medical research: A comment on Stegenga. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 62, 35–41.


  • Kirsch, I. (1998a). Reducing noise and hearing placebo more clearly. Prevention and Treatment, 1, 7r.


  • Kirsch, I. (1998b). On the importance of reading carefully: A response to Klein. Treatment and Prevention, 1, 9r.


  • Kirsch, I., Deacon, B. J., Huedo-Medina, T. B., Scoboria, A., Moore, T. J., & Johnson, B. T. (2008). Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine, 5, e45.

  • Kirsch, I., Moore, T. J., Scoboria, A., & Nicholls, S. S. (2002). The emperor’s new drugs: An analysis of antidepressant medication data submitted to the U.S. Food and Drug Administration. Prevention and Treatment, 5, 23.

  • Kirsch, I., Scoboria, A., & Moore, T. (2002). Antidepressants and placebos: Secrets, revelations, and unanswered questions. Prevention and Treatment, 5, 33.


  • Kirsch, I., & Sapirstein, G. (1998). Listening to Prozac but hearing placebo: A meta-analysis of antidepressant medication. Prevention and Treatment, 1, 2a.


  • Kitcher, P. (1993). The advancement of science: Science without legend, objectivity without illusions. Oxford: Oxford University Press.


  • Klein, D. F. (1998a). Listening to meta-analysis but hearing bias. Prevention and Treatment, 1, 6c.


  • Klein, D. F. (1998b). Reply to Kirsch’s rejoinder regarding antidepressant meta-analysis. Treatment and Prevention, 1, 8r.


  • Koehler, J. J. (1993). The influence of prior beliefs on scientific judgments of evidence quality. Organizational Behavior and Human Decision Processes, 56, 23–55.


  • Kourany, J. A. (2010). Philosophy of science after feminism. New York, NY: Oxford University Press.


  • Lakatos, I. (1978). The methodology of scientific research programmes Volume 1: Philosophical papers (Vol. 1). Cambridge: Cambridge University Press.

  • Landes, J., Osimani, B., & Poellinger, R. (Forthcoming). Epistemology of causal inference in pharmacology: Towards a framework for the assessment of harms. European Journal for Philosophy of Science.

  • Laudan, L. (1990). Science and relativism: Some key controversies in the philosophy of science. Chicago, IL: University of Chicago Press.


  • Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P., et al. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Medicine, 6(7), e1000100.


  • Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton, NJ: Princeton University Press.


  • Longino, H. E. (2002). The fate of knowledge. Princeton, NJ: Princeton University Press.


  • Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37, 2098.


  • Moerman, D. E. (2002). The loaves and the fishes: A comment on The emperor’s new drugs: An analysis of antidepressant medication data submitted to the US Food and Drug Administration. Prevention and Treatment, 5, 29.

  • Moher, D., Tetzlaff, J., Tricco, A. C., Sampson, M., & Altman, D. G. (2007). Epidemiology and reporting characteristics of systematic reviews. PLoS Med, 4, e78.


  • Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy and Research, 1, 161–175.


  • Moncrieff, J. (2001). Are antidepressants overrated? A review of methodological problems in antidepressant trials. The Journal of Nervous and Mental Disease, 189, 288–295.


  • Moncrieff, J., Wessely, S., & Hardy, R. (1998). Meta-analysis of trials comparing antidepressants with active placebos. The British Journal of Psychiatry, 172, 227–231.


  • Moncrieff, J., Wessely, S., & Hardy, R. (2004). Active placebos versus antidepressants for depression. Cochrane Database of Systematic Reviews.


  • Montgomery, S. A. (1994). Clinically relevant effect sizes in depression. European Neuropsychopharmacology, 4, 283–284.


  • Moore, T. (1995). Deadly medicines: Why tens of thousands of heart patients died in America’s worst drug disaster. New York, NY: Simon and Schuster.


  • Murray, E. (1989). Measurement issues in the evaluation of psychopharmacological therapy. In S. Fisher & R. P. Greenberg (Eds.), The limits of biological treatments for psychological distress (pp. 39–68). Hillsdale, NJ: Erlbaum.


  • Quitkin, F. M., Rabkin, J. G., Gerald, J., Davis, J. M., & Klein, D. F. (2000). Validity of clinical trials of antidepressants. American Journal of Psychiatry, 157, 327–337.


  • Romero, F. (2016). Can the behavioral sciences self-correct? A social epistemic study. Studies in History and Philosophy of Science Part A, 60, 55–69.


  • Russo, F., & Williamson, J. (2007). Interpreting causality in the health sciences. International Studies in the Philosophy of Science, 21, 157–170.


  • Russo, F., & Williamson, J. (2011). Epistemic causality and evidence-based medicine. History and Philosophy of the Life Sciences, 33, 563–581.


  • Salamone, J. D. (2002). Antidepressants and placebos: Conceptual problems and research strategies. Prevention and Treatment, 5, 24c.


  • Senn, S. (2003). Disappointing dichotomies. Pharmaceutical Statistics, 2, 239–240.


  • Senn, S. (2015). Mastering variation: Variance components and personalised medicine. Statistics in Medicine, 35, 966–977.


  • Sklar, L. (1975). Methodological conservatism. Philosophical Review, 84, 384–400.


  • Stanford, K. (2006). Exceeding our grasp: Science, history, and the problem of unconceived alternatives. New York: Oxford University Press.


  • Starr, M., Chalmers, I., Clarke, M., & Oxman, A. D. (2009). The origins, evolution, and future of the cochrane database of systematic reviews. International Journal of Technology Assessment in Health Care, 25(S1), 182–195.


  • Stegenga, J. (2011). Is meta-analysis the platinum standard of evidence? Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 42, 497–507.


  • Stegenga, J. (2015). Measuring effectiveness. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 54, 62–71.


  • Stegenga, J., Graham, A., Kennedy, S. T., Jukola, S., & Bluhm, R. (2016). New directions in philosophy of medicine. The Bloomsbury Companion to Contemporary Philosophy of Medicine, 343, 23.


  • Thase, M. E. (2002). Antidepressant effects: The suit may be small, but the fabric is real. Prevention and Treatment, 5, 32c.


  • Turner, E., Matthews, A., Linardatos, E., Tell, R., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358, 252–260.


  • Uhlenhuth, E., & Park, L. (1964). The influence of medication (Imipramine) and doctor in relieving depressed psychoneurotic outpatients. Journal of Psychiatric Research, 2, 101–122.


  • van Assen, M. A., van Aert, R., & Wicherts, J. M. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20, 293.


  • van Aert, R. C., Wicherts, J. M., & van Assen, M. A. (2016). Conducting meta-analyses based on p values: Reservations and recommendations for applying p-uniform and p-curve. Perspectives on Psychological Science, 11, 713–729.


  • Wilson, I., Vernon, J., Guin, T., & Sandifer, M. (1963). A controlled study of treatment of depression. Journal of Neuropsychiatry, 4, 331–337.



Acknowledgements

This paper grew out of an extended conversation with Jacob Stegenga; indeed, the first draft of Sect. 1 was completed at his dining room table. I owe him a debt of gratitude for housing me during my time in Cambridge and for a number of lengthy discussions on the topic (which of course is not to saddle him with either the views expressed in the paper or any remaining errors). I also would like to thank Irving Kirsch and Saana Jukola for responding to queries and helping me clarify my understanding of their work. Finally, I would like to thank the two blind reviewers for their comments on this paper. One reviewer in particular forced me to resolve central ambiguities in the argument that I sensed were problematic but couldn’t see my way around without their framing of the problem. The paper improved immensely because of their challenging and insightful objections.

Author information


Corresponding author

Correspondence to Bennett Holman.


About this article


Cite this article

Holman, B. In defense of meta-analysis. Synthese 196, 3189–3211 (2019). https://doi.org/10.1007/s11229-018-1690-2

