Towards more credible conceptual replications under heteroscedasticity and unbalanced designs

Published in Quality & Quantity

Abstract

Theory cannot be fully validated unless original results have been replicated with consistent conclusions. Replications are the strongest source of evidence for verifying research findings and knowledge claims. In the social sciences, replication studies often fail, and thus there is a continuing need for replications to confirm tentative facts, expand knowledge to gain new understanding, and verify hypotheses. Failure to replicate in the social and behavioral sciences sometimes arises from dissimilarity between the hypotheses formulated in the original and replication studies. Alternatively, failure to replicate also occurs when the same hypothesis is tested but in the absence of knowledge from previous investigations, as when original study effect sizes are not considered in replication studies. To increase the replicability of research findings, this paper demonstrates that applying two one-sided tests to evaluate a replication question provides a superior means of conducting replications, assuming all other methodological procedures remain as similar as possible. Furthermore, this paper explores the impact of heteroscedasticity and unbalanced designs on replication studies in four paired conditions of variance and sample size. Two Monte Carlo simulations, each with two stages, were conducted to investigate conclusion consistency among different replication procedures and to determine the repeatability of an observed effect. Overall, the proposed approach yielded a higher proportion of successful replications than the conventional approach (testing the original null hypothesis of no effect). Findings can thus be confirmed by replications; in the absence of such confirmation, there can be no final statement about any theory.
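The conventional replication rule criticized above, re-testing the original null of no effect, can be illustrated with a minimal Monte Carlo sketch. The sample sizes, variances, effect size, and use of Welch's t-test here are illustrative assumptions, not the article's actual simulation design:

```python
# Hypothetical sketch: how often a conventional replication (re-testing the
# original null of no effect) "succeeds" under heteroscedasticity and an
# unbalanced design. All parameters are illustrative, not the article's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_a, n_b = 100, 40            # unbalanced group sizes
sd_a, sd_b = 1.0, 2.0         # unequal (heteroscedastic) variances
effect = 0.5                  # assumed true mean difference in the replication
reps, alpha = 2000, 0.05

hits = 0
for _ in range(reps):
    a = rng.normal(effect, sd_a, n_a)
    b = rng.normal(0.0, sd_b, n_b)
    _, p = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    hits += p < alpha
rate = hits / reps  # empirical rate of "successful" conventional replications
print(f"conventional replication rate: {rate:.3f}")
```

The article's proposal instead asks whether the replication effect is consistent with the original effect, via two one-sided tests, rather than merely nonzero.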

Figures 1–7 appear in the full article.

Notes

  1. Both superiority and non-inferiority tests are special cases of one-sided tests. Under a superiority test, rejecting the null hypothesis implies that a treatment mean exceeds a reference mean \(\left(\delta ={\mu }_{A}-{\mu }_{B}\right)\) by more than the superiority margin \(\left({M}_{s}\right)\), which can be the smallest difference from the reference that is considered to be different. In contrast, a non-inferiority test aims to show that a treatment mean is not below a reference mean by more than the non-inferiority margin \(\left({M}_{NI}\right)\), which can be the largest change from the baseline (zero) that is considered to be trivial. An equivalence test, or two one-sided tests, combines superiority and non-inferiority tests: rejecting the null hypothesis implies that the mean difference lies within specific limits, where \({M}_{s}\) is the lower limit and \({M}_{NI}\) the upper limit.

  2. 7.5% = (39,267 + 90,880 + 142,894)/(1,083,733 + 1,115,120 + 1,185,105 + 39,267 + 90,880 + 142,894).
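The two one-sided tests procedure described in note 1 can be sketched as follows. This is a minimal illustration using Welch-type standard errors; the function name, margins, and data are hypothetical, not the article's implementation:

```python
# Minimal TOST (two one-sided tests) sketch for a two-group mean difference,
# using Welch's standard error and degrees of freedom. Names, margins, and
# data are hypothetical illustrations, not the article's implementation.
import numpy as np
from scipy import stats

def tost_welch(a, b, lower, upper, alpha=0.05):
    """Reject non-equivalence when mean(a) - mean(b) lies within (lower, upper)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    diff, se = a.mean() - b.mean(), np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom (robust to unequal variances)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_lower = stats.t.sf((diff - lower) / se, df)   # H0: diff <= lower
    p_upper = stats.t.cdf((diff - upper) / se, df)  # H0: diff >= upper
    p = max(p_lower, p_upper)  # both one-sided tests must reject
    return p, p < alpha

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 200)
b = rng.normal(0.0, 1.5, 80)   # heteroscedastic, unbalanced
p, equivalent = tost_welch(a, b, lower=-0.5, upper=0.5)
```

Taking the larger of the two one-sided p values is the standard TOST decision rule: equivalence is concluded only when both one-sided nulls are rejected at level alpha.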
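The arithmetic in note 2 can be checked directly:

```python
# Verify the 7.5% figure reported in note 2.
failed = 39_267 + 90_880 + 142_894
total = 1_083_733 + 1_115_120 + 1_185_105 + failed
share = failed / total
print(f"{share:.3f}")  # prints "0.075"
```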


Funding

Edward Applegate and Chris Coryn declared that no funds, grants, or other support were received during the preparation of this manuscript. Pedro Mateu has received research support from the Office of the Vice Chancellor for Research at Universidad del Pacífico (Lima, Peru).

Author information

Corresponding author

Correspondence to Pedro Mateu.

Ethics declarations

Conflict of interest

The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 910 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mateu, P., Applegate, B. & Coryn, C.L. Towards more credible conceptual replications under heteroscedasticity and unbalanced designs. Qual Quant 58, 723–751 (2024). https://doi.org/10.1007/s11135-023-01657-0

