Abstract
A theory cannot be fully validated unless original results have been replicated, yielding consistent conclusions. Replications are the strongest source of evidence for verifying research findings and knowledge claims. In the social sciences, replication studies often fail, and thus there is a continuing need for replications that confirm tentative facts, expand knowledge to gain new understanding, and verify hypotheses. Failure to replicate in the social and behavioral sciences sometimes arises from dissimilarity between the hypotheses formulated in the original and replication studies. Alternatively, failure to replicate also occurs when the same hypothesis is tested but in the absence of knowledge from previous investigations, as when original study effect sizes are not considered in replication studies. To increase the replicability of research findings, this paper demonstrates that applying two one-sided tests to evaluate a replication question provides a superior means for conducting replications, assuming all other methodological procedures remain as similar as possible. Furthermore, this paper explores the impact of heteroscedasticity and unbalanced designs in replication studies across four paired conditions of variance and sample size. Two Monte Carlo simulations, each with two stages, were conducted to investigate conclusion consistency among different replication procedures and to determine the repeatability of an observed effect. Overall, the proposed approach yielded a higher proportion of successful replications than the conventional approach (testing the original null hypothesis of no effect). Thus, findings can be confirmed by replications; in the absence of such confirmation, no final statement can be made about any theory.
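The two replication criteria contrasted in the abstract can be sketched with a small Monte Carlo simulation. This is only a rough illustration: the effect size, margin, sample sizes, and number of repetitions below are illustrative assumptions and do not reproduce the paper's two-stage simulation design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d_true = 0.4   # hypothetical true mean difference (illustrative)
n = 100        # per-group sample size in both studies (illustrative)
margin = 0.5   # hypothetical margin around the original effect
alpha = 0.05
reps = 2000

conventional = proposed = 0
for _ in range(reps):
    # original study: two-group design, treatment mean shifted by d_true
    d_orig = rng.normal(d_true, 1, n).mean() - rng.normal(0, 1, n).mean()
    # replication study drawn from the same population
    x, y = rng.normal(d_true, 1, n), rng.normal(0, 1, n)
    d_rep = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / n + y.var(ddof=1) / n)
    df = 2 * n - 2  # pooled df; Welch df would guard against heteroscedasticity
    # conventional criterion: one-sided test of the original null of no effect
    if stats.t.sf(d_rep / se, df) < alpha:
        conventional += 1
    # two one-sided tests: is the replication effect within +/- margin of d_orig?
    t_lo = (d_rep - (d_orig - margin)) / se
    t_hi = (d_rep - (d_orig + margin)) / se
    if stats.t.sf(t_lo, df) < alpha and stats.t.cdf(t_hi, df) < alpha:
        proposed += 1

print(f"conventional: {conventional / reps:.3f}, TOST-based: {proposed / reps:.3f}")
```

The key design difference is that the two one-sided tests criterion uses the original study's effect size as its reference point, whereas the conventional criterion ignores it and retests the null of no effect.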
Figures 1–7 are available in the online version of this article.
Notes
Both superiority and non-inferiority tests are special cases of one-sided tests. Under a superiority test, rejecting the null hypothesis implies that a treatment mean exceeds a reference mean \(\left(\delta ={\mu }_{A}-{\mu }_{B}\right)\) by more than the superiority margin \(\left({M}_{s}\right)\), which can be defined as the smallest difference from the reference that is considered meaningful. In contrast, a non-inferiority test aims to show that a treatment mean falls below a reference mean by no more than the non-inferiority margin \(\left({M}_{NI}\right)\), which can be defined as the largest change from the baseline (zero) that is considered trivial. An equivalence test, or two one-sided tests, combines the superiority and non-inferiority tests: rejecting both null hypotheses implies that the mean difference lies between specific limits, with \({M}_{s}\) as the lower limit and \({M}_{NI}\) as the upper limit.
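The two one-sided tests described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Welch standard error and degrees of freedom, the margins, and the simulated data are all assumptions chosen to show the procedure under heteroscedasticity and unbalanced groups.

```python
import numpy as np
from scipy import stats

def tost_two_means(x, y, lower, upper, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    Welch's standard error and degrees of freedom are used so the procedure
    remains reasonable under heteroscedasticity and unbalanced group sizes.
    Returns the TOST p value (the larger of the two one-sided p values) and
    a boolean indicating whether equivalence is concluded at level alpha.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x.mean() - y.mean()
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    p_lower = stats.t.sf((diff - lower) / se, df)   # H0: diff <= lower
    p_upper = stats.t.cdf((diff - upper) / se, df)  # H0: diff >= upper
    p = max(p_lower, p_upper)
    return p, p < alpha

# Illustrative data: unequal variances and unbalanced sample sizes
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)   # treatment group
y = rng.normal(0.0, 1.5, 300)   # reference group
p, equivalent = tost_two_means(x, y, lower=-0.5, upper=0.5)
print(p, equivalent)
```

Equivalence is declared only when both one-sided nulls are rejected, i.e., when the mean difference is shown to lie strictly inside the interval (lower, upper).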
7.5% = (39,267 + 90,880 + 142,894)/(1,083,733 + 1,115,120 + 1,185,105 + 39,267 + 90,880 + 142,894).
Funding
Edward Applegate and Chris Coryn declared that no funds, grants, or other support were received during the preparation of this manuscript. Pedro Mateu has received research support from the Office of the Vice Chancellor for Research at Universidad del Pacífico (Lima, Peru).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mateu, P., Applegate, B. & Coryn, C.L. Towards more credible conceptual replications under heteroscedasticity and unbalanced designs. Qual Quant 58, 723–751 (2024). https://doi.org/10.1007/s11135-023-01657-0