Introduction

Science is powerful, in part, because it is self-correcting. Specifically, due to replication and the cumulative nature of scientific inquiry, errors are exposed over time. For errors to be exposed, however, research must be transparent, i.e., research methods (e.g., data cleaning processes, questionnaires, research protocols) must be made explicit. Without transparency, the replication and comprehensive evaluation of prior research are difficult, and scientific progress may be inhibited (Freese, 2007).

Because accuracy is integral to science, it is perhaps unsurprising that researchers in social and natural sciences are vexed by errors which may result in an inability to verify or replicate research findings. For example, psychologists’ inability to replicate important findings has garnered considerable attention in and outside of academia (Moody et al., 2022). In response, some have claimed that science is in the midst of a replication crisis (Jamieson, 2018). The discussion of this “crisis” generally blames researchers whose findings are non-replicable, portraying them as sloppy, incompetent, or dishonest (Jamieson, 2018). Ironically, due to fear of public shaming, researchers may hesitate to admit to mistakes, thereby making the identification of non-replicability more difficult (Moody et al., 2022).

As in these other social and natural sciences, there is reason to believe that sociology may suffer from non-replicability. Specifically, Gerber and Malhotra (2008) reviewed papers published in three top sociology journals: American Sociological Review, American Journal of Sociology, and The Sociological Quarterly. The authors compared the distribution of z-scores in these published papers to what might be expected by chance alone. Absent bias, we would expect published z-scores to be spread roughly evenly across statistically significant levels. That is, a paper with a z-score of 1.96 (corresponding to a two-tailed p-value of about 0.05) should be no more likely to be published than a paper with a z-score of 2.58 (corresponding to a p-value of about 0.01). In fact, if the effect and statistical power are large enough, we might even expect a higher rate of papers to be published at the p < 0.01 level than at the p < 0.05 level (Simonsohn et al., 2014). Instead, z-scores that are barely past the widely accepted critical value (1.96) and associated alpha level (0.05) are published at a (much) higher rate than would be expected by chance alone. The abundance of barely significant results suggests that, for many claims in sociological research, the strength of evidence may be overstated.
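The correspondence between z-scores and two-tailed p-values that this comparison relies on can be checked directly. The following minimal Python sketch (the use of scipy here is my own illustration, not part of Gerber and Malhotra's analysis) reproduces the two thresholds cited above.

```python
# Minimal sketch: two-tailed p-values implied by the z-score thresholds above.
from scipy.stats import norm

for z in (1.96, 2.58):
    p = 2 * norm.sf(z)  # upper-tail probability, doubled for a two-tailed test
    print(f"z = {z:.2f}  ->  p = {p:.3f}")

# Approximate output:
# z = 1.96  ->  p = 0.050
# z = 2.58  ->  p = 0.010
```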

To some extent, a lack of reproducibility is normal and to be expected in research (Shiffrin et al., 2018). It is important, however, that non-reproducible findings are identified so that they do not become the foundation for future research. Additionally, retractions, corrections, and public debate over research findings may damage sociologists’ credibility among the general public (Anvari and Lakens 2018; Hendriks et al., 2020; Wingen et al., 2020). This reputational damage is unfortunate because sociologists have the theoretical and methodological training to weigh in on conversations that are relevant for public policy, workplace practices, and educational guidelines, among others. Furthermore, sociologists’ conclusions are often unpopular, especially to those who have historically held power. For sociological research to have the maximum possible impact, researchers must maximize the actual and perceived quality of their research.

Below, I begin by describing two related practices, preregistration and registered reports, which have been shown to: (1) enhance research quality and replication (Chambers and Tzavella 2022; Scheel 2021; Soderberg et al., 2021; Wicherts et al. 2011) and (2) increase the public’s trust in research (Chambers and Tzavella 2022; Christensen and Miguel 2016; Nosek and Lakens 2014; Parker et al., 2019; Scheel 2021). Next, I discuss the strengths and weaknesses of these practices. Finally, I conclude by discussing the potential implications of adopting preregistration and registered reports for the field more broadly.

Background

Preregistration

Preregistration is the process of carefully considering and stating research plans and rationale in a repository. When and if the researcher is ready to share them, these plans can be made public. Depending on the type of research, preregistration may include details such as hypotheses, sampling strategy, interview guides, exclusion criteria, study design, and/or analysis plans (Kavanagh & Kapitány, 2019). Because inductive/abductive and descriptive research designs may change in response to findings, researchers are encouraged to update their study design throughout the research process. By updating the study design, researchers turn the preregistration into a living document that tracks the study's evolution.

Deductive Research

Deductive research involves the testing of hypotheses. Like inductive/abductive and descriptive research, deductive research may take many forms, including experiments, textual analysis, analysis of interview data, secondary analysis of survey data, etc. To reduce multiple testing (see discussion below), researchers preregistering deductive research post detailed information about their plans. Specifically, researchers will be asked to report their research question, hypotheses, sampling strategy, independent variables, covariates, dependent variables, planned analyses, exclusion criteria, sample size, operationalization of variables, etc. Changes to these plans may still be made and documented, but researchers engaged in deductive research are encouraged to stay close to original research plans, justify deviations from those plans, and address statistical concerns associated with such deviations.

Strengths of Preregistration

Scholars from other social science disciplines, including psychology, economics, and political science, have discussed the benefits of preregistration (e.g., see DeHaven 2017; Haven & Van Grootel, 2019; Kavanagh & Kapitány, 2020; Timmermans and Tavory 2012). Far from a liability, the prior theoretical knowledge that researchers bring to a study is often considered to be beneficial for theory construction, helping researchers identify gaps in existing theories (Burawoy 1991; James, 1907; Katz, 2015; Timmermans and Tavory 2012). Preregistration encourages researchers to consider prior theoretical knowledge and maintain a contemporaneous record of changes in theoretical dispositions throughout the development of the research project (Haven et al., 2020; Haven & Van Grootel, 2019).

Multiple Comparisons: Over-reliance on p-values

Above, I discussed how multiple comparisons increase the risk of Type 1 error. Although preregistration and registered reports can verify a priori hypotheses and prevent motivated reasoning in choosing among multiple comparisons, they cannot change the fact that multiple comparisons may exist (Rubin, 2017). Specifically, each potential decision about how to treat outliers, missing data, etc. can affect the number of potential comparisons. Even if the results do not influence which decisions are made, the mere existence of multiple tests belies the logic of the Fisher/Neyman-Pearson/NHST approach. Said otherwise, although preregistration and registered reports may prevent analysis decisions from being influenced by their implications for results, preregistration and registered reports do not prevent analysis decisions “from being influenced by the idiosyncrasies of the data” (Rubin, 2017, p. 8).
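To make concrete why the mere availability of many comparisons inflates Type 1 error, the following simulation is a hedged illustration of my own (the sample sizes and number of tests are hypothetical, not drawn from the cited studies): ten tests are run on pure noise, and the chance that at least one comes out "significant" far exceeds the nominal alpha.

```python
# Illustrative simulation: how the chance of at least one false positive grows
# when several tests are run on data with no true effects.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, n_tests, alpha = 2000, 10, 0.05
false_positive_runs = 0

for _ in range(n_sims):
    # Ten comparisons between groups drawn from the same null distribution,
    # standing in for ten outcome variables or analytic choices.
    pvals = [ttest_ind(rng.normal(size=50), rng.normal(size=50)).pvalue
             for _ in range(n_tests)]
    false_positive_runs += any(p < alpha for p in pvals)

print(f"Nominal alpha: {alpha}")
print(f"Share of runs with >= 1 'significant' result: {false_positive_runs / n_sims:.2f}")
# With 10 independent tests, this share is roughly 1 - 0.95**10, i.e., about 0.40.
```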

In addition to preregistration and registered reports, there are a few ways to address the issue of multiple comparisons, including: (1) considering a higher threshold for statistical significance, (2) abandoning the Neyman-Pearson approach (p-values) for Bayesian methods, and (3) the liberal use of sensitivity analyses (Moody et al., 2022; Rubin, 2017). Adopting any one of these approaches does not necessarily address all concerns about multiple comparisons. Additionally, of the above-listed solutions to the problem of multiple comparisons, preregistration alone is perhaps the least effective in reducing Type 1 error and ensuring the strength of published research (Moody et al., 2022; Rubin, 2017). Rather, research suggests that the liberal use of sensitivity analyses may be the best approach (Frank et al., 2013; Moody et al., 2022; Rubin, 2017; Young, 2018). That is, rather than rely on a single test, researchers can conduct multiple iterations of that test, report them all, and come to a conclusion based on this larger body of analyses. Thus, while preregistration can ensure p-values are used as intended, it cannot fix the errors inherent to p-values.
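As a hedged sketch of what such a sensitivity analysis might look like in practice (the data, trimming rules, and variable names below are hypothetical illustrations, not a prescribed procedure), the same group comparison can be repeated under several defensible data-handling choices and every result reported.

```python
# Sketch of a simple sensitivity analysis: one comparison, several defensible
# specifications, all results reported side by side.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
treat = rng.normal(0.3, 1.0, size=200)    # hypothetical treatment group
control = rng.normal(0.0, 1.0, size=200)  # hypothetical control group

def trim(x, z):
    """Drop observations more than z standard deviations from the group mean."""
    return x[np.abs(x - x.mean()) <= z * x.std()]

specifications = {
    "no trimming": (treat, control),
    "trim |z| > 3": (trim(treat, 3), trim(control, 3)),
    "trim |z| > 2": (trim(treat, 2), trim(control, 2)),
}

for label, (a, b) in specifications.items():
    result = ttest_ind(a, b)
    print(f"{label:12s}  diff = {a.mean() - b.mean():+.3f}  p = {result.pvalue:.4f}")
# The conclusion rests on the full set of estimates, not on whichever single
# specification happens to cross the 0.05 threshold.
```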

Errors: Coding & Transcription

It is difficult to avoid error. Although some errors occur due to important researcher oversights, many others occur due to everyday human fallibilities, such as coding and transcription mistakes. Indeed, Nuijten et al. (2016) examined papers published in eight major psychology journals between 1983 and 2015 and found that nearly half contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. The researchers attributed many of these inconsistencies to transcription errors, which aligns with other research (Eubank, 2016; Ferguson and Heene 2012; Gerber and Malhotra 2008; Wetzels et al., 2011). Although this kind of error is pervasive and important, it unfortunately cannot be resolved by preregistration. No matter how thoughtful a preregistration, it cannot prevent typos. To reduce the number of such typos, researchers should embrace tools for automation (Long 2009).
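One such automated check, in the spirit of the consistency analysis by Nuijten et al. (2016), is to recompute a reported p-value from the reported test statistic and degrees of freedom and flag mismatches. The sketch below is illustrative only (the reported numbers and the tolerance are made up, and dedicated tools are more thorough).

```python
# Recompute a two-tailed p-value from a reported t statistic and degrees of
# freedom, and flag reports that do not match within a small tolerance.
from scipy.stats import t

def check_t_report(t_value, df, reported_p, tol=0.005):
    """Return (is_consistent, recomputed_p) for a reported two-tailed t-test."""
    recomputed_p = 2 * t.sf(abs(t_value), df)
    return abs(recomputed_p - reported_p) <= tol, recomputed_p

# Hypothetical example: a paper reports t(48) = 2.10, p = .04
consistent, recomputed = check_t_report(t_value=2.10, df=48, reported_p=0.04)
print(f"recomputed p = {recomputed:.3f}, consistent: {consistent}")
```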

Additionally, rather than exposing potential errors in research, preregistration could be used as armor against criticism. As it stands, when errors are identified in research papers (and/or non-replication occurs), the authors of those papers face serious consequences and a damaged reputation (Shamoo & Resnick, 2015). Often, this treatment of errors does not distinguish between honest error and misconduct (Resnik and Stewart 2012). As a result, individuals who make honest mistakes may be hesitant to acknowledge them for fear of the criticism that may befall them (Moody et al., 2022). To defend against such criticism, researchers may use preregistration as a signal of quality, rather than soberly addressing honest mistakes.

This hypothetical scenario is less of a criticism of preregistration than it is a criticism of the treatment of honest mistakes and the researchers who make them. When encountering scientific errors, sociologists must eschew shaming tactics. Instead, we should establish norms of scientific humility, understanding that we are all prone to making similar errors, and treating others’ research (and mistakes) as we would want them to treat ours (Janz and Freese 2021; Moody et al., 2022).

Because error prevention is equally desirable for journals and authors, journals may consider taking on some of the responsibility for high-quality research by, for example, providing pre-publication code review (Colaresi, 2016; Maner, 2014; Moody et al., 2022). This would involve individuals employed by the journal examining code in detail. The proposed benefit of this method is that it would reduce error before publication, thereby aligning journal and researcher interests. Although it would require considerable investment, Moody (2022:78–79) notes that “as a discipline, we have decided that other publication processes—copyediting, layout, and bibliometrics, for example—are acceptable and worthwhile. It may be time to include data and methods editing in this process.” To effectively implement these procedures, however, journals would need support and incentives (as opposed to mandates and punishment) from other entities, e.g., governmental agencies, funding organizations, academic associations, etc.

Conclusion

In summary, preregistration has promise, but, like any solution, is unable to fully resolve some of the most serious problems that hinder scientific progress. Preregistration may help improve the quality of research by increasing transparency, reducing some forms of error (through planning), preventing questionable research practices, and reducing the file-drawer problem (Chambers and Tzavella 2022; Nuijten et al., 2016; Scheel, 2021; Soderberg et al., 2021). Furthermore, preregistered papers and registered reports have higher rates of replicability and quality than traditional research (Chambers and Tzavella 2022; Nuijten et al., 2016; Scheel, 2021; Soderberg et al., 2021).

Due to its benefits, preregistration may provide an air of legitimacy. Specifically, the use of preregistration has been found to increase the public’s perception of research credibility (Chambers and Tzavella 2022; Christensen and Miguel 2016; Nosek and Lakens 2014; Parker et al., 2019; Scheel 2021). Reviewers and journals must therefore be careful not to assume that a preregistered paper is inherently more rigorous or accurate than a non-preregistered paper. Preregistration does not reduce Type 1 error caused by data-related idiosyncrasies, does not reduce errors from coding or transcription, and cannot prevent all questionable research practices (Ikeda et al.). If preregistration is adopted instead of other tools, then researchers, reviewers, and editors may feel better about scientific integrity without substantially improving it. If, however, preregistration is adopted along with other tools or strategies for improving research quality (e.g., pre-publication code review, posting of all materials, more in-depth methods sections), we are likely to see the greatest improvement in research quality (Moody et al., 2022). Thus, preregistration’s true promise lies in the adoption of its goals (i.e., improved transparency, reduced error, comprehensive evaluation of the strength of findings, and a shift in incentive structure), as opposed to an uncritical adoption of its methods.