Abstract
Introduction
There has been a notable increase in advocacy for using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating developing COAs using a small sample.
Methods
We review the benefits such methods are purported to offer from both practical and statistical standpoints, and we detail several problematic areas, spanning both practical and statistical theory concerns, regarding the use of quantitative methods, including Rasch-consistent methods, with small samples.
Conclusions
The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11136-016-1364-9/MediaObjects/11136_2016_1364_Fig1_HTML.gif)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11136-016-1364-9/MediaObjects/11136_2016_1364_Fig2_HTML.gif)
Notes
The parameters from this article were selected simply as representative of “real-world” values from a recently published COA analysis. Their use here is one of convenience and should not be taken as a judgement of the analyses conducted or obtained parameter estimates, which were psychometrically sound and found using a sample of over 200 observations.
References
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Leidy, N. K., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 2—Assessing respondent understanding. Value in Health, 14, 978–988.
Stansbury, J. P. (2013). Mixed methods to enhance content validity of measures for use in drug-development trials. In A. Slagle (Ed.), Mixed methods—FDA perspective: Incorporating mixed methods to enhance content validity in drug-development tools. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/PROSlides/Workshop3/2012_PROConsortium_PanelSession2.pdf.
Gorecki, C., Lamping, D. L., Nixon, J., Brown, J. M., & Cano, S. (2012). Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument. Quality of Life Research, 21, 441–451.
Cappelleri, J. C. (2012). Classical test theory and item response theory: A brief overview. In J. Lundy (Ed.), Mixed methods approach to assuring content validity. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/wp-content/uploads/2013/09/2012_PROConsortium_PanelSession2.pdf.
Lord, F. M. (1983). Small N justifies Rasch model. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing. New York: Academic Press.
Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7, 328. Retrieved from http://www.rasch.org/rmt/rmt74m.htm.
Cappelleri, J. C. (2013). Mixed method approach to evaluating content validity: Review and update. In A. Slagle (Ed.), Mixed methods—industry and academic experience. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/wp-content/uploads/2013/09/PRO_Consortium_PanelDiscussion1.pdf.
Petrillo, J., Cano, S. J., Mcleod, L. D., & Coon, C. D. (2015). Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: A comparison of worked examples. Value in Health, 18, 25–34.
Lee, O. K. (1992). Variance in mathematics and reading across grades: Grade equivalents and logits. Rasch Measurement Transactions, 6, 222–223. Retrieved from http://www.rasch.org/rmt/rmt62f.htm.
Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome Measurement, 3, 103–122.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Recommendations for getting the most from your analysis. Practical Assessment, Research, and Evaluation, 10(7). Retrieved from http://pareonline.net/getvn.asp?v=10&n=7.
Anthoine, E., Moret, L., Regnault, A., Sébille, V., & Hardouin, J.-B. (2014). Sample size used to validate a scale: A review of publications on newly-developed patient reported outcome measures. Health and Quality of Life Outcomes, 12(1), 30–46.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84–99.
Choi, S., Cook, K., & Dodd, B. (1997). Parameter recovery for the partial credit model using MULTILOG. Journal of Outcome Measurement, 1, 114–142.
DeMars, C. E. (2002). Recovery of graded response and partial credit parameters in MULTILOG and PARSCALE. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
French, G., & Dodd, B. (1999). Parameter recovery for the rating scale model using PARSCALE. Journal of Outcome Measurement, 3, 176–199.
Goldman, S. H., & Raju, N. S. (1986). Recovery of one- and two-parameter logistic item parameters: An empirical study. Educational and Psychological Measurement, 46, 11–21.
Guyer, R., & Thompson, N. (2011). Item response theory parameter recovery using Xcalibre 4.1 (Technical Report). St. Paul, MN: Assessment Systems Corporation. Retrieved from http://www.assess.com/docs/Xcalibre_4.1_tech_report.pdf.
He, Q., & Wheadon, C. (2008). The effect of sample size on item parameter estimation for the partial credit model. Centre for Education and Research Policy. Retrieved from https://cerp.aqa.org.uk/sites/default/files/pdf_upload/CERP_RP_QH_11122008.pdf.
Le, L. T., & Adams, R. J. (2013). Accuracy of Rasch model item parameter estimation. Australian Council for Educational Research. Retrieved from http://research.acer.edu.au/cgi/viewcontent.cgi?article=1013&context=ar_misc.
Meyer, J. P., & Hailey, E. (2012). A study of Rasch, partial credit, and rating scale model parameter recovery in WINSTEPS and jMetrik. Journal of Applied Measurement, 13, 248–258.
Preinerstorfer, D., & Formann, A. K. (2012). Parameter recovery and model selection in mixed Rasch models. British Journal of Mathematical and Statistical Psychology, 65, 251–262.
Wang, W.-C., & Chen, C.-T. (2005). Item parameters recovery, standard error estimates, and fit statistics of the WINSTEPS program for the family of Rasch models. Educational and Psychological Measurement, 65, 376–404.
Green, K. E., & Frantom, C. G. (2002). Survey development and validation with the Rasch model. Paper presented at the International Conference on Questionnaire Development, Evaluation, and Testing, Charleston, SC.
Wright, B. D. (1977). Misunderstanding the Rasch model. Journal of Educational Measurement, 14, 219–225.
Stone, M., & Yumoto, F. (2004). The effect of sample size for estimating Rasch/IRT parameters with dichotomous items. Journal of Applied Measurement, 5, 48–61.
Chen, W.-H., Lenderking, W., Jin, Y., Wyrwich, K. W., Gelhorn, H., & Revicki, D. A. (2014). Is Rasch model analysis applicable in small sample pilot studies for assessing preliminary item characteristics? An example using PROMIS pain behavior item bank data. Quality of Life Research, 23, 485–493.
Smith, R. M. (1996). Polytomous mean-square fit statistics. Rasch Measurement Transactions, 10, 516–517. Retrieved from http://www.rasch.org/rmt/rmt103a.htm.
Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1, 152–176.
Wright, B. D., Linacre, J. M., Gustafson, J.-E., & Martin-Löf, P. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370. Retrieved from http://www.rasch.org/rmt/rmt83b.htm.
Smith, R. M., Schumacker, R. E., & Bush, M. J. (1998). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2, 66–78.
Smith, R. M. (1996). A comparison of the Rasch separate calibration and between-fit methods of detecting item bias. Educational and Psychological Measurement, 56, 403–418.
Linacre, J. M. (2000). Item discrimination and infit mean-squares. Rasch Measurement Transactions, 14, 743. Retrieved from http://www.rasch.org/rmt/rmt142a.htm.
Ethics declarations
Conflict of interest
Carrie R. Houts, Michael C. Edwards, R. J. Wirth, and Linda S. Deal declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Houts, C.R., Edwards, M.C., Wirth, R.J. et al. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development. Qual Life Res 25, 2685–2691 (2016). https://doi.org/10.1007/s11136-016-1364-9