
A review of empirical research related to the use of small quantitative samples in clinical outcome scale development

Review · Quality of Life Research

Abstract

Introduction

There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating developing COAs using a small sample.

Methods

We review the benefits such methods are purported to offer from both practical and statistical standpoints and detail several problematic areas, spanning both practical and statistical-theory concerns, that arise when quantitative methods, including Rasch-consistent methods, are applied to small samples.
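
As a sketch of one such statistical-theory concern (a standard large-sample result used here for illustration, not a derivation taken from the article), consider the precision attainable for a dichotomous Rasch item difficulty when the person locations $\theta_i$ are treated as known:

$$
\operatorname{SE}\bigl(\hat{\beta}_j\bigr) \approx \left[\sum_{i=1}^{N} P_{ij}\bigl(1 - P_{ij}\bigr)\right]^{-1/2},
\qquad
P_{ij} = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}.
$$

Because each term satisfies $P_{ij}(1 - P_{ij}) \le 1/4$, the standard error cannot fall below $2/\sqrt{N}$; even with $N = 30$ perfectly targeted respondents, an item location carries a standard error of at least about 0.37 logits, i.e., a 95% interval roughly $\pm 0.72$ logits wide.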

Conclusions

The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
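
To make the precision issue concrete, the following is a minimal simulation sketch (ours, not reproduced from the article; written in Python with NumPy, with illustrative item locations and sample sizes). It approximates each item's Rasch location by the negative logit of its observed endorsement proportion, a crude and biased stand-in for a real calibration, but one sufficient to show how the replication-to-replication spread of item-location estimates depends on sample size.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative "true" item difficulties in logits (assumed values, not from the article).
    true_beta = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])

    def simulate_estimates(n_persons, n_reps=500):
        # Each replication: draw person locations from N(0, 1), generate dichotomous
        # Rasch responses, and approximate each item's location by the negative logit
        # of its observed endorsement proportion (a rough proxy, not a full calibration).
        estimates = np.empty((n_reps, true_beta.size))
        for r in range(n_reps):
            theta = rng.standard_normal(n_persons)
            p = 1.0 / (1.0 + np.exp(-(theta[:, None] - true_beta)))  # Rasch response probabilities
            responses = rng.random(p.shape) < p                      # simulated 0/1 responses
            prop = responses.mean(axis=0).clip(0.01, 0.99)           # guard against 0% / 100% endorsement
            estimates[r] = -np.log(prop / (1.0 - prop))              # logit-of-proportion "difficulty"
        return estimates

    for n in (30, 200):
        est = simulate_estimates(n)
        print(f"N = {n:3d}  SD of item-location estimates per item:", np.round(est.std(axis=0), 2))

Because the variance of an endorsement proportion scales as 1/N, the spreads printed for N = 30 should be roughly √(200/30) ≈ 2.6 times those for N = 200, which is the kind of loss of precision at issue here.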


Notes

  1. The parameters from this article were selected simply as representative of “real-world” values from a recently published COA analysis. Their use here is one of convenience and should not be taken as a judgement of the analyses conducted or obtained parameter estimates, which were psychometrically sound and found using a sample of over 200 observations.


Author information

Corresponding author

Correspondence to Carrie R. Houts.

Ethics declarations

Conflict of interest

Carrie R. Houts, Michael C. Edwards, R. J. Wirth, and Linda S. Deal declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


About this article


Cite this article

Houts, C.R., Edwards, M.C., Wirth, R.J. et al. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development. Qual Life Res 25, 2685–2691 (2016). https://doi.org/10.1007/s11136-016-1364-9
