Abstract
Introduction
There has been a notable increase in advocacy for using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating developing COAs using a small sample.
Methods
We review the benefits such methods are purported to offer from both practical and statistical standpoints, and we detail several problematic areas, spanning both practical and statistical theory concerns, regarding the use of quantitative methods, including Rasch-consistent methods, with small samples.
Conclusions
The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11136-016-1364-9/MediaObjects/11136_2016_1364_Fig1_HTML.gif)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11136-016-1364-9/MediaObjects/11136_2016_1364_Fig2_HTML.gif)
Notes
The parameters from this article were selected simply as representative of “real-world” values from a recently published COA analysis. Their use here is one of convenience and should not be taken as a judgement of the analyses conducted or obtained parameter estimates, which were psychometrically sound and found using a sample of over 200 observations.
References
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Leidy, N. K., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 2—Assessing respondent understanding. Value in Health, 14, 978–988.
Stansbury, J. P. (2013). Mixed methods to enhance content validity of measures for use in drug-development trials. In A. Slagle (Ed.), Mixed methods—FDA perspective: Incorporating mixed methods to enhance content validity in drug-development tools. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/PROSlides/Workshop3/2012_PROConsortium_PanelSession2.pdf.
Gorecki, C., Lamping, D. L., Nixon, J., Brown, J. M., & Cano, S. (2012). Applying mixed methods to pretest the Pressure Ulcer Quality of Life (PU-QOL) instrument. Quality of Life Research, 21, 441–451.
Cappelleri, J. C. (2012). Classical test theory and item response theory: A brief overview. In J. Lundy (Ed.), Mixed methods approach to assuring content validity. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/wp-content/uploads/2013/09/2012_PROConsortium_PanelSession2.pdf.
Lord, F. M. (1983). Small N justifies Rasch model. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing. New York: Academic Press.
Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7, 328. Retrieved from http://www.rasch.org/rmt/rmt74m.htm.
Cappelleri, J. C. (2013). Mixed method approach to evaluating content validity: Review and update. In A. Slagle (Ed.), Mixed methods—industry and academic experience. Panel conducted at the patient reported outcome (PRO) consortium workshop, Silver Spring, MD. Retrieved from http://c-path.org/wp-content/uploads/2013/09/PRO_Consortium_PanelDiscussion1.pdf.
Petrillo, J., Cano, S. J., Mcleod, L. D., & Coon, C. D. (2015). Using classical test theory, item response theory, and Rasch measurement theory to evaluate patient-reported outcome measures: A comparison of worked examples. Value in Health, 18, 25–34.
Lee, O. K. (1992). Variance in mathematics and reading across grades: Grade equivalents and logits. Rasch Measurement Transactions, 6, 222–223. Retrieved from http://www.rasch.org/rmt/rmt62f.htm.
Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome Measurement, 3, 103–122.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Recommendations for getting the most from your analysis. Practical Assessment, Research, and Evaluation, 10(7). Retrieved from http://pareonline.net/getvn.asp?v=10&n=7.
Anthoine, E., Moret, L., Regnault, A., Sébille, V., & Hardouin, J.-B. (2014). Sample size used to validate a scale: A review of publications on newly-developed patient reported outcome measures. Health and Quality of Life Outcomes, 12(1), 30–46.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84–99.
Choi, S., Cook, K., & Dodd, B. (1997). Parameter recovery for the partial credit model using MULTILOG. Journal of Outcome Measurement, 1, 114–142.
DeMars, C. E. (2002). Recovery of graded response and partial credit parameters in MULTILOG and PARSCALE. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL.
French, G., & Dodd, B. (1999). Parameter recovery for the rating scale model using PARSCALE. Journal of Outcome Measurement, 3, 176–199.
Goldman, S. H., & Raju, N. S. (1986). Recovery of one- and two-parameter logistic item parameters: An empirical study. Educational and Psychological Measurement, 46, 11–21.
Guyer, R., & Thompson, N. (2011). Item response theory parameter recovery using Xcalibre 4.1 (Technical Report). St. Paul, MN: Assessment Systems Corporation. Retrieved from http://www.assess.com/docs/Xcalibre_4.1_tech_report.pdf.
He, Q., & Wheadon, C. (2008). The effect of sample size on item parameter estimation for the partial credit model. Centre for Education and Research Policy. Retrieved from https://cerp.aqa.org.uk/sites/default/files/pdf_upload/CERP_RP_QH_11122008.pdf.
Le, L. T., & Adams, R. J. (2013). Accuracy of Rasch model item parameter estimation. Australian Council for Educational Research. Retrieved from http://research.acer.edu.au/cgi/viewcontent.cgi?article=1013&context=ar_misc.
Meyer, J. P., & Hailey, E. (2012). A study of Rasch, partial credit, and rating scale model parameter recovery in WINSTEPS and jMetrik. Journal of Applied Measurement, 13, 248–258.
Preinerstorfer, D., & Formann, A. K. (2012). Parameter recovery and model selection in mixed Rasch models. British Journal of Mathematical and Statistical Psychology, 65, 251–262.
Wang, W.-C., & Chen, C.-T. (2005). Item parameters recovery, standard error estimates, and fit statistics of the WINSTEPS program for the family of Rasch models. Educational and Psychological Measurement, 65, 376–404.
Green, K. E., & Frantom, C. G. (2002). Survey development and validation with the Rasch model. Paper presented at the International Conference on Questionnaire Development, Evaluation, and Testing, Charleston, SC.
Wright, B. D. (1977). Misunderstanding the Rasch model. Journal of Educational Measurement, 14, 219–225.
Stone, M., & Yumoto, F. (2004). The effect of sample size for estimating Rasch/IRT parameters with dichotomous items. Journal of Applied Measurement, 5, 48–61.
Chen, W.-H., Lenderking, W., Jin, Y., Wyrwich, K. W., Gelhorn, H., & Revicki, D. A. (2014). Is Rasch model analysis applicable in small sample pilot studies for assessing preliminary item characteristics? An example using PROMIS pain behavior item bank data. Quality of Life Research, 23, 485–493.
Smith, R. M. (1996). Polytomous mean-square fit statistics. Rasch Measurement Transactions, 10, 516–517. Retrieved from http://www.rasch.org/rmt/rmt103a.htm.
Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1, 152–176.
Wright, B. D., Linacre, J. M., Gustafson, J.-E., & Martin-Löf, P. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370. Retrieved from http://www.rasch.org/rmt/rmt83b.htm.
Smith, R. M., Schumacker, R. E., & Bush, M. J. (1998). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2, 66–78.
Smith, R. M. (1996). A comparison of the Rasch separate calibration and between-fit methods of detecting item bias. Educational and Psychological Measurement, 56, 403–418.
Linacre, J. M. (2000). Item discrimination and infit mean-squares. Rasch Measurement Transactions, 14, 743. Retrieved from http://www.rasch.org/rmt/rmt142a.htm.
Ethics declarations
Conflict of interest
Carrie R. Houts, Michael C. Edwards, R. J. Wirth, and Linda S. Deal declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Houts, C.R., Edwards, M.C., Wirth, R.J. et al. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development. Qual Life Res 25, 2685–2691 (2016). https://doi.org/10.1007/s11136-016-1364-9