How to customize an early start preparatory course policy to improve student graduation success: an application of uplift modeling

  • Original Research
  • Annals of Operations Research

Abstract

Improving graduation rates is of utmost importance for higher education institutions, both public and private. The key contribution of this study is to apply the uplift modeling framework to optimize preparatory course assignments as an instrument to boost students' chances of graduating. Specifically, we concentrate on two university programs, the English and Math preparatory courses, to identify the students who would benefit most from these courses and go on to graduate. To achieve this objective, we analyze 10 years of incoming freshman data, with a wide range of feature variables, from a major university in the US. We then build and test several uplift modeling approaches to estimate students' responses to the treatments. The best-performing model allows us to segment students and target those who are most responsive to the treatment. Additionally, we identify the most significant variables and provide student profiles and attributes that distinguish those who would gain from preparatory courses from those who would not. The framework developed in this study can serve as a valuable tool for decision-making and policy support. It can improve not only student success but also the allocation of university resources, by identifying and advising the fraction of students whose graduation prospects would improve from taking these preparatory courses.
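The core idea of uplift modeling, estimating how much a treatment changes an outcome rather than the outcome itself, can be illustrated with a minimal sketch. The code below is not one of the models tested in the study; it is the simplest possible stand-in, a per-segment difference in graduation rates between students who took a preparatory course and those who did not. The segment names and the toy records are hypothetical, and the sketch assumes treated and control students within a segment are comparable, which observational data does not guarantee without an adjustment such as matching.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment as P(grad | treated) - P(grad | control).

    Each record is (segment, treated, graduated), with treated and graduated
    coded as 0 or 1. Segments that lack either group are skipped because no
    treated-versus-control contrast can be formed for them.
    """
    # counts[segment][arm] = [number of students, number of graduates]
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for segment, treated, graduated in records:
        arm = counts[segment][treated]
        arm[0] += 1
        arm[1] += graduated

    uplift = {}
    for segment, (control, treatment) in counts.items():
        if control[0] == 0 or treatment[0] == 0:
            continue
        uplift[segment] = treatment[1] / treatment[0] - control[1] / control[0]
    return uplift


# Hypothetical toy data: (segment, took_prep_course, graduated)
records = [
    ("low_hs_gpa", 1, 1), ("low_hs_gpa", 1, 1), ("low_hs_gpa", 1, 0),
    ("low_hs_gpa", 0, 0), ("low_hs_gpa", 0, 1), ("low_hs_gpa", 0, 0),
    ("high_hs_gpa", 1, 1), ("high_hs_gpa", 1, 0),
    ("high_hs_gpa", 0, 1), ("high_hs_gpa", 0, 1),
]

uplift = segment_uplift(records)
# Rank segments from most to least responsive to the course.
ranked = sorted(uplift, key=uplift.get, reverse=True)
```

In this toy example the course would be recommended first to the segment at the top of `ranked`; a segment with negative estimated uplift would not be targeted.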

Notes

  1. We note that there has been a university-wide policy change requiring some students to take the course. However, the time frame of the data set used here does not include the policy change.

  2. The gold standard would have been a fully randomized treatment assignment mechanism.

  3. The term "balance" in matching does not refer to the conventional idea of balance in machine learning. Typically, a "balanced" dataset has the same number of observations across all categories of the outcome variable Y, or an equal number of observations across all treatment groups. In matching, balance refers to a distinct concept: the treatment and control groups having "the same joint distribution of observed covariates" (Diamond & Sekhon, 2013).
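The distinction drawn in note 3 can be made concrete with a small sketch. The standardized mean difference (SMD) is a widely used balance diagnostic; whether the authors used it is not stated here, so it should be read as an illustration only. It compares one covariate's mean across groups, so it is a partial check: balance as defined above concerns the joint distribution of all observed covariates. The covariate values below are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in covariate means divided by the pooled standard deviation.

    Each group needs at least two observations so that a sample variance
    can be computed.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / pooled_sd


# Hypothetical high school GPA values.
# Unequal group sizes, yet the covariate means agree: balanced in the
# matching sense even though the group counts are not "balanced".
smd_balanced = standardized_mean_difference(
    [3.0, 3.2, 3.4], [3.0, 3.1, 3.2, 3.3, 3.4]
)

# Equal group sizes, but the treated group has systematically lower GPAs:
# "balanced" in the machine learning sense, imbalanced in the matching sense.
smd_shifted = standardized_mean_difference(
    [2.8, 3.0, 3.2, 3.4], [3.0, 3.2, 3.4, 3.6]
)
```

An SMD near zero suggests the groups are comparable on that covariate; a large absolute value suggests the matching step has not removed the difference.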

References

  • Abadie, A. (2005). Semiparametric difference-in-differences estimators. The Review of Economic Studies, 72(1), 1–19. https://doi.org/10.1111/0034-6527.00321

  • Bermeo, C., Michell, K., & Kristjanpoller, W. (2023). Estimation of causality in economic growth and expansionary policies using uplift modeling. Neural Computing and Applications. https://doi.org/10.1007/s00521-023-08397-0

  • DeBerard, M. S., Spielmans, G. I., & Julka, D. L. (2004). Predictors of academic achievement and retention among college freshmen: A longitudinal study. College Student Journal, 38(1), 66–81.

  • Delaney, A. M. (2008). Designing retention research for assessment and enhanced competitive advantage. Tertiary Education and Management, 14, 57–66.

  • Delen, D., Topuz, K., & Eryarsoy, E. (2020). Development of a Bayesian Belief Network-based DSS for predicting and understanding freshmen student attrition. European Journal of Operational Research, 281(3), 575–587. https://doi.org/10.1016/j.ejor.2019.03.037

  • Devriendt, F., Guns, T., & Verbeke, W. (2020). Learning to rank for uplift modeling. http://arxiv.org/abs/2002.05897

  • Dharmawan, T., Ginardi, H., & Munif, A. (2018). Dropout Detection Using Non-Academic Data. In 2018 4th international conference on science and technology (ICST) (pp. 1–4). https://doi.org/10.1109/ICSTC.2018.8528619

  • Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945. https://doi.org/10.1162/REST_a_00318

  • Elbadrawy, A., Polyzou, A., Ren, Z., Sweeney, M., Karypis, G., & Rangwala, H. (2016). Predicting student performance using personalized analytics. Computer, 49(4), 61–69. https://doi.org/10.1109/MC.2016.119

  • Fischer, E. M. J. (2007). Settling into campus life: Differences by race/ethnicity in college involvement and outcomes. The Journal of Higher Education, 78(2), 125–161.

  • Gershenfeld, S., Ward Hood, D., & Zhan, M. (2016). The role of first-semester GPA in predicting graduation rates of underrepresented students. Journal of College Student Retention: Research, Theory & Practice, 17(4), 469–488.

  • Gross, S. M., & Tibshirani, R. (2016). Data Shared Lasso: A novel tool to discover uplift. Computational Statistics & Data Analysis, 101, 226–235.

  • Gubela, R. M., & Lessmann, S. (2021). Uplift modeling with value-driven evaluation metrics. Decision Support Systems, 150, 113648. https://doi.org/10.1016/j.dss.2021.113648

  • Gubela, R. M., Lessmann, S., & Jaroszewicz, S. (2020). Response transformation and profit decomposition for revenue uplift modeling. European Journal of Operational Research, 283(2), 647–661. https://doi.org/10.1016/j.ejor.2019.11.030

  • Gubela, R. M., Lessmann, S., Haupt, J., Baumann, A., Radmer, T., & Gebert, F. (2017). Revenue uplift modeling. Machine Learning for Marketing Decision Support.

  • Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. Lecture notes in business information processing, 115 LNBIP (pp. 123–133). https://doi.org/10.1007/978-3-642-30433-0_13/COVER

  • Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2015). A decision support framework to implement optimal personalized marketing interventions. Decision Support Systems, 72, 24–32. https://doi.org/10.1016/j.dss.2015.01.010

  • Jaskowski, M., & Jaroszewicz, S. (2012). Uplift modeling for clinical trial data. ICML Workshop on Clinical Data A, 46, 79–95.

  • Kane, K., Lo, V. S., & Zheng, J. (2014). Mining for the truly responsive customers and prospects using true-lift modeling: Comparison of new and existing methods. Journal of Marketing Analytics, 2(4), 218–238. https://doi.org/10.1057/jma.2014.18

  • Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2020). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 14(1), 97–116. https://doi.org/10.1007/s11634-019-00364-9

  • King, J. E. (1999). Helping students balance work, borrowing, and college. About Campus, 4(4), 17–22.

  • Kostopoulos, G., Kotsiantis, S., & Pintelas, P. (2015). Estimating student dropout in distance higher education using semi-supervised techniques. In Proceedings of the 19th Panhellenic conference on informatics (pp. 38–43).

  • Lai, Y.-T., Wang, K., Ling, D., Shi, H., & Zhang, J. (2006). Direct marketing when there are voluntary buyers. In Sixth international conference on data mining (ICDM’06) (pp. 922–927). https://doi.org/10.1109/ICDM.2006.54

  • Larose, S., Cyrenne, D., Garceau, O., Harvey, M., Guay, F., Godin, F., Tarabulsy, G. M., & Deschênes, C. (2011). Academic mentoring and dropout prevention for students in math, science and technology. Mentoring & Tutoring: Partnership in Learning, 19(4), 419–439.

  • Lo, V. S. Y. (2002). The true lift model. ACM SIGKDD Explorations Newsletter, 4(2), 78–86. https://doi.org/10.1145/772862.772872

  • Maldonado, S., Miranda, J., Olaya, D., Vásquez, J., & Verbeke, W. (2021). Redefining profit metrics for boosting student retention in higher education. Decision Support Systems, 143, 113493. https://doi.org/10.1016/j.dss.2021.113493

  • McGrath, M., & Braunstein, A. (1997). The prediction of freshmen attrition: An examination of the importance of certain demographic, academic, financial and social factors. College Student Journal.

  • Morgan, S. L., & Winship, C. (2014). Counterfactuals and causal inference. Cambridge University Press. https://doi.org/10.1017/CBO9781107587991

  • Musso, M. F., Hernández, C. F. R., & Cascallar, E. C. (2020). Predicting key educational outcomes in academic trajectories: A machine-learning approach. Higher Education, 80, 875–894.

  • Olaya, D., Vásquez, J., Maldonado, S., Miranda, J., & Verbeke, W. (2020). Uplift modeling for preventing student dropout in higher education. Decision Support Systems, 134, 113320. https://doi.org/10.1016/J.DSS.2020.113320

  • Oztekin, A. (2016). A hybrid data analytic approach to predict college graduation status and its determinative factors. Industrial Management & Data Systems, 116(8), 1678–1699. https://doi.org/10.1108/IMDS-09-2015-0363

  • Palacios, C. A., Reyes-Suárez, J. A., Bearzotti, L. A., Leiva, V., & Marchant, C. (2021). Knowledge discovery for higher education student retention based on data mining: Machine learning algorithms and case study in Chile. Entropy, 23(4), 485. https://doi.org/10.3390/e23040485

  • Radcliffe, N. J., & Surry, P. D. (2011). Real-world uplift modelling with significance-based uplift trees. White Paper TR-2011-1, Stochastic Solutions (pp. 1–33).

  • Rice, D. (2009). Product review: Faculty success through mentoring: A guide for mentors, mentees, and leaders. Adult Learning, 20(1–2), 42–43. https://doi.org/10.1177/104515950902000111

  • Rubin, D. B. (2005). Bayesian inference for causal effects. In The annals of statistics (pp. 1–16). JSTOR. https://doi.org/10.1016/S0169-7161(05)25001-0

  • Shimizu, A., Togashi, R., Lam, A., & Van Huynh, N. (2019). Uplift modeling for cost effective coupon marketing in c-to-c e-commerce. In 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI) (pp. 1744–1748).

  • Stuart, E. A., & Green, K. M. (2008). Using full matching to estimate causal effects in nonexperimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. Developmental Psychology. https://doi.org/10.1037/0012-1649.44.2.395

  • Tampakas, V., Livieris, I. E., Pintelas, E., Karacapilidis, N., & Pintelas, P. (2019). Prediction of students’ graduation time using a two-level classification algorithm. In Technology and innovation in learning, teaching and education: First international conference, tech-ed (pp. 553–565). Springer. https://doi.org/10.1007/978-3-030-20954-4_42

  • Thomas, L. (2002). Student retention in higher education: The role of institutional habitus. Journal of Education Policy, 17(4), 423–442. https://doi.org/10.1080/02680930210140257

  • Thomas, L. (2012). Building student engagement and belonging in Higher Education at a time of change. Paul Hamlyn Foundation, 100(1–99).

  • Yizar Jr, J. H. (2010). Enrollment factors that predict persistence of at-risk (low income and first generation) students' journey towards completion of a baccalaureate degree at Idaho State University. Idaho State University.

  • Yorke, M. (2016). The development and initial use of a survey of student ‘belongingness’, engagement and self-confidence in UK higher education. Assessment & Evaluation in Higher Education, 41(1), 154–166.

  • Zepke, N., & Leach, L. (2010). Improving student engagement: Ten proposals for action. Active Learning in Higher Education, 11(3), 167–177.

Funding

The first author acknowledges financial support from the Craig School of Business (CO RSCA and PRSCA 22/23).

Author information

Corresponding author

Correspondence to Yertai Tanai.

Ethics declarations

Conflict of interest

Both authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Data set variables

| Variable | Description | Type | % missing | Imputation |
|---|---|---|---|---|
| ABC | Admission basis code | Categorical | 0 | |
| ACAD_PLAN | Student initial academic plan | Categorical | 0 | |
| ACT | American College Test (ACT) score, English | Numeric | 73 | Median |
| ACT_COMP | American College Test (ACT) score, composite | Numeric | 0 | |
| ACT_MATH | American College Test (ACT) score, math | Numeric | 0 | |
| ACT_READ | American College Test (ACT) score, reading | Numeric | 0 | |
| ACT_SCI | American College Test (ACT) score, science | Numeric | 0 | |
| CITIZ_M | Citizenship status | Categorical | 0 | |
| COLLEGE_CODE | College campus code | Categorical | 0 | |
| CRED_OBJ | Credential or Subject Matter waiver objective | Categorical | 85 | New category “unknown” |
| CRED_STATUS | CCTC-approved education credential status | Categorical | 0 | |
| CSU_RACE_CAT | Student ethnicity | Categorical | 0 | |
| DEGR_OBJ | Student immediate degree objective code | Categorical | 0 | |
| DEP_FAM_SZ | Family size of a student who determined him/herself dependent for financial aid purposes | Numeric | 0 | |
| DEP_INCOME_CODE | Family income level of a student who determined him/herself dependent for financial aid purposes | Categorical | 0 | |
| DEPT_CODE | Highest degree held by the student | Categorical | 0 | |
| EAPE_STATUS | Student English remediation status | Categorical | 17 | New category “unknown” |
| EAPM_STATUS | Student Mathematics remediation status | Categorical | 19 | New category “unknown” |
| EDU_FATHER | Highest level of education attained by the student's father | Categorical | 0 | |
| EDU_MOTHER | Highest level of education attained by the student's mother | Categorical | 0 | |
| ELM_REC | Student Entry Level Mathematics (ELM) score | Numeric | 0 | |
| ELM_STATUS | Student Entry Level Mathematics (ELM) receive status | Categorical | 0 | |
| ENR_STATUS | Student current enrollment status | Categorical | 0 | |
| EPT_COMP | Student English Placement Test (EPT) score, composition | Numeric | 45 | Median |
| EPT_ESSAY | Student English Placement Test (EPT) score, essay | Numeric | 45 | Median |
| EPT_READ | Student English Placement Test (EPT) score, reading | Numeric | 45 | Median |
| EPT_STATUS | Student English Placement Test (EPT) status | Categorical | 0 | |
| EPT_TOT | Student English Placement Test (EPT) score, total | Numeric | 45 | Median |
| GE_COMP_STATUS | Student GE-Breadth English composition status | Categorical | 0 | |
| GE_CRIT_STATUS | Student GE-Breadth Critical Thinking course status | Categorical | 0 | |
| GE_MATH_STATUS | Student GE-Breadth Mathematics/Quantitative Reasoning course status | Categorical | 0 | |
| GE_ORAL_STATUS | Student GE-Breadth Oral Communications course status | Categorical | 0 | |
| HISP_ETH_CAT | Hispanic/Latino ethnic category | Categorical | 46 | New category “unknown” |
| HISP_STATUS | Hispanic/Latino ethnic status | Categorical | 0 | |
| HS_GPA | Student high school GPA | Numeric | 0 | |
| HS_TRANS_STATUS | Student high school transcript receive status | Categorical | 0 | |
| INDEP_INCOME_COD | Gross income level of student reported as independent applicant | Categorical | 0 | |
| INSTI_M | Student latest institution of origin type | Categorical | 0 | |
| MAJOR_CODE | Student major | Categorical | 0 | |
| MULT_RACE_CD | Student race | Categorical | 0 | |
| OPTSFIX_CD | Student major code | Categorical | 0 | |
| PREP_ENG | Number of semesters of college preparatory English | Numeric | 0 | |
| PREP_MATH | Number of semesters of college preparatory mathematics | Numeric | 0 | |
| PREP_SOC_SCI | Number of semesters of college preparatory social sciences | Numeric | 27 | Median |
| RES_CODE | Residential type | Categorical | 0 | |
| RES_STATUS | Residential status | Categorical | 0 | |
| SAT_COMP | Scholastic Assessment Test (SAT) score, composition | Numeric | 11 | Median |
| SAT_MATH | Scholastic Assessment Test (SAT) score, math | Numeric | 11 | Median |
| SAT_SCORE | Scholastic Assessment Test (SAT) score, total | Numeric | 73 | Median |
| SAT_VERB | Scholastic Assessment Test (SAT) score, reading | Numeric | 11 | Median |
| SEX_M | Student gender | Categorical | 0 | |
| STD_LEV | Student current academic level | Numeric | 0 | |
| TR_GPA | Transfer GPA | Numeric | 70 | Median |
| TR_UN | Transfer units earned | Numeric | 0 | |
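The two imputation rules listed in the table, median for numeric variables and a new “unknown” category for categorical ones, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes missing values are encoded as `None` and that the median is taken over all observed values of a variable, neither of which the appendix specifies. The sample values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace missing numeric entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_unknown(values, label="unknown"):
    """Replace missing categorical entries (None) with a separate category."""
    return [label if v is None else v for v in values]


# Hypothetical samples for two variables from the table.
tr_gpa = impute_median([3.1, None, 2.7, None, 3.5])
cred_obj = impute_unknown(["A", None, None])
```

Treating missingness as its own category keeps every student in the data set for variables such as CRED_OBJ, where 85% of values are absent and dropping incomplete rows would discard most of the sample.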

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tanai, Y., Ciftci, K. How to customize an early start preparatory course policy to improve student graduation success: an application of uplift modeling. Ann Oper Res (2023). https://doi.org/10.1007/s10479-023-05607-9

  • DOI: https://doi.org/10.1007/s10479-023-05607-9
