Key Points for Decision Makers

The presented cohort model approach to estimate the national prevalence of catastrophic costs due to tuberculosis (TB), adjusting for variability across studies to reflect national demographics and loss to follow-up along the patient pathway of care, allows for the costs of those in care to be captured more accurately.

This approach facilitates estimation of the prevalence of catastrophic costs due to TB and the uncertainty of these estimates, and can identify the comparative impact of TB-related costs on different sections of the population.

Depending on the policy application, and where sufficient existing data are available, this approach could serve as a feasible alternative to national surveys for estimating catastrophic costs due to TB.

1 Introduction

Tuberculosis (TB) remains the leading cause of death from a single infectious agent worldwide, with 10 million people falling ill and 1.2 million people dying from TB in 2018 [1]. Often those who are most affected by TB are the most vulnerable in society, and affected households can face substantial costs associated with the disease [2]. Globally, costs associated with TB represent an average of 58% and 39% of individual and household income, respectively [3].

In recognition of the impact of the costs of illness on households, the World Health Organization (WHO) has highlighted reduction of catastrophic costs due to TB as one of three priority targets for 2020 [1]. Costs due to TB are defined as ‘catastrophic’ by the WHO Global TB Programme where they exceed 20% of a household’s annual pre-TB income [4]. The focus of this metric is on economic hardship associated with seeking TB care, including direct out-of-pocket medical costs (such as money paid for medicines, diagnostics, consultation fees or informal payments made to health workers), direct non-medical costs (transport and accommodation costs, costs of any special food or supplements taken because of illness) as well as indirect (opportunity) costs of time spent seeking care by people with TB and guardians or household members accompanying them [5,6,7,8].
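The threshold definition above can be illustrated with a minimal sketch. The function name, argument names and the handling of households with no pre-TB income are assumptions made for illustration; only the 20% threshold and the three cost components come from the definition cited in the text.

```python
CATASTROPHIC_THRESHOLD = 0.20  # share of annual pre-TB household income (WHO definition cited above)


def is_catastrophic(direct_medical, direct_non_medical, indirect,
                    annual_household_income,
                    threshold=CATASTROPHIC_THRESHOLD):
    """Return True if total TB-related costs exceed the threshold share of income.

    All monetary arguments must be in the same currency and price year.
    """
    total_cost = direct_medical + direct_non_medical + indirect
    if annual_household_income <= 0:
        # Assumption (not from the paper): with no pre-TB income,
        # any positive cost is treated as catastrophic.
        return total_cost > 0
    return total_cost / annual_household_income > threshold


# Usage with hypothetical values: costs of 250 against an income of 1000 (25%).
print(is_catastrophic(100, 50, 100, 1000))
```

Costs exactly equal to 20% of income are not classed as catastrophic here, because the definition requires that costs exceed the threshold.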

To help track country progress against this goal and inform programme planning, the Global TB Programme has developed guidelines for the conduct of nationally representative cross-sectional surveys to estimate the prevalence of catastrophic costs [4]. However, these surveys require substantial resources and time to complete, and it will not be feasible for all 194 WHO member states to carry them out repeatedly, leaving countries in need of another source of estimates.

In many settings, data on patient costs of TB have been collected as part of trials or other smaller-scale projects; however, recent systematic reviews of patient-incurred costs due to TB observed large heterogeneity in the quality of reporting as well as in the methods used to collect cost data and measure income loss [3, 10, 11]. Given this variation, it is currently unclear to what extent these existing data can be used to inform national estimates of catastrophic costs due to TB. We hypothesize that, with the use of a cohort model, these data could still be a useful resource for countries looking for decision-making support in the absence of a national survey. We aim to investigate approaches to model the national prevalence of catastrophic costs due to TB using the case study of South Africa, which has one of the world’s highest TB incidence rates, with an estimated incidence of 520 per 100,000 people in 2018 [12].

2 Methods

2.1 Parameterizing the Cohort Model: Population Characteristics

We created an individual-level deterministic cohort model that simulated progression through the TB care cascade in order to estimate the prevalence of catastrophic costs in South Africa (Fig. 1). The model contained a hypothetical cohort of 1000 South Africans with drug-susceptible (DS) TB, with population characteristics mirroring those of the national population of people with DS-TB. Individuals in the cohort were first distributed across national income quintiles 1–5 using data on the national income distribution and the distribution of TB across income quintiles [13,14,15]. We then sampled employment status (by income quintile) and household size to reflect the national distribution of each [16]. Individual income was estimated by dividing household income by household size; individual income took a value of zero if the individual was unemployed or otherwise not income-earning. HIV sero-status was modelled for each individual in the cohort based on the national HIV prevalence among individuals with DS-TB [17]. We then estimated the likelihood of loss to follow-up before treatment start based on HIV status, following evidence from Naidoo et al. [17].
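The cohort construction described above can be sketched as follows. This is an illustrative outline only: every numeric input below is a hypothetical placeholder rather than a value from the national sources cited in the text, the uniform household-size draw is an assumption, and the function and field names are invented for this example. Five income quintiles are used, following the definition of a quintile.

```python
import random

# Hypothetical placeholder inputs (NOT the values used in the paper).
TB_SHARE_BY_QUINTILE = {1: 0.30, 2: 0.25, 3: 0.20, 4: 0.15, 5: 0.10}
EMPLOYED_PROB_BY_QUINTILE = {1: 0.20, 2: 0.30, 3: 0.45, 4: 0.60, 5: 0.75}
HOUSEHOLD_INCOME_BY_QUINTILE = {1: 1000.0, 2: 2500.0, 3: 5000.0,
                                4: 10000.0, 5: 30000.0}
HIV_PREVALENCE = 0.60                     # among people with DS-TB (placeholder)
LTFU_PROB_BY_HIV = {True: 0.15, False: 0.10}  # pre-treatment loss to follow-up


def simulate_cohort(n=1000, seed=0):
    """Build a hypothetical cohort of n people with DS-TB."""
    rng = random.Random(seed)
    quintiles = list(TB_SHARE_BY_QUINTILE)
    weights = list(TB_SHARE_BY_QUINTILE.values())
    cohort = []
    for _ in range(n):
        quintile = rng.choices(quintiles, weights=weights)[0]
        employed = rng.random() < EMPLOYED_PROB_BY_QUINTILE[quintile]
        household_size = rng.randint(1, 6)  # placeholder distribution
        household_income = HOUSEHOLD_INCOME_BY_QUINTILE[quintile]
        # Individual income: household income per member, zero if not earning.
        individual_income = (household_income / household_size
                             if employed else 0.0)
        hiv_positive = rng.random() < HIV_PREVALENCE
        lost_before_treatment = rng.random() < LTFU_PROB_BY_HIV[hiv_positive]
        cohort.append({
            "quintile": quintile,
            "employed": employed,
            "household_size": household_size,
            "household_income": household_income,
            "individual_income": individual_income,
            "hiv_positive": hiv_positive,
            "lost_before_treatment": lost_before_treatment,
        })
    return cohort


cohort = simulate_cohort(n=1000, seed=1)
on_treatment = [p for p in cohort if not p["lost_before_treatment"]]
print(len(cohort), len(on_treatment))
```

Fixing the seed makes the sketch reproducible; varying it across runs would give a rough sense of sampling variability in the cohort composition.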

Fig. 1 Analysis structure. DS drug-susceptible, TB tuberculosis

2.2 Parameterizing the Cohort Model: Cost Data

2.2.1 Identifying and Reconciling Primary Data

We collated all research articles reporting any estimates of patient-incurred costs due to TB in South Africa from the Unit Cost Study Repository (UCSR) of the Global Health Cost Consortium [18]. Patient-incurred costs included any costs paid out-of-pocket by TB patients and their households, and any lost income or productivity due to TB. Eleven studies presenting patient cost data in South Africa were identified [19,20,21,22,23,24,25,26,27,28,29]. Of these, three were excluded due to outdated models of care and one was excluded as a duplicate of previously published data. Corresponding authors of the seven eligible studies were invited to participate, and a protocol identifying variables to be included in the pooled dataset was provided. Collaborators from three of the seven eligible studies agreed to participate in the analysis. Due to data availability, the scope of this analysis was restricted to costs whilst on treatment for DS-TB; we did not consider costs for drug-resistant (DR) TB, nor did we consider costs during the diagnostic process [10]. All datasets had obtained ethical approval for their original study. Ethical approval for the pooled analysis was granted by the London School of Hygiene and Tropical Medicine (reference 14486).

We reconciled timeframes for cost data by identifying the treatment start date, interview date and recall period for each participant. Direct out-of-pocket costs incurred in each treatment phase (intensive and continuation) were categorized as direct medical costs (consultation fees, medicines, diagnostics), direct non-medical costs (transportation, accommodation) and food costs (food supplements, special foods). Cost estimates were further distinguished by type of healthcare provider, including public healthcare (PHC) facility (study site), another PHC facility (non-study site), private general practitioner, pharmacy, hospital inpatient service, hospital outpatient service, and traditional healer.

Data in all studies were collected using adaptations of the Tool to Estimate Patient Costs [30], and thus definitions for out-of-pocket cost variables were homogeneous; however, the Researching Equity in ACcess to Healthcare (REACH) dataset did not contain information on direct non-medical costs or time spent accessing providers other than the main study clinic. As these items were omitted entirely from data collection, we assumed the values to be missing at random and imputed them (imputation methods are described in Sect. 2.2.2). In contrast, methods for collecting data on income and estimating indirect costs varied widely across datasets and were not reconcilable. To complete the datasets, we took a statistical approach to predict the income quintile of each household in the dataset. Assuming the income distribution to be the same as the national distribution of income amongst people with TB, we used regression coefficients from an analysis of the most recent (2015) South African National Income Dynamics Study (NIDS), with variables including asset holdings, housing quality indicators and basic demographics, to predict income. Full methods to predict household income quintiles are described in Electronic Supplementary Material (ESM) Appendix 2.

All costs are reported in 2017 US dollars. Data collected before 2017 were inflated using the US consumer price index [31]. Prior to generating model parameters using the standardized data, we conducted a descriptive analysis of sociodemographic and cost variables within and across datasets. Variables were summarized using the mean and standard deviation for each individual dataset and across the pooled dataset. We tested for significant differences in categorical variables using a chi-squared test, and tested for significant differences in continuous variables using a one-way analysis of variance (ANOVA) within and between studies.
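The price-year adjustment described above amounts to scaling each cost by the ratio of the price index in the base year to the index in the year of data collection. A minimal sketch follows; the function name and the index values in the usage line are hypothetical and are not actual consumer price index figures.

```python
def inflate_to_base_year(cost, index_collection_year, index_base_year):
    """Express a cost in base-year prices using a price index ratio."""
    if index_collection_year <= 0:
        raise ValueError("price index must be positive")
    return cost * index_base_year / index_collection_year


# Hypothetical index values: 200 in the collection year, 250 in the base year.
print(inflate_to_base_year(100.0, 200.0, 250.0))  # 125.0
```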

2.2.2 Generating Model Parameters

We tested two approaches to estimate mean and standard error values for direct costs and hours lost due to treatment by household income quintile, HIV status and treatment phase: (1) meta-analysis of summary statistics from the standardized datasets; and (2) regression analysis of the pooled standardized dataset.

Our first approach was meta-analysis to calculate pooled estimates of available (study-level) mean values for the above-described cost categories for each treatment phase, by HIV status and household income quintile [32]. Given that patient demographics varied significantly across datasets, and assuming that patient costs vary according to demographics, we used a random effects meta-analysis approach, which does not assume that all studies investigate the same population [32]. Data on direct costs, travel time and consultation time were log-transformed for the meta-analysis as they were highly skewed, and results were exponentiated following meta-analysis.
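As an illustration of random effects pooling on the log scale, the sketch below uses the DerSimonian-Laird estimator of between-study variance. The text does not state which estimator was used, so this choice is an assumption, as are the function name and the input values; inputs are study-level means and standard errors that have already been log-transformed.

```python
import math


def random_effects_pool(log_means, log_ses):
    """Pool study-level log-scale means with DerSimonian-Laird random effects.

    Returns (pooled log mean, its standard error, between-study variance tau^2).
    """
    k = len(log_means)
    w = [1.0 / se ** 2 for se in log_ses]          # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_means)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_means))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in log_ses]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_means)) / sum_w_star
    return pooled, math.sqrt(1.0 / sum_w_star), tau2


# Hypothetical log-scale inputs for three studies.
pooled, se, tau2 = random_effects_pool([2.0, 2.4, 3.1], [0.20, 0.15, 0.25])
estimate = math.exp(pooled)                        # back to the cost scale
interval = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(estimate, interval, tau2)
```

When the studies disagree by more than their sampling error would suggest, tau^2 is positive and the pooled standard error widens, which is the behaviour that motivates a random effects approach when study populations differ.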

Our second approach was to identify a regression model to predict the above-described cost categories for each treatment phase, by HIV status and household income quintile. Firstly, we imputed missing values in the pooled dataset. Where total consultation hours were missing, we used multivariate imputation with chained equations (MICE) to impute these values based on the number of visits by phase and provider type. Total travel hours and total direct non-medical costs were imputed based on the number of visits and transport method, as well as the demographic variables included in the regression analysis. All imputations used predictive mean matching (PMM) as a non-parametric alternative for imputing skewed data. Imputations generated 20 plausible datasets, which were then used for analysis. The number of missing observations by dataset is listed in ESM Table 3 [33].
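The idea behind predictive mean matching can be shown with a deliberately simplified sketch: a missing value is replaced by the observed value of a "donor" whose predicted value is close to the prediction for the incomplete record, so imputed values are always values that were actually observed. The version below uses a single predictor and point estimates of the regression coefficients; full MICE implementations cycle over several incomplete variables and also draw the coefficients, so this is not the procedure used in the analysis. The function name, the variables and the data are hypothetical.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by predictive mean matching on one predictor x."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx if sxx > 0 else 0.0      # ordinary least squares fit
    intercept = mean_y - slope * mean_x

    def predict(value):
        return intercept + slope * value

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)               # observed values are kept
            continue
        target = predict(xi)
        # The k observed records with the closest predicted values are donors.
        donors = sorted(observed,
                        key=lambda d: abs(predict(d[0]) - target))[:k]
        completed.append(rng.choice(donors)[1])
    return completed


# Hypothetical data: number of visits (x) and total consultation hours (y).
visits = [2, 4, 6, 8, 10, 5, 7]
hours = [1.0, 2.5, 3.0, 5.5, 6.0, None, None]
# One completed dataset per seed; 20 seeds give 20 plausible datasets.
imputed_sets = [pmm_impute(visits, hours, k=3, seed=m) for m in range(20)]
print(imputed_sets[0])
```

Because donors supply real observed values, the imputations respect the skewed shape and the lower bound of the data without requiring a parametric distribution.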

Following imputation, we conducted a series of regression analyses to predict the cost of each cost category for each treatment phase. The regression analyses used a generalized linear model (GLM) approach assuming a gamma distribution and a log link to accommodate skewed data [34]. The specification of each regression was held constant across analyses and included independent variables identified following theory, as well as previous published evidence [35,36,37,18] […]

[…] the wide variance in methods used to collect cost data remains a persistent limitation in the feasibility of pooling data. There are several areas where further data collection or better guidance on data collection methods would improve these estimates substantially. Firstly, methods for estimation and reporting of income data in patient-incurred cost surveys are currently inconsistent, with limited guidance on methods [41]. Going forward, better guidance on methods to estimate household and individual income is critical for any future attempts to pool data for drawing national estimates as well as more generally informing policy. Guidance on the appropriate measures of indirect costs in the numerator, and ability to pay in the denominator (e.g. household income vs. household expenditures), would also improve the theoretical validity of the metric [41].

Despite the above-discussed uncertainties, this type of model could be useful for researchers and policy makers. A cohort model such as the one presented in this paper can estimate the national prevalence of catastrophic costs due to TB and the uncertainty around these estimates, and can identify the comparative impact of TB-related costs on different sections of the population. It also has the potential to inform certain policy decisions; for example, Verguet et al. [40] use a similar approach to illustrate the potential number of cases of catastrophic costs averted by a range of TB interventions. The approach presented in this paper improves estimates by pooling data from multiple studies systematically, and by allowing for adjustment by demographics and treatment phase.

The usefulness of the type of analysis presented in this paper depends on the objectives of the analysis. This analysis may be sensitive enough to capture major movement towards the End TB goal of zero catastrophic costs due to TB; however, it is likely not sensitive enough to capture small changes from year to year, especially where the cost function is still unknown or differs substantially across settings. Ongoing primary data collection through national surveys is likely still necessary to facilitate annual reporting and programme management until the availability of cost and epidemiological data improves and the cost function is better identified. However, while probably not providing as robust an estimate of catastrophic costs as a national survey, this type of analysis can complement and add depth to findings from the national surveys, especially for certain groups.

5 Conclusions

This paper presents a novel use of existing data to estimate the prevalence of catastrophic costs due to TB [4]. We find that in the absence of nationally representative surveys, a deterministic model can provide an alternative for estimating catastrophic cost prevalence and the uncertainty around those estimates, with uncertainty slightly reduced using a regression approach as compared with a meta-analysis approach. A repeat of this analysis with additional primary data from South Africa added would test the validity of the main finding. Analyses testing the results of a cohort model against national estimates of catastrophic costs of other conditions would also help researchers to understand the validity of these models and the value of information added as compared with primary data collection through national surveys. Ultimately, to improve estimates from such cost-saving approaches, there is an urgent need for standardized methods to collect income data and standardized reporting of cost estimates.