Introduction

Background

Uncertainty has been one of the most important issues in medicine and in the development of health economics for many years. In 1963, a classic study suggested that uncertainty arises from the asymmetry of information or knowledge between patients as consumers and medical staff as suppliers [1]. In other words, the medical knowledge of the staff is too difficult and complex for patients to understand. Another study explained that there are various types of uncertainty, such as those in diagnosis and administration [2]. Theoretical studies on the definition of uncertainty can be categorized into three types. The first is systematic classification, which subdivides uncertainty into three composite factors (personal, practical, and scientific) [3], arranges it from disease-oriented to patient-oriented [4], or distinguishes three other types (conceptual, methodological, and ethical) [5]. The second is a qualitative approach that clarifies the uncertainty that clinicians often face in areas such as attention deficit hyperactivity disorder [6], primary health care [7], and prostate cancer [8]. The third addresses a specific theme in order to develop a new analytic method, such as evaluating the degree of an effect [9, 10].

Recently, cost-effectiveness analysis was developed from multivariate sensitivity analysis using the Bayesian approach [11]. Almost all the existing research used a model-based approach by employing the following standard procedures: (1) model formulation, (2) parameter setting, and (3) effect estimation [12,13,14,15,16,17,18,19,20,21,22,23]. An artificial database was created for the estimation. Based on the results, various simulations of the future indicate threats to the existing medical care system, such as social health insurance. Moreover, these studies contribute to building guidelines for health technology assessment (HTA). Thus, these studies can be called the model-technology-based approach.

However, only a few research articles have analyzed existing databases to evaluate medical uncertainty despite the importance of cost estimation of pharmacy services [24,25,26]. In addition, previous studies have not evaluated each patient’s uncertainty because they focused on individuals by following the guideline of HTA [27]. The use of observational real-world data recorded for each patient has been suggested in the past 10 years [11]. Since the secondary use of electronic medical records (EMRs) as real-world data is now imminent, methods for estimating uncertainty must be built on concrete data items whose recording is mandatory in standard EMRs. After reviewing the existing studies, the present research proposes the data-patient-based approach explained below.

Study objective

The objective of the present study is to develop a method to estimate the financial uncertainty (FU) of each patient based on discounted present value (DPV), one of the most popular methods in economics. This method is applicable to standard EMRs because it uses only three items: the AC of medical fees, the length of hospital stay (LHS, calculated in days), and the predicted mortality rate (PMR). To estimate the PMR, a prediction model was built from explorative variables in existing EMRs that can explain patients’ condition.

Methods

Study design and participants

A retrospective cohort for data analysis was constructed using EMRs of the University of Miyazaki Hospital. The study period is 5 years, from April 1, 2013 to March 31, 2018. The use of these records was approved by the Committee of Medical Ethics, University of Miyazaki (ethics approval number O-0758). The following two raw databases were used to create the cohort: (1) patient information, which includes the date of hospitalization and discharge as well as patients’ characteristics, such as sex, age, and disease information, and (2) claim information, which includes the AC of the social health insurance system in Japan.

Figure 1 shows the process of cohort creation. The patient information was recorded by patient identification (patient ID) and date of hospitalization, which together serve as the unique key of the cohort. Mortality was evaluated only at the present hospitalization (date of hospitalization 1 in Fig. 1) because patients can be assumed to have survived any previous hospitalization. The claim information, in turn, was summarized by patient ID and date of hospitalization (called the summarized claim information).

Fig. 1 Data processing flowchart

After these databases were prepared, the cohort was created by merging the patient information with the summarized claim information. Two exclusion criteria were applied during the merge: unmerged data and missing values in the explorative variables. The merge produced two versions of the cohort (DS1 and DS2). DS1 holds one record per patient ID, keeping information only about the present hospitalization, and is used to build the mortality risk prediction model. Because mortality was evaluated only at the present hospitalization, appropriate model building required records about only the present hospitalization. In contrast, DS2 keeps all records so that FU can be estimated for each hospitalization during the observation period.

Furthermore, DS1 and DS2 were each divided into two subgroups, the new group and the existing group, according to whether a hospitalization can refer to information about a previous hospitalization. For example, patient ID 1 in Fig. 1 has a record in the new group at date of hospitalization 2, which cannot refer to a date 3, and a record in the existing group at date 1, which can refer to date 2. These groups were created because the candidate variables for model building differ according to whether information about the previous hospitalization is available.
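The cohort construction described above can be illustrated with a minimal sketch. It is not the authors' procedure (the analyses were run in SAS and the code is not shown); the record layout and the keys `patient_id` and `admit_date` are hypothetical, and the sketch assumes that the earliest hospitalization of a patient within the observation period is the one that cannot refer to a previous hospitalization.

```python
from collections import defaultdict


def build_cohorts(hospitalizations):
    """Split hospitalization records into DS1/DS2 and label new/existing groups.

    `hospitalizations` is a list of dicts with the hypothetical keys
    'patient_id' and 'admit_date' (ISO date string, so it sorts by time).
    """
    by_patient = defaultdict(list)
    for rec in hospitalizations:
        by_patient[rec["patient_id"]].append(rec)

    ds1, ds2 = [], []
    for recs in by_patient.values():
        recs = sorted(recs, key=lambda r: r["admit_date"])
        for i, rec in enumerate(recs):
            # The earliest stay has no previous hospitalization to refer to.
            labeled = dict(rec, group="new" if i == 0 else "existing")
            ds2.append(labeled)  # DS2 keeps every hospitalization
        ds1.append(ds2[-1])      # DS1 keeps only the present (latest) stay
    return ds1, ds2


# Hypothetical input: patient 1 has two stays, patient 2 has one.
records = [
    {"patient_id": 1, "admit_date": "2016-05-01"},
    {"patient_id": 1, "admit_date": "2017-02-10"},
    {"patient_id": 2, "admit_date": "2015-08-20"},
]
ds1, ds2 = build_cohorts(records)
```

In this toy input, patient 1 contributes one record to each group in DS2, while DS1 holds only the latest stay of each patient.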

Outcome measure

According to a fundamental textbook in economics [28], DPV can be calculated using the formula for estimating uncertainty at different times as follows:

$$DPV=\frac{c}{{(1+r)}^{t}}$$

where \(c\) is the cash flow at \(t\) years (called a future value); \(r\) is a discount rate, and \(t\) is the period of investment (years). The present study aims to convert this formula to calculate a discounted AC (DAC) as follows:

$$DAC=\frac{{c}_{0}}{{(1+{p}_{0}/{l}_{0})}^{{l}_{1}}}$$

where \({c}_{0}\) is the future value of the AC; \({p}_{0}\) is the PMR used as the discount rate; \({l}_{0}\) is the mean LHS in the cohort that is used to convert each \({p}_{0}\) to daily values (called a daily PMR (DPMR)), and \({l}_{1}\) is the actual LHS of each hospitalization as the number of exposure days to the treatment risk. The primary outcome measure is FU, which is the difference between the actual AC and DAC.
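As an illustration of the two formulas and of FU, the following minimal sketch evaluates them for a single hypothetical hospitalization. The function names and the input values are assumptions made for this example only; the mean LHS of 17.3 days is the new-group value reported in the statistical methods.

```python
def dpv(c, r, t):
    """Discounted present value of cash flow c after t years at discount rate r."""
    return c / (1.0 + r) ** t


def dac(c0, p0, l0, l1):
    """Discounted AC: the daily PMR (p0 / l0) acts as the discount rate
    and the actual LHS l1 (in days) acts as the number of periods."""
    return c0 / (1.0 + p0 / l0) ** l1


def financial_uncertainty(c0, p0, l0, l1):
    """FU is the difference between the actual AC and the DAC."""
    return c0 - dac(c0, p0, l0, l1)


# Hypothetical hospitalization: AC of 10,000, PMR of 5%,
# cohort mean LHS of 17.3 days, actual stay of 30 days.
example_fu = financial_uncertainty(c0=10_000, p0=0.05, l0=17.3, l1=30)
```

With these definitions, FU is zero when the PMR is zero and grows with both the PMR and the actual LHS, which is the behavior the DAC formula is designed to capture.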

Mortality risk prediction model

The objective variable for the risk prediction model is mortality because mortality is a typical hard endpoint for acute medical conditions in medical organizations, including the University of Miyazaki Hospital [29, 30]. Previous studies have related mortality to major patient characteristics (such as sex, age, and diagnosis of disease) [31], LHS [32], and readmission according to whether patients have sarcopenia [33].

In this study, the following 15 explorative variables were created to build the prediction model: (1) sex; (2) age; (3) body mass index (BMI); (4) smoke (yes or no); (5) activities of daily living ((ADL) yes or no); (6) Japan Coma Scale ((JCS) yes or no); (7) cancer information (yes or no); (8) operation (yes or no); (9) plan change (yes or no); (10) comorbidity (yes or no); (11) post-hospital disease (yes or no); (12) ADL transition (no to no, no to yes, yes to no, or yes to yes); (13) JCS transition (no to no, no to yes, yes to no, or yes to yes); (14) LHS (calculated as the date of discharge at last hospitalization minus date of last hospitalization plus 1); (15) passed time until present hospitalization ((PTUPH) calculated as the date of present hospitalization minus date of discharge at last hospitalization plus 1). The 1st to 7th variables were implemented on both the new and existing groups because they are extracted from information about only present hospitalization, which is before the intervention of the present hospitalization. The 8th to 15th variables were implemented on only the existing group because they can be extracted from information about the last hospitalization.
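The two date-based variables, LHS and PTUPH, can be sketched as follows. The function names and the dates are hypothetical; only the "plus 1" day-count convention is taken from the definitions above.

```python
from datetime import date


def lhs_days(admit, discharge):
    """Length of hospital stay: discharge date minus admission date plus 1."""
    return (discharge - admit).days + 1


def ptuph_days(last_discharge, present_admit):
    """Passed time until present hospitalization: present admission date
    minus the discharge date of the last hospitalization plus 1."""
    return (present_admit - last_discharge).days + 1


# Hypothetical dates for one patient in the existing group.
last_lhs = lhs_days(date(2017, 1, 10), date(2017, 1, 20))  # 11 days
ptuph = ptuph_days(date(2017, 1, 20), date(2017, 1, 25))   # 6 days
```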

These explorative variables are explained in detail below. ADL was coded as a binary variable as follows: no (patients with ADL code “2312132222,” which means they need no assistance with ADL) or yes (otherwise). This code consists of ten digits that describe ten types of physical conditions, such as diet, excretion, and walking. If the code is “2312132222,” patients have no difficulty with any of these ten items. JCS was coded as a binary variable as follows: no (missing value or zero) or yes (otherwise). A plan change, which was defined in our previous study [34], is as follows: “A plan change would be implemented if the ICD-10 of the main disease differs from that of the disease for which medical resources were implemented, or if there is existence of disease with secondary implementation of medical resources.” Here, ICD-10 means the 10th revision of the International Statistical Classification of Diseases and Related Health Problems. The ADL or JCS transition compares the status at hospitalization with that at discharge, giving the four categories listed above.
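The coding rules above can be expressed as a short sketch. The function names and argument layout are hypothetical. Following a literal reading of the text, any ADL code other than "2312132222" (including a missing one) is coded as yes, which is an assumption of this example.

```python
NO_ASSISTANCE_ADL = "2312132222"  # ADL code given in the text


def code_adl(adl_code):
    """'no' when the ten-digit ADL code means no assistance is needed, else 'yes'."""
    return "no" if adl_code == NO_ASSISTANCE_ADL else "yes"


def code_jcs(jcs):
    """'no' when JCS is missing (None) or zero, else 'yes'."""
    return "no" if jcs is None or jcs == 0 else "yes"


def transition(at_hospitalization, at_discharge):
    """Combine two binary codes into one of the four transition categories."""
    return f"{at_hospitalization} to {at_discharge}"


def plan_change(main_icd10, resource_icd10, secondary_resource_icd10=None):
    """'yes' if the main disease differs from the disease for which medical
    resources were used, or if a disease with secondary use of medical
    resources exists; otherwise 'no'."""
    changed = main_icd10 != resource_icd10 or secondary_resource_icd10 is not None
    return "yes" if changed else "no"


# Hypothetical patient whose ADL worsened during the stay.
adl_transition = transition(code_adl("2312132222"), code_adl("1312132222"))
```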

Statistical methods

The statistical analyses in the present study were divided into three parts. First, a crude mortality rate (CMR) was calculated using each explorative variable. While comparing the CMR, a chi-squared test was performed on all the categorical variables. Additionally, Student’s t-test was conducted on four continuous variables—age, BMI, LHS, and PTUPH.
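For a binary explorative variable, the first part of the analysis can be sketched as below. This is a plain Pearson chi-squared test on a 2 x 2 table without continuity correction; the options actually used in SAS are not stated in the text, and the counts are hypothetical.

```python
import math


def crude_mortality_rate(deaths, patients):
    """Crude mortality rate: deaths divided by the number of patients."""
    return deaths / patients


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = category, columns = died / survived.
    Returns (statistic, p_value) with one degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: 30 of 100 patients died in one category, 10 of 100 in the other.
cmr_yes = crude_mortality_rate(30, 100)
cmr_no = crude_mortality_rate(10, 100)
stat, p_value = chi_squared_2x2(30, 70, 10, 90)
```

Variables with more than two categories would need the general r x c form of the test, which this sketch does not cover.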

Second, a logistic regression model was built to predict the mortality rate using the 15 explorative variables. Although four of the variables are continuous, every variable entered the model in categorical form because categories reveal clinical characteristics, such as childhood, adulthood, or old age. The regression was conducted as both univariate and multivariate analyses. Following a recent standard procedure, a variable was used in the multivariate model if its p-value was less than 0.25 in the univariate model [35]. The area under the curve (AUC) was reported as supplemental information about this model-building procedure. The AUC is the area under a receiver operating characteristic curve, which is drawn using the true positive rate (vertical axis) and the false positive rate (horizontal axis) [36].
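The screening rule and the AUC can be sketched without fitting a model. The p-values, scores, and labels below are hypothetical, and the AUC is computed through its pairwise-comparison form, which equals the area under the receiver operating characteristic curve; it is not necessarily how SAS computes it.

```python
def select_for_multivariate(univariate_p_values, threshold=0.25):
    """Keep variables whose univariate p-value is below the threshold."""
    return [name for name, p in univariate_p_values.items() if p < threshold]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison: the share of
    (death, survivor) pairs in which the death has the higher predicted
    risk, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical univariate p-values (not results from the study).
selected = select_for_multivariate({"sex": 0.40, "age": 0.01, "BMI": 0.24})

# Hypothetical predicted risks and outcomes (1 = died, 0 = survived).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```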

Finally, the third part was to estimate FU using the DAC equation. When the PMR is converted to a daily value (the DPMR), \({l}_{0}\) is 17.3 days (new group) and 19.0 days (existing group). The FU was then aggregated by disease, using the three-character headings of ICD-10, to evaluate how FU differs among diseases.
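The aggregation step can be sketched as follows, assuming each hospitalization record carries an ICD-10 code, its group, the actual AC, the PMR, and the LHS. The key names and sample records are hypothetical; the two mean LHS values are those reported above.

```python
from collections import defaultdict

MEAN_LHS = {"new": 17.3, "existing": 19.0}  # l0 values reported in the text


def fu_by_icd10_heading(records):
    """Sum actual AC and FU per three-character ICD-10 heading.

    Each record is a dict with the hypothetical keys 'icd10', 'group',
    'ac' (actual AC), 'pmr' (predicted mortality rate), and 'lhs' (days).
    """
    totals = defaultdict(lambda: {"ac": 0.0, "fu": 0.0})
    for rec in records:
        dpmr = rec["pmr"] / MEAN_LHS[rec["group"]]    # daily PMR
        dac = rec["ac"] / (1.0 + dpmr) ** rec["lhs"]  # discounted AC
        heading = rec["icd10"][:3]                    # e.g. "C71.1" -> "C71"
        totals[heading]["ac"] += rec["ac"]
        totals[heading]["fu"] += rec["ac"] - dac
    # Rate of FU: FU as a share of the actual AC.
    return {h: dict(t, rate=t["fu"] / t["ac"]) for h, t in totals.items()}


# Hypothetical hospitalizations (not data from the study).
sample = [
    {"icd10": "C71.1", "group": "new", "ac": 20_000, "pmr": 0.10, "lhs": 40},
    {"icd10": "C71.9", "group": "existing", "ac": 15_000, "pmr": 0.08, "lhs": 25},
    {"icd10": "P07.1", "group": "new", "ac": 30_000, "pmr": 0.00, "lhs": 60},
]
summary = fu_by_icd10_heading(sample)
```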

All statistical analyses were performed using SAS University Edition (SAS Institute Inc., NC, USA).

Results

Table 1 presents the number of patients and CMR of each of the 15 explorative variables.

Table 1 List of the 15 explorative variables

Based on the variable selection using the univariate model (Table 2), the odds ratios (ORs) of the multivariate model used to estimate the PMR were calculated (Table 3). Additionally, the AUC was calculated with its 95% confidence interval (in parentheses) as follows: (1) new group = 0.844 (0.823, 0.861) and (2) existing group = 0.859 (0.842, 0.878).

Table 2 Odds ratio in the prediction model (univariate analysis)
Table 3 Odds ratio in the prediction model (multivariate analysis)

Table 4 presents the total actual AC of the top 20 diseases and reports the FU both as an amount and as a percentage of the actual AC. In the table, five diseases are in bold because their rate of FU is higher than 20%. The actual AC and the FU are USD 462,873 thousand and USD 40,638 thousand, respectively, for all diseases, and USD 154,341 thousand and USD 17,017 thousand, respectively, for the top 20 diseases.

Table 4 Financial uncertainty in the top 20 diseases, descending, sorted by AC

Discussion

Key result

The results have both theoretical and clinical implications based on various statistical values, such as the CMR, OR, AUC, rate of FU, and mean LHS or DPMR (Tables 1, 2, 3 and 4). As indicated in Table 1, the CMR differs among the categories of each of the 15 explorative variables. In particular, JCS (yes for both groups), LHS (less than 28 in the existing group), and PTUPH (equal to or less than 7 in the existing group) have higher values than the other categories, implying that readmission with a PTUPH of 7 or less tends to be emergency readmission.

In building the model (Tables 2 and 3), five and 13 variables were used in the multivariate model for the new and existing groups, respectively. The present study does not discuss the validity of patient classification based on the AUC of the prediction model, as would be done for the criterion of a clinical test, because it uses the PMR as a characteristic value to estimate the FU of each patient outside actual treatment. Nevertheless, the performance of the model was moderate, with an AUC of approximately 0.85 in both groups.

As indicated in Table 4, the FU differs considerably among diseases. The rate of FU (the percentage of the FU to the actual AC) in the five diseases in bold is greater than 20% because their DPMR values are higher than those of the other 15 diseases that are not in bold. Individually, the DPMR of C71 (brain tumor) was remarkably higher than that of the other four diseases in bold because physical function is often impaired. Namely, brain tumor patients recorded a higher DPMR because their ADL (one of the explorative variables) tended to be “yes,” which has a higher OR than the other variables. Next, the mean LHSs of P07 (low-weight child), C91 (lymphocytic leukemia), and C92 (myeloid leukemia) were remarkably longer than those of the other 15 diseases that are not in bold. The reason differs between P07 and the others. For P07 patients, it is difficult for medical staff to decide how long to continue treatment because these patients are akin to newborn babies in precarious conditions. In contrast, as per clinical guidelines, patients with C91 or C92 must stay in a clean room for a long time because intensive chemotherapy is often carried out on them [37,38,39,40]. Finally, the mean DPMR and LHS of C85 (non-Hodgkin’s lymphoma) are lower than those of the other four diseases in bold. However, its rate of FU is greater than 20% because its mean DPMR is higher than those of the 15 diseases that are not in bold. Although C85 is similar to C91 and C92 as a blood cancer [41], the reason for the higher rate differs among C85, C91, and C92.

Limitations

The present study has two limitations. The first is a theoretical issue arising from the study design, which uses a retrospective cohort. Since our databases could record only a few data items on patients’ typical characteristics, such as sex, age, and disease, our results do not eliminate all confounding factors; this is the trade-off for being able to include numerous participants easily.

The second is an insufficient discussion of the objective variable as the basis of the discount rate (\({p}_{0}\)). Mortality was chosen as the objective variable because the data were easily collected from our databases. However, mortality is not always appropriate as an objective variable. Various outcomes do not relate to mortality but have a considerable negative influence on patients’ quality of life [42, 43]. Furthermore, the use of extracorporeal membrane oxygenation would be necessary to estimate the uncertainty of coronavirus disease 2019 [44]. Although such events are more difficult than mortality to record in EMRs as part of routine processing, they are appropriate additional events for estimating the discount rate. Therefore, objective variables should be decided based on the aims of each individual study.

Significance

The primary contribution of this study is the development of a systematic method for estimating FU in medicine using DPV, one of the most fundamental economic methods. Although DPV is often calculated to estimate uncertainty in various industries, a method of estimating uncertainty for each patient had not been developed in medicine. Our method defines FU in medicine as the difference between AC and DAC based on DPV. Therefore, the present study can contribute to analytic methods in health economics such as cost-effectiveness analysis.

In more detail, the practical value of our method is that it can contribute to decision-making in health policy worldwide, for the following three reasons. First, our method is more systematic than those of previous studies because only a few typical items in standard EMRs (AC, LHS, and objective or explorative variables for model building) are required to estimate FU; this means that the generalizability of the present study is high. Second, the method employs LHS as the exposure time to treatment risk in medicine rather than as an efficiency indicator. Although various research articles [45,46,47,48,49] have demonstrated the importance of decreasing LHS, the present study discusses LHS from a different perspective. Finally, the present study criticizes the practice of attaching too much importance to decreasing LHS, which stems from the social health insurance payment system for Japanese acute medical organizations, called the Diagnosis Procedure Combination (DPC) payment system. In the DPC payment system, the daily revenue (equal to the daily AC herein) decreases as the stay lengthens [50]. This system provides medical organizations with an incentive to decrease LHS to improve efficiency. Some research articles have demonstrated the positive influence of this system [51, 52]. However, our empirical analysis indicates that patients with some diseases (the five diseases in bold in Table 4) cannot avoid long hospitalization because of their uncertainty. Despite the incentive of the DPC payment system, these diseases require long hospitalization to maintain safety. Therefore, these diseases would be inappropriate for the DPC payment system. Thus, as a health policymaking issue, several diseases that have higher levels of uncertainty should be moved from the DPC system (a daily comprehensive payment system according to each disease) to a volume payment system according to each treatment.

Our secondary contribution is an improvement in the technique of model building. Our prediction model predicts a patient’s potential risk at the time of hospitalization rather than at discharge. Almost all research uses a prediction model, such as a logistic regression model, to evaluate the effectiveness of a target treatment using all data items recorded during hospitalization. In contrast, the explorative variables in our model are limited to items that can be recorded before treatment. Despite this difference in data items between our model and the existing studies, our model achieved a moderate AUC. Therefore, our model can be used as a real-time prediction model in clinical settings.