
Economic evaluations of health care interventions often involve incremental cost-effectiveness ratios, where the quality-adjusted life-year (QALY) is used to capture the health outcome of different interventions. Generic preference-based measures (GPBMs) are commonly used to calculate the QALY. Most of the GPBMs, such as the EQ-5D and SF-6D, were developed in Europe and North America, and are often translated into other languages for use in non-English speaking countries [1]. One advantage of using these international GPBMs is that researchers can use the same instrument to measure the health-related quality of life (HRQoL) of populations in different countries or regions, allowing for cross-country/cross-cultural comparisons [2]. Similarly, when these GPBMs are applied in China to assess health outcomes associated with Traditional Chinese Medicine (TCM), researchers can apply the adapted versions with their corresponding health utility value sets generated from the Chinese population.

However, health is a culturally related concept, and health evaluation indicators formulated in the Western cultural environment may not include Chinese cultural views on health. A study evaluating the similarities and differences of health-related quality of life concepts between the East and the West compared 8 HRQoL instruments developed in the Chinese cultural context with 3 HRQoL instruments developed outside China. This study found that, although there is a consensus between the East and West on some of the HRQoL domains, domains such as emotional control, weather adaptation, social adaptation, spirituality, and skin color are unique to the Chinese cultural background [3]. Mao Z et al. [4] conducted a Q-methodology study, and the results showed that several HRQoL domains were rated as most important by a diverse range of Chinese respondents but were not covered by commonly used Western HRQoL instruments, such as the EQ-5D.

Traditional Chinese medicine embodies the concept of health in traditional Chinese culture and therefore better reflects how Chinese culture understands health. Some HRQoL instruments developed in China, such as the Chinese quality of life scale [5, 6], the Chinese PRO scale [7], and the sub-health assessment scale [8], include health indicators such as spirit, appetite, and sleep, and have been widely used among the Chinese population. These instruments all reflect domains of the TCM health concept such as "unity of body and spirit, unity of man and nature, unity of man and society," "seven emotions", and "shape, spirit, and emotion." In brief, well-rounded health is the inseparable unity of the body (including the orifices of the sense organs) and spirit (including emotion and mind), together with adaptation to the natural environment and society and harmonious social contact. In TCM terms, the body is an outward manifestation of the spirit, and the spirit is the master of the body. Therefore, the coordination between body and spirit [9] and the correspondence between the natural environment and the human body constitute the TCM holism, which is consistent with the bio-psycho-social medical model (Fig. 1). However, the international GPBMs such as the EQ-5D contain no items with meanings similar to these concepts. The health states described by these international instruments may therefore not be consistent with TCM theories [10,11,12,13] and might not be comprehensive enough for evaluating TCM treatments. Besides, it is commonly recognized that the EQ-5D is not sensitive enough for assessing sub-health conditions because of its ceiling effect, whereas the SF-6D is not adequate for discriminating mild diseases [14, 15].

Fig. 1 Theoretical framework of CQ-11D

The CQ-11D (Chinese Medicine Quality of life-11 Dimensions) was therefore developed by Zhu WT et al. at the Institute of Pharmacoeconomic Evaluation of Chinese Medicine, Beijing University of Chinese Medicine, in 2021. It was developed by optimizing the first version of the Chinese Medicine QOL assessment scale (CM-QOL), with the holistic view and health concept of traditional Chinese medicine as the guiding ideology, and was constructed through standard processes using literature research, patient interviews, expert consultation, and questionnaire surveys. The CM-QOL includes 19 items, such as complexion, appetite, sleep quality, stool, and attention. To make the original scale more suitable for compiling discrete choice experiment (DCE) tasks to develop a health utility value set, the CQ-11D was developed by modifying the items included in the CM-QOL. This process followed a previous study by Brazier et al. that modified the SF-36 into the SF-6D [16]. The basic principles of the modification were as follows: to avoid redundancy, if two or more items described essentially the same aspect of health and were closely related, only one was kept; and items with negative descriptions were preferentially retained because they were considered more relevant to health assessments and services. After modification, a TCM HRQoL instrument, the CQ-11D, with 11 items and four levels for each item, was finally developed. Evaluation of its measurement properties demonstrated that the CQ-11D has good reliability, construct validity, and criterion-related validity [17]. The CQ-11D has been issued by the China Association of Chinese Medicine under the standard number T/CACM1372-2021, with a release date and an implementation date of August 18, 2021 [18].

DCE with survival duration (DCETTO) is a relatively new preference elicitation technique that has been successfully used to generate health utility value sets for GPBMs in many countries [19,20,21,22,23,24,25]. This technique has not previously been used to value a TCM HRQoL instrument. Respondents complete a series of choice sets, each including health state descriptions with a corresponding survival duration. Responses are modeled to generate a set of coefficients that lie on the QALY scale, where 1 represents full health and 0 represents dead, and these coefficients are used to calculate the utility values of all health states described by the classification system [26].

Since TCM plays an important role as a kind of Complementary and Alternative Medicine (CAM) for healthcare systems worldwide, a validated instrument for assessing disease impacts and health outcomes is needed for TCM interventions. We aimed to develop the health utility value set for the CQ-11D. This article reports the valuation of the CQ-11D in China using online DCETTO among a representative sample of the Chinese general population.

Materials and methods

CQ-11D instrument

Holism based on TCM theories was used to guide the development of the CQ-11D. The methods for formulating the instrument included literature searches, patient interviews, expert consultation, and a questionnaire survey. The original instrument consisted of two parts: a self-rated health status questionnaire and a visual analog instrument score. The self-assessed health status questionnaire had 11 questions (Table 1) and was divided into two sections: ** CQ-11D, its feasibility evaluation results showed a good acceptance rate. The total Cronbach's α of the scale was 0.820, and the Cronbach's α of each dimension was greater than 0.6, indicating that the instrument had good internal consistency. In the exploratory factor analysis, the KMO value of the scale was 0.791 and Bartlett's test of sphericity gave χ2 = 318.414, P < 0.05, indicating that the data were suitable for factor analysis. The factor analysis showed that the cumulative variance contribution rate of the three common factors was 58.603%, and the items within the three common factors were logically related, indicating that the instrument had structural validity. The correlation between the CQ-11D and the EQ-5D-3L, used as the benchmark instrument, was 0.651, indicating good criterion validity [17].

Table 1 Indicators of CQ-11D

Investigation method and content

The DCETTO questionnaire was developed with Lighthouse Studio 9.9.2. The accompanying survival duration attribute was set to 4 levels, namely 1 year, 4 years, 7 years, and 10 years. A total of 700 pairs of health states were selected using the balanced overlap method and distributed across 70 sets of DCETTO tasks [27,28,29]. Each respondent was randomly assigned one set (i.e., ten DCETTO tasks) during the survey; the task order and the left–right position of health states within each task were randomized [29]. Simulation tests were performed on the generated discrete-choice questionnaires to evaluate the balance of the selected health states. A simulated sample of 2,400 respondents was set in Lighthouse Studio to test the quality of the discrete choice experimental design, and the interaction between each item and survival duration was checked. Of the 24,000 simulated choices, 12,015 (50.06%) were for option 1 and 11,985 (49.94%) were for option 2. As a general guideline, the standard error should be 0.05 or less for main effects and 0.10 or less for interaction effects. The test results showed that the standard errors of the main effects were all less than 0.05 (Table 2), the standard errors of the interaction effects were all less than 0.10, and the levels of each item were well balanced. Other parts of the questionnaire included the CQ-11D, a background information questionnaire, the SF-6D, and the EQ-5D-3L, among others.
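The design figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the study's workflow; the variable names are illustrative, and the count of possible health states assumes the 11 items with four levels each described for the CQ-11D.

```python
# Illustrative check of the design figures reported in the text.
N_ITEMS = 11          # CQ-11D items
N_LEVELS = 4          # levels per item
N_PAIRS = 700         # health-state pairs in the design
N_SETS = 70           # questionnaire versions (sets of tasks)
N_SIMULATED = 2400    # simulated respondents used to test the design

tasks_per_set = N_PAIRS // N_SETS                 # tasks each respondent answers
possible_states = N_LEVELS ** N_ITEMS             # size of the descriptive system
simulated_choices = N_SIMULATED * tasks_per_set   # total simulated choices

share_option_1 = 12015 / simulated_choices
share_option_2 = 11985 / simulated_choices

print(tasks_per_set, possible_states, simulated_choices)
print(f"{share_option_1:.2%} vs {share_option_2:.2%}")
```

A near 50/50 split between the two options is what one would expect from simulated respondents choosing at random on a balanced design.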

Table 2 Simulation test results for the main effects of the experimental design

Respondent and interviewer

For discrete choice experiments, an average of more than 20 respondents should answer each set of questionnaires in order to estimate a reliable model [30]. The DCETTO design of this study generated 70 sets of questionnaires, which implies a minimum of more than 1,400 respondents; the effective sample size of this study was planned to be 2,400 respondents. Provinces and cities across mainland China were selected for the investigation. The surveyed areas were spread over North China, Northeast China, East China, Central China, South China, Northwest China, and Southwest China, with a total of 28 provinces and municipalities, including 118 prefecture-level cities, to ensure sufficient geographical coverage and diverse levels of economic development. A stratified sampling method was applied, with quotas set for age and sex, to ensure that the distributions of these characteristics in the sample resembled those of the general Chinese population (Table 3) [31]. Participants were recruited through recruitment advertisements posted in ways convenient for the interviewers. Recruitment was conducted in publicly accessible places (parks, shops, streets, and university campuses) and private areas (participants’ residences). Respondents were required to meet the following inclusion criteria: age ≥ 18 years; Chinese nationality; residence in mainland China for the past five years; and agreement to participate in this research. Respondents were excluded if they had listening, speaking, reading, or writing difficulties or were unable to understand the interview content, or if they had an abnormal mental condition. The main steps of the investigation were as follows: respondents were screened into the study and gave informed consent; the interviewer guided the respondent in completing the CQ-11D questionnaire; the interviewer guided the respondent in completing the DCETTO tasks, after which respondents were asked to self-assess the difficulty of understanding and answering these tasks on a 5-point Likert scale ranging from very easy to very difficult; the interviewer guided the respondent in completing the background information questionnaire, the EQ-5D-3L, and the SF-6D; the interviewer recorded the time taken to complete the survey; and the interviewer checked whether the questionnaire was clear and complete.
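The inclusion and exclusion criteria listed above amount to a simple screening rule. The sketch below shows one way such a rule could be encoded; the `Candidate` class, its field names, and `is_eligible` are hypothetical and are not taken from the study's survey software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record for one potential respondent."""
    age: int
    chinese_national: bool
    years_in_mainland: float          # years lived in mainland China up to now
    consented: bool
    communication_difficulty: bool    # listening/speaking/reading/writing or comprehension
    abnormal_mental_condition: bool


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    meets_inclusion = (
        c.age >= 18
        and c.chinese_national
        and c.years_in_mainland >= 5
        and c.consented
    )
    meets_exclusion = c.communication_difficulty or c.abnormal_mental_condition
    return meets_inclusion and not meets_exclusion


print(is_eligible(Candidate(35, True, 12, True, False, False)))
```

Keeping the inclusion and exclusion checks as separate expressions mirrors how the criteria are stated in the text and makes each one easy to audit.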

Table 3 Sample quota design

Quality control

A total of 125 interviewers, divided into six teams, were involved, with one quality control leader and one project supervisor in each team. The following quality control methods were carried out.

  (1) Interviewer training. All interviewers received a full day of training covering DCE operational processes, questionnaire examples, and quality control requirements to ensure equivalent task understanding, standard procedures, and good respondent interactions.

  (2) Team management. All interviewers were divided into six teams. Each team was assigned a team leader who was responsible for managing and guiding the interviewers and for collecting survey recordings for quality control; there was also a supervisor who was mainly responsible for supervising the process, conducting follow-up visits with respondents, and reviewing quality control materials (interview sound recordings, informed consent forms, and other materials) to ensure data quality.

  (3) Questionnaire invalidation criteria. 1) The respondent had difficulty understanding the task, was impatient, did not cooperate with the interviewer, or did not respond according to the relevant requirements and instructions; 2) the interviewer failed to follow the research specifications or the interviewer's manual; 3) the respondent failed to complete the entire questionnaire; 4) the time taken to complete the questionnaire was too short (less than 5 min), which affected the quality of the interview.

  (4) The unique design of DCETTO task choice: Each item of the DCETTO task includes four levels, and the corresponding degree words, in the order best, relatively good, relatively poor, and worst, are shown in dark green, light green, light red, and dark red, respectively, to help respondents understand and remember the severity of the health state (Fig. 2).

  (5) Data entry: Two research team members entered and checked the data daily to ensure accuracy.

  (6) Identification of potentially problematic data: Respondents who always selected the same option (e.g., “AAAAAAAAAA”) or who alternated options (e.g., “ABABABABAB”) in the DCETTO tasks were identified [16, 32, 33].
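The pattern check in the last item can be sketched in a few lines. This is an illustrative helper, not the authors' procedure: the function name and labels are hypothetical, and flagging any strictly alternating string (including one that starts with B) is an assumption that goes slightly beyond the single "ABABABABAB" example given above.

```python
from typing import Optional


def flag_response_pattern(choices: str) -> Optional[str]:
    """Label a respondent's sequence of A/B choices if it looks suspicious.

    Returns "same option" for strings such as "AAAAAAAAAA",
    "alternating" for strings such as "ABABABABAB", and None otherwise.
    """
    if len(choices) < 2:
        return None
    if len(set(choices)) == 1:
        return "same option"
    # Every choice differs from the one before it, using exactly two options.
    strictly_alternating = all(
        choices[i] != choices[i + 1] for i in range(len(choices) - 1)
    )
    if strictly_alternating and len(set(choices)) == 2:
        return "alternating"
    return None


for answers in ("AAAAAAAAAA", "ABABABABAB", "AABABBABAA"):
    print(answers, flag_response_pattern(answers))
```

Flagging is kept separate from exclusion, so that flagged respondents can still be compared with the rest of the sample before any decision is made.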

Fig. 2 A sample set of DCETTO choice tasks (A: Chinese version; B: English version). Note: The degree words, in the order best, relatively good, relatively poor, and worst, correspond to the four colors dark green, light green, light red, and dark red, respectively

Statistical and analysis methods

The DCETTO data were analyzed under the random utility framework using a conditional logit model, which assumes homogeneous preferences across respondents, following the model specification proposed by Bansback et al. [16, 19]:

$${U}_{i}=\alpha +\beta {t}_{dl}+\sum_{d}\sum_{l}{\lambda }_{dl}{x}_{dl}{t}_{dl}+{\varepsilon }_{i},$$

where U_i represents the latent utility, t_dl represents survival time, and x_dl t_dl represents the interaction between the item dimension level and survival time; the term β t_dl is the main effect of survival time, which was treated as a linear continuous variable [28]. The DCETTO value for each health state can be anchored on the QALY scale as follows:

$${V}_{i}=1+\sum_{d}\sum_{l}\frac{{\lambda }_{dl}}{\beta }{x}_{dl}.$$

The variable definitions used in the model construction of this study are shown in S1. The dependent variable y is the choice of each respondent and is a binary variable with a value of 0 or 1. The independent variables include survival duration, which is treated as a linear continuous variable, and the 11 items of the CQ-11D: “Activity” (HD: hd2y, hd3y, hd4y), “Appetite” (SY: sy2y, sy3y, sy4y), “Stool status” (DB: db2y, db3y, db4y), “Sleep quality” (SM: sm2y, sm3y, sm4y), “Vigor” (JS: js2y, js3y, js4y), “Dizziness” (TY: ty2y, ty3y, ty4y), “Palpitation” (XH: xh2y, xh3y, xh4y), “Pain” (TT: tt2y, tt3y, tt4y), “Fatigue” (PL: pl2y, pl3y, pl4y), “Irritability” (FZ: fz2y, fz3y, fz4y), and “Frustrated” (JL: jl2y, jl3y, jl4y).
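To illustrate how the anchoring formula above turns coefficients into a utility value, the following sketch computes the value of a health state from a duration coefficient and a set of level-by-duration coefficients. The numbers are hypothetical placeholders chosen only to make the arithmetic visible; they are not the CQ-11D value set. The key naming (for example `hd2y`) follows the variable names listed above, and level 1 is assumed to be the reference level with no decrement.

```python
ITEMS = ["HD", "SY", "DB", "SM", "JS", "TY", "XH", "TT", "PL", "FZ", "JL"]


def utility(state: dict, beta: float, lambdas: dict) -> float:
    """Anchor a health state on the QALY scale: 1 + sum(lambda_dl / beta).

    state maps each item code to a level from 1 to 4; lambdas maps names
    such as "hd2y" to the item-level-by-duration coefficient.
    """
    value = 1.0
    for item in ITEMS:
        level = state[item]
        if level > 1:  # level 1 is the reference level (assumed)
            value += lambdas[f"{item.lower()}{level}y"] / beta
    return value


# Hypothetical coefficients for illustration only, NOT the published value set.
beta = 0.30
lambdas = {
    f"{item.lower()}{level}y": -0.01 * (level - 1)
    for item in ITEMS
    for level in (2, 3, 4)
}

full_health = {item: 1 for item in ITEMS}
worst_state = {item: 4 for item in ITEMS}

print(utility(full_health, beta, lambdas))
print(round(utility(worst_state, beta, lambdas), 3))
```

With these placeholder numbers the worst state falls slightly below zero, which shows how a state can be valued as worse than dead when the summed decrements exceed 1.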

Excel 2016 was used for saving, merging, screening, and basic data conversion. Descriptive statistics were computed in SPSS (Version 20) to summarize the number and proportion of respondents at each level of the demographic variables. STATA 15.0 was used to fit the conditional logit models. We used the t-test for continuous variables and the χ2 or Fisher’s exact test for categorical variables. Differences in the distribution of characteristics and model coefficients were considered statistically significant if p < 0.05. Correlation coefficients and difference tests were used to determine whether respondents' responses were consistent and whether health evaluation results differed across instruments. Because of the large sample size in this study, both Spearman and Pearson correlation coefficients were calculated in the correlation analysis when a variable did not conform to the normal distribution. For the EQ-5D-3L, the utility value was calculated using the 2014 Chinese value set [34], and for the SF-6D, the utility value was calculated using the Hong Kong Chinese value set [35, 36].
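The estimation idea can be illustrated with a deliberately simplified sketch. It is not the STATA analysis used in this study: it assumes a single binary health attribute instead of the 11 four-level items, uses simulated choices with hypothetical true coefficients, and fits the model by Newton-Raphson in plain Python. For paired choices, a conditional logit reduces to a logit on the differences between the two alternatives, so any constant term cancels out.

```python
import math
import random

DURATIONS = (1, 4, 7, 10)            # survival duration levels from the design
TRUE_BETA, TRUE_LAMBDA = 0.30, -0.15  # hypothetical values for the simulation


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def simulate_pairs(n: int, beta: float, lam: float, rng: random.Random) -> list:
    """Simulate n paired choices between options A and B."""
    rows = []
    for _ in range(n):
        t_a, t_b = rng.choice(DURATIONS), rng.choice(DURATIONS)
        x_a, x_b = rng.randint(0, 1), rng.randint(0, 1)  # 1 = impaired state
        z1 = t_a - t_b                 # difference in duration
        z2 = x_a * t_a - x_b * t_b     # difference in attribute-by-duration
        chose_a = 1 if rng.random() < sigmoid(beta * z1 + lam * z2) else 0
        rows.append((z1, z2, chose_a))
    return rows


def fit(rows: list, iterations: int = 25) -> tuple:
    """Newton-Raphson maximum likelihood for the two-parameter model."""
    b, l = 0.0, 0.0
    for _ in range(iterations):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for z1, z2, y in rows:
            p = sigmoid(b * z1 + l * z2)
            w = p * (1.0 - p)
            g1 += (y - p) * z1          # gradient terms
            g2 += (y - p) * z2
            h11 += w * z1 * z1          # negative Hessian terms
            h12 += w * z1 * z2
            h22 += w * z2 * z2
        det = h11 * h22 - h12 * h12
        b += (h22 * g1 - h12 * g2) / det
        l += (h11 * g2 - h12 * g1) / det
    return b, l


rows = simulate_pairs(5000, TRUE_BETA, TRUE_LAMBDA, random.Random(0))
beta_hat, lambda_hat = fit(rows)
print("estimated utility of the impaired state:", 1 + lambda_hat / beta_hat)
```

In this toy setting the anchored value of the impaired state is 1 + λ/β, so with the hypothetical true coefficients the estimate should land near 0.5, subject to sampling noise.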

This study protocol was approved by the ethics committee of the Beijing University of Chinese Medicine (Approval number: 2021BZYLL03012). Informed consent was obtained from all respondents included in the study.


Characteristics of the sample

A total of 2,586 respondents were interviewed, of whom 88 were excluded because they did not complete the whole interview (N = 57), did not meet the inclusion criteria (N = 5), gave logically inconsistent answers (N = 9), or completed the interview in less than 5 min (N = 17). A total of 2,498 respondents were therefore included (Fig. 3). As illustrated in Table 4, 46.08% were male, 42.91% held agricultural household registration, and the share of each geographic region ranged from 8.85% to 17.53%. The characteristics of the respondents were close to those of the general Chinese population.
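The sample flow reported above can be reproduced with a short tally. This is only an arithmetic check of the stated counts; the dictionary keys are paraphrases of the exclusion reasons rather than labels from the study's dataset.

```python
# Counts as reported in the text.
interviewed = 2586
exclusions = {
    "did not complete the interview": 57,
    "did not meet the inclusion criteria": 5,
    "logically inconsistent answers": 9,
    "interview shorter than 5 min": 17,
}

excluded = sum(exclusions.values())
included = interviewed - excluded

print(excluded, included)
```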

Fig. 3 Flow chart of sample inclusion

Table 4 Basic information of respondents

The mean ± SD interview time was 14.5 ± 5.9 min (minimum 5.0 min, maximum 52.0 min). Of the respondents, 68.29% thought that the health states displayed in the DCETTO tasks were very easy or easy to understand, and 7.65% thought they were difficult or very difficult to understand; in terms of making choices in the tasks, 50.56% thought it was very easy or easy, and 18.33% thought it was difficult or very difficult. Overall, the DCETTO tasks were relatively easy for the general Chinese population to complete. Nevertheless, potentially problematic answer patterns were observed among respondents who always selected the same option or alternated options in the DCETTO (25 respondents responded ‘AAAAAAAAAA’, 5 responded ‘BBBBBBBBBB’, and 6 responded ‘ABABABABAB’). This very small proportion of respondents (i.e., 1.40% of total respondents) showed no noticeable differences in demographic characteristics, and some of these answers may be due to random error. Therefore, these respondents were not excluded from this study [44]. This study also explored mixed logit models. Since, under this experimental design, they did not show significant preference heterogeneity, the corresponding results are not presented in the results section; they can be found in S4. Another issue is the generation of the value set under nonlinear time preferences. Jonker et al. found that the best statistical fit was obtained with a hyperbolic discount function, which resulted in smaller QALY decrements and fewer health states classified as worse than immediate death [45, 46]. Non-linear time preferences are unlikely to be assessable in this study, given that the design was optimized under linear time preferences. In the future, the value set of the CQ-11D can be further improved by addressing these issues.


The study provides the first value set for the CQ-11D, which can facilitate cost-utility analyses when applied to data collected with the CQ-11D prospectively and retrospectively. The valuation tool of the CQ-11D was developed for measuring the quality of life and health utility of patients undergoing traditional Chinese medicine interventions. The application of CQ-11D can support TCM resource allocation in China.