Introduction

HIV has infected a total of 84.2 million people and claimed 36.3 million lives worldwide since the start of the epidemic [1]. Today, HIV remains a major global public health issue. An estimated 40.1 million people were living with HIV/AIDS worldwide at the end of 2021 [2]. Given the large population of China, the influence of HIV in China should not be underestimated despite the relatively low prevalence. By the end of 2020, China had 1.05 million people living with HIV/AIDS and 351,000 cumulative reported deaths [3].

The widespread application of highly active antiretroviral therapy (HAART) has made HIV infection a manageable chronic health condition, enabling people living with HIV/AIDS (PLWHA) to live longer. At the same time, HIV infection and antiretroviral treatment could accelerate the aging process of PLWHA [4]. The World Health Organization suggested the age of 50 as the cut-off for identifying older people among those living with HIV [5]. As of the end of 2019, there were about 7.5 million PLWHA aged 50 and over worldwide, making up one fifth of PLWHA [6]. As a result of increasing access to effective HIV diagnosis and treatment, China has also witnessed an increasing number of older PLWHA in recent years [7]. In 2011, the proportion of PLWHA aged between 50 and 64 in China reached 13.6%, up from 1.6% in 2000 [8].

However, longer life expectancy does not necessarily mean better well-being. Alongside physical discomforts, PLWHA also struggle with depression, anxiety, financial stress, and HIV-related discrimination [9]. To fully understand the health status of PLWHA and address their holistic needs beyond viral suppression, patient-reported outcome (PRO) measures should be developed and validated to complement biomarkers to depict patients’ experience with the disease and treatment [10].

Among previous studies assessing the health outcomes of PLWHA, generic instruments have been the most widely used because they facilitate comparison between different disease or treatment groups. However, they were not originally designed to identify disease-specific issues and may therefore fail to capture important impacts of HIV [11]. As for specific PRO instruments established for PLWHA, quite a number were developed before the wide application of HAART, which reduces their validity in evaluating treatment effectiveness [12, 13]. In addition, PRO instruments for PLWHA introduced from other countries should be used with caution, as they might be culturally inappropriate [

Preliminary work

A literature review and focus group interviews with health care professionals were conducted first, based on which an initial conceptual framework covering physical, emotional, social, and treatment aspects was generated. According to the conceptual framework, a total of 93 patients were interviewed face-to-face and videotaped. At numerous points in the interview, participants were encouraged to spontaneously add any comments or areas related to the disease that they deemed appropriate and important. Once the interviews were completed, the videotapes were transcribed. The transcriptions were then compared against the original videotapes by a second set of research assistants. The transcripts of the interviews were reviewed and coded by 2 researchers, and items were generated and categorized. A preliminary item pool of 56 items was then presented to patients who had not participated in the initial interviews to evaluate the relevance, importance, comprehensibility, and potential redundancy of items, during which one item was discarded because of overlap with other items. The remaining 55 items comprised the preliminary PROHIV-OLD instrument tested here. Items were scored using a 7-point Likert scale with anchor points labeled from “not at all” to “very much”. The recall period was set at one month.

Methods

Design and subjects

From February 2021 to November 2021, participants were recruited from six designated hospitals in three cities with varying socioeconomic status according to GDP per capita in Zhejiang Province, China. Participants were followed up six months after the first investigation. PLWHA aged 50 and over who were receiving ongoing antiviral therapy were eligible to participate in this study, while those who had cognitive issues, could not understand Mandarin Chinese, or were at the terminal stage of AIDS were excluded.

The PROHIV-OLD and a validated outcome measure, the Medical Outcomes Study HIV Health Survey (MOS-HIV) [19], were administered at baseline and at the 6-month follow-up. Demographic and HIV-related information was also collected. The baseline data were used as the study sample for item reduction analyses (Phase I), and the follow-up data as the validation sample to test the final instrument (Phase II).

This study was approved by the Institutional Review Board of Zhejiang University (approval number: ZGL202007-03), and written informed consent was obtained from all participants.

Phase I: item reduction

Item reduction based on the CTT

The score distribution of each item was analyzed. An item was to be removed if its floor or ceiling effect exceeded 20% [20]. Items with a standard deviation lower than 1 or a coefficient of variation lower than 0.3 were deemed to have low variability and were also to be removed [21].
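
As an illustration only, the screening rules above can be written as a short Python sketch. The function name `screen_item`, the coding of responses from 1 to 7, and the use of the sample standard deviation are assumptions made for this example, not details reported in this study.

```python
from statistics import mean, stdev


def screen_item(scores, min_score=1, max_score=7,
                effect_limit=0.20, sd_limit=1.0, cv_limit=0.3):
    """Return the CTT criteria an item fails (an empty list means 'keep').

    Assumes responses are coded min_score..max_score (1..7 here) and that
    the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(scores)
    floor_rate = sum(s == min_score for s in scores) / n
    ceiling_rate = sum(s == max_score for s in scores) / n
    sd = stdev(scores)
    cv = sd / mean(scores)  # coefficient of variation

    failed = []
    if floor_rate > effect_limit:
        failed.append("floor")
    if ceiling_rate > effect_limit:
        failed.append("ceiling")
    if sd < sd_limit:
        failed.append("low_sd")
    if cv < cv_limit:
        failed.append("low_cv")
    return failed


# Toy responses, invented for illustration (not study data).
spread_item = [1, 2, 3, 4, 5, 6, 7, 4, 4, 4]
skewed_item = [7, 7, 7, 7, 6, 7, 7, 7, 6, 7]
print(screen_item(spread_item))
print(screen_item(skewed_item))
```

In this sketch an item is flagged if it fails any one criterion; whether the criteria are combined this way in practice is a design choice left to the analyst.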

Exploratory factor analysis (EFA) aided in item reduction and exploration of the factor structure. Exploratory structural equation modeling (ESEM) was also employed to analyze the factor structure. ESEM can be seen as a compromise between the flexibility of EFA and the rigor of SEM [22]. It has been used when factor structures are not yet well established, as it allows for a more detailed assessment of model fit [23, 24]. Principal axis factoring with an oblique rotation was employed to extract factors. The scree plot [25], Horn’s parallel analysis (PA) [26] and Velicer’s minimum average partial (MAP) test [27] were adopted to determine the number of factors to be extracted. Proposed models were compared by ESEM using the following fit indices: chi-square divided by degrees of freedom (χ2/df), Tucker-Lewis index (TLI), standardized root mean square residual (SRMR), root mean square error of approximation (RMSEA), and Bayesian information criterion (BIC). Satisfactory model fit requires χ2/df < 3, TLI ≥ 0.9, SRMR < 0.08, RMSEA < 0.08, and a lower BIC [53].

However, few HIV/AIDS-specific instruments have been developed using IRT to date. This study used both CTT and IRT to select items in the item reduction phase, hoping to further improve the performance of this instrument.

In item selection by EFA, determining the appropriate number of factors is an important yet controversial issue, as no single procedure seems to be entirely satisfactory among the many rules of thumb and statistical indices for addressing the dimensionality issue [54, 55]. The commonly used Kaiser’s criterion [54] and the more accurate PA and MAP methods [30, 42] were employed in this study to identify the number of latent factors needed to accurately account for the common variance among the items. ESEM, which offers the advantage of providing overall tests of model fit [56], was then conducted to compare the fit of the proposed competing models and determine the optimal factor structure. A five-factor structure was finally determined, and the factor rotation resulted in as many as 22 items being deleted. The strict requirements of EFA on the number and correlation of variables, as well as on the sample size and distribution, could explain the large number of items deleted at this stage [57]. Previous studies also found quite a number of items being removed by EFA [58, 59].
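
For illustration, the fit thresholds stated in the Methods (χ2/df < 3, TLI ≥ 0.9, SRMR < 0.08, RMSEA < 0.08, with a lower BIC preferred) can be applied to competing models as in the following Python sketch. The function names, model labels, and fit values are hypothetical and are not results from this study.

```python
def fit_checks(chi2, df, tli, srmr, rmsea):
    """Check one model against the fit thresholds quoted in the text."""
    return {
        "chi2/df": chi2 / df < 3,
        "TLI": tli >= 0.9,
        "SRMR": srmr < 0.08,
        "RMSEA": rmsea < 0.08,
    }


def select_model(models):
    """Among models passing every threshold, return the name with the lowest BIC.

    Returns None when no model passes all thresholds.
    """
    passing = {
        name: m for name, m in models.items()
        if all(fit_checks(m["chi2"], m["df"], m["tli"],
                          m["srmr"], m["rmsea"]).values())
    }
    if not passing:
        return None
    return min(passing, key=lambda name: passing[name]["bic"])


# Hypothetical fit statistics, for illustration only (not study results).
candidates = {
    "model_a": {"chi2": 900.0, "df": 250, "tli": 0.88,
                "srmr": 0.06, "rmsea": 0.085, "bic": 41200.0},
    "model_b": {"chi2": 520.0, "df": 230, "tli": 0.93,
                "srmr": 0.04, "rmsea": 0.060, "bic": 40950.0},
    "model_c": {"chi2": 480.0, "df": 210, "tli": 0.93,
                "srmr": 0.04, "rmsea": 0.061, "bic": 41010.0},
}
print(select_model(candidates))
```

Treating the thresholds as a hard filter before comparing BIC is one possible reading of the criteria; in practice fit indices are usually weighed together with interpretability of the factor solution.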

In item reduction using IRT, 2 items failed to meet the discrimination criterion and were deleted first. Disordered thresholds were detected for 3 items, indicating that respondents may have had difficulty distinguishing between the response options, and these 3 items were consequently removed. Uniform DIF was observed for 5 items, and 4 items exhibited non-uniform DIF. No consensus has been reached on the disposition of items with DIF. Items with non-uniform DIF are generally required to be deleted, while appropriate weightings can be applied to items with uniform DIF [60, 61]. Some studies have suggested determining the salience of DIF by testing its magnitude beyond statistical significance, with items that exhibit DIF of large magnitude, whether uniform or non-uniform, being deleted [37, 42, 43]. This study also examined the magnitude of DIF. Because the DIF observed had no substantial influence, only items with non-uniform DIF were finally removed.

The reliability and validity of the final instrument have been rigorously tested. Internal consistency reliability of the PROHIV-OLD was supported by the high Cronbach’s alpha coefficients. McDonald’s ω and CR, which are deemed more suitable for evaluating the reliability of multidimensional instruments [62], further confirmed the reliability of each dimension. All dimensions demonstrated good test-retest reliability, except that the ICC of the physical symptoms dimension was slightly less than 0.7. Apart from disease- and treatment-related symptoms, the physical symptoms dimension also contains items less specifically related to HIV infection, such as energy and sleep quality, which might be responsible for the lower test-retest reliability of this dimension.
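
As a reminder of what the internal consistency statistic measures, Cronbach’s alpha for a single dimension can be computed from item-level scores as in the Python sketch below. The function name and the toy responses are assumptions for illustration only. The sketch does not cover McDonald’s ω or CR, which additionally require factor loadings from a fitted model.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one dimension.

    items: list of item columns, each a list of scores from the same
    respondents in the same order. Uses sample variances (n - 1).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent across the k items.
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Toy responses from four respondents to three items (not study data).
toy_items = [
    [2, 4, 4, 5],
    [3, 4, 5, 5],
    [2, 3, 5, 6],
]
print(round(cronbach_alpha(toy_items), 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is reported per dimension rather than for a multidimensional instrument as a whole.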

Regarding the structural validity of the PROHIV-OLD, the poor fit of the one-factor model confirmed that the PROHIV-OLD is multidimensional in nature, and the final structure of the instrument was supported by CFA. Correlations between comparable PROHIV-OLD and MOS-HIV dimensions were stronger than those between less comparable dimensions. The correlations between the role functioning scale of the MOS-HIV and all five dimensions of the PROHIV-OLD were weak. The two items in the MOS-HIV role functioning scale concern the ability to do certain kinds or amounts of work, housework, or schoolwork, which are no longer the main content of older adults’ social life. Instead, their social relationships and interactions are more confined to the family [63, 64], which possibly resulted in the stronger correlation between the MOS-HIV role functioning scale and the PROHIV-OLD family relationship dimension. This also implies the uniqueness of older patients’ experience and of the conceptual framework of the PROHIV-OLD.

Known-groups validity was examined across a range of demographic and clinically relevant factors. Consistent with existing studies, gender [48] and income [49] differences in dimension scores were detected. For clinical factors, all five dimensions of the PROHIV-OLD distinguished well between patients with different levels of CD4+ T cell counts, while no significant associations were found between any dimension of the PROHIV-OLD and HIV-1 RNA level. The proportion of patients with an abnormal plasma HIV-1 RNA level (11.47%) might be too small to detect its effect on patients’ perceived health status. Dyslipidemia was associated with poorer performance on the physical symptoms dimension, whereas patients with abnormal liver or kidney function did not report more physical symptoms. One possible reason is that liver and kidney function could only be roughly determined based on limited medical information. Future studies could consider employing more precise medical examinations and including respondents’ self-perceived condition.

Several potential limitations of this study should be stated. First, the generalizability of this study might be limited given that only patients in Zhejiang Province were included. In addition, epidemic-related control policies under COVID-19 prevented us from interviewing hospitalized patients, who are at higher risk of serious opportunistic infections or other adverse events, which further limited the representativeness of the study sample. Second, for older PLWHA with poor vision, investigators assisted them in completing the survey by reading the items verbatim to them, which might have caused selection and social desirability bias. Third, the primary aim of instrument development and validation limited this study to detecting only the presence and salience of DIF. The underlying complex mechanisms of DIF remain to be identified in future qualitative and quantitative studies. Fourth, although the reliability and validity shown in this study seem to be satisfactory, the instrument’s ability to detect change over time remains to be examined to further support its psychometric properties. Nevertheless, this large multi-site study with rigorous instrument development and validation methods provides a strong foundation for health outcome assessment and promotion for the ever-increasing population of older PLWHA.

Conclusions

The PROHIV-OLD instrument demonstrated acceptable reliability and validity, suggesting that it can be implemented in clinical research and practice to provide further valuable information on the health outcomes of older PLWHA in China. Other measurement properties, such as responsiveness and interpretability, will be further examined.