Background

Globally, depression is the second leading cause of years lived with disability and the third leading cause of disability-adjusted life-years [1],[2]. Major depression is associated with decreased quality of life, productivity loss and high costs to the patient and society [3],[4]. The 12-month prevalence of depression is 4.7% [5], with a lifetime prevalence of up to 16.6% among adults aged 18 years and older [6].

The epidemiology of depression is traditionally studied using population-based surveys; however, because these are labor-intensive and time-consuming, surveys covering large geographic areas can be conducted only infrequently. Researchers are therefore seeking alternative data sources for surveillance and population-based studies. Administrative health data, such as physician billing or hospital discharge data, are increasingly being used for epidemiologic and health services research [7]-[9]. These data are typically coded using the World Health Organization (WHO) International Statistical Classification of Diseases and Related Health Problems (ICD), with the ICD-9 and ICD-10 classifications being the most widely used international coding systems [9],[10]. There is concern that misclassification of diagnoses in these data may introduce bias into results obtained from their analysis. Despite this, there is no widely used or agreed-upon validated case definition of depression employed by researchers who use administrative data.

The aims of the current study were to: (1) identify validated ICD-9 and ICD-10 depression case definitions in administrative data through a systematic review of the literature; (2) identify individuals with and without depression in administrative data and develop an enhanced case definition to identify persons with depression in ICD-coded hospital data; and (3) assess the diagnostic accuracy of these case definitions against a reference standard.

Methods

Study 1. Systematic review of validated depression case definitions

Search strategy

We searched Medline (1948 to January 2013) and Embase (1980 to January 2013) for relevant articles. Our search strategy consisted of the following sets of terms: (1) [administrative data or hospital discharge data or ICD-9 or ICD-10 or medical record or health information or surveillance or physician claims or claims or hospital discharge or coding or codes] AND (2) [validity or validation or case definition or algorithm or agreement or accuracy or sensitivity or specificity or positive predictive value or negative predictive value] AND (3) the medical subject heading terms for depression. Searches were limited to human studies published in English. The broad nature of the search strategy allowed for the detection of modified ICD codes, such as clinical modifications (e.g., ICD-9-CM).
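Within each set, terms were combined with OR, and the three sets were combined with AND. As a rough illustration only (the exact subject heading terms for depression and the database-specific Ovid/Embase syntax are not reproduced here), the structure of the query can be sketched as follows:

```python
# Illustrative sketch of the boolean structure of the search strategy.
# Term lists are taken from the text; the depression subject-heading set is a
# placeholder, and real database syntax (field tags, truncation) is omitted.

data_source_terms = [
    "administrative data", "hospital discharge data", "ICD-9", "ICD-10",
    "medical record", "health information", "surveillance",
    "physician claims", "claims", "hospital discharge", "coding", "codes",
]
validity_terms = [
    "validity", "validation", "case definition", "algorithm", "agreement",
    "accuracy", "sensitivity", "specificity",
    "positive predictive value", "negative predictive value",
]
depression_terms = ["<subject heading terms for depression>"]  # not enumerated in the text

def or_block(terms):
    """Join a term set with OR and wrap it in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# The three parenthesized blocks are combined with AND.
query = " AND ".join(or_block(t) for t in
                     (data_source_terms, validity_terms, depression_terms))
print(query)
```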

Study selection

Two reviewers (CB and AM) independently screened all titles and abstracts to identify those meeting pre-determined eligibility criteria. To be considered, articles had to report original research that validated specified ICD-9 or ICD-10 coding for depression against a reference standard, and had to report at least one of the following estimates of validity: sensitivity, specificity, positive predictive value (PPV) or negative predictive value (NPV). Articles that validated depression in specialized populations (e.g., diabetes or multiple sclerosis) were excluded so that the case definitions would be more generalizable to the general population. Papers whose definitions did not rely solely on medical encounter data (e.g., those incorporating pharmacy data) were also excluded, as such supplementary data are often unavailable to researchers and are not ICD coded.

Full-text review was performed by two reviewers (AM or CSS, and CB) for all articles identified as eligible for inclusion by either reviewer at the abstract screening stage. Disagreements about eligibility were resolved by a third reviewer (KF or NJ). Bibliographies of included articles were manually searched for additional articles, which were then screened and reviewed using the same methods. Local administrative data and depression experts were contacted and asked to identify any additional missing studies.

Data extraction

Two reviewers (CSS and KF) independently extracted and reached agreement on data from the included articles using a standardized data abstraction form. Disagreements were resolved by a third reviewer when necessary (NJ). Validated ICD-9 or ICD-10 based case definitions were abstracted from each paper, along with estimates of sensitivity, specificity, PPV and NPV. The following data were also extracted: study information (author, year, country), population demographics (population size, time of data collection), prevalence of depression, type of administrative database, and a description of the reference standard used to validate the depression case definition.

Quality assessment

Two reviewers (CSS and KF) assessed study quality using a standardized 40-item checklist [11] designed to assess the quality of validation studies of administrative data (Additional file 1). One point was assigned for fulfilling each item on the checklist. If it was unclear whether an item was fulfilled, it was marked as uncertain and no points were given. Questions on the scale relate to methodology (validation cohort description, sampling procedures, study flow diagram) and the reporting of results (based on disease severity, likelihood ratios, subgroup analyses, 95% confidence intervals). Some items on the checklist were not applicable due to the nature of the study. Disagreements were resolved by consensus.

Study 2. Validation of depression case definitions in hospital discharge data

Administrative data source

To test the previously published and newly developed case definitions for depression, we had access to data from the Discharge Abstract Database (DAD), a hospitalization database used in all adult hospitals in Calgary, Alberta, Canada. The DAD includes data on patient demographics, procedures, and the principal diagnosis and comorbidities (whether pre-existing or newly arising). Up to 50 diagnoses are coded per hospitalization, including one primary/main diagnosis; only 25 diagnoses are available for release to researchers, although it is extremely rare for a patient to have more than 25 diagnoses coded in their DAD record. All records for patients aged 18 years and over admitted between January 1, 2003 and June 30, 2003 were obtained from these ICD-10-coded data. For patients with multiple admissions in that period, only the first (index) admission of the six-month period was used in the analyses. Ethical approval was granted by the University of Calgary Conjoint Health Research Ethics Board.

Study population and chart review

We randomly selected 4,008 inpatient records from the DAD for patients who were admitted for any indication between January 1 and June 30, 2003 [12]. As no psychiatric hospitals exist in Calgary, this sample can be assumed to include a representative series of admissions for psychiatric and non-psychiatric indications. Charts for each of these 4,008 index visits were subsequently reviewed and diagnoses extracted by two trained chart reviewers with nursing backgrounds. The entire chart was examined in duplicate, including the cover page, discharge summaries, admission notes, consultation reports, physician orders and diagnostic reports. The entire review process took approximately one hour per chart to complete. The presence or absence of depression was based on all documented information in the chart. Detailed methodology describing the study population and chart review has been published elsewhere [12].

Coding algorithms

A total of 11 ICD-based case definitions (algorithms) for depression were tested in the DAD: five algorithms from the studies identified through the systematic review, and six new, more inclusive algorithms developed by the study investigators (a neurologist and psychiatrists) (Table 1). The new algorithms included three ICD-9-CM case definitions with increasing levels of code inclusivity. Algorithm 1 for ICD-9-CM was the least inclusive: 296.20-296.25 (Major Depressive Disorder, single episode: unspecified, mild, moderate, severe without psychotic behavior, severe with psychotic behavior, in partial remission), 296.30-296.35 (Major Depressive Disorder, recurrent episode: unspecified, mild, moderate, severe without psychotic behavior, severe with psychotic behavior, in partial remission), 300.4 (Dysthymic Disorder) and 311 (Depressive Disorder not Elsewhere Classified). Algorithm 2 was moderately inclusive: algorithm 1 plus 296.5 (Bipolar I Disorder, most recent episode depressed), 296.6 (Bipolar I Disorder, most recent episode mixed), 296.82 (Atypical Depressive Disorder) and 296.90 (Unspecified Episodic Mood Disorder). Algorithm 3 was the most inclusive: algorithm 2 plus 309.0 (Adjustment Disorder with depressed mood), 309.1 (Prolonged Depressive Reaction) and 309.28 (Adjustment Disorder with mixed anxiety and depressed mood).

Table 1 Validation of International Classification of Diseases (ICD) hospital discharge abstract data based on chart review data for depression
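To make the nesting of the three new ICD-9-CM definitions explicit, the code lists above can be written as sets in which each algorithm extends the previous one. This is an illustrative sketch of the definitions as listed, not analysis code from the study:

```python
# Illustrative representation of the three nested ICD-9-CM algorithms.
# Codes are those listed in the text; ranges such as 296.20-296.25 are written out.

ALGO1_ICD9 = {
    "296.20", "296.21", "296.22", "296.23", "296.24", "296.25",  # MDD, single episode
    "296.30", "296.31", "296.32", "296.33", "296.34", "296.35",  # MDD, recurrent episode
    "300.4",                                                     # dysthymic disorder
    "311",                                                       # depressive disorder NEC
}

# Algorithm 2 adds bipolar depressed/mixed episodes and atypical/unspecified mood codes.
ALGO2_ICD9 = ALGO1_ICD9 | {"296.5", "296.6", "296.82", "296.90"}

# Algorithm 3 further adds adjustment disorder codes with depressed mood.
ALGO3_ICD9 = ALGO2_ICD9 | {"309.0", "309.1", "309.28"}
```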

Similarly, three new ICD-10-CA based algorithms were developed, also with differing levels of inclusivity. Algorithm 1 for ICD-10-CA was the least inclusive: F32.0-32.9 (Depressive Episode: mild, moderate, severe without psychotic symptoms, severe with psychotic symptoms, other depressive episode, unspecified), F33.0-33.3 (Recurrent Depressive Episode: mild, moderate, severe without psychotic symptoms, severe with psychotic symptoms), F33.8 (Other Recurrent Depressive Disorder), F33.9 (Recurrent Depressive Disorder, Unspecified), F34.1 (Dysthymia) and F41.2 (Mixed Anxiety and Depressive Disorder). Algorithm 2 was moderately inclusive: algorithm 1 plus F31.3-31.6 (Bipolar Affective Disorder: mild or moderate depression, severe depression without psychotic symptoms, severe depression with psychotic symptoms, mixed episode). Algorithm 3 was the most inclusive: algorithm 2 plus F34.8 (Other Persistent Mood Disorders), F34.9 (Persistent Mood Disorder, unspecified), F38.0 (Other Single Mood Disorders), F38.1 (Other Recurrent Mood Disorders), F38.8 (Other Specified Mood Disorders), F39 (Unspecified Mood Disorders) and F99 (Mental Disorder, not elsewhere classified). For all algorithms, the presence of at least one of the above diagnostic codes in any diagnostic position in the DAD was sufficient to meet the case definition for depression.
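Operationally, a DAD record therefore meets a given case definition when at least one of its diagnosis fields, in any diagnostic position, matches a code in the chosen set. A minimal sketch of that check is shown below; the example record and the use of prefix matching (so that a category such as F32 captures F32.0-F32.9) are illustrative assumptions rather than the study's actual implementation:

```python
# Hypothetical sketch: flag a DAD record as a depression case if at least one of
# its coded diagnoses (any diagnostic position) falls under the chosen algorithm.
# Prefix matching lets a category such as "F32" capture F32.0-F32.9.

ALGO1_ICD10_PREFIXES = (
    "F32",                                                   # depressive episode
    "F33.0", "F33.1", "F33.2", "F33.3", "F33.8", "F33.9",    # recurrent depressive disorder
    "F34.1",                                                 # dysthymia
    "F41.2",                                                 # mixed anxiety and depressive disorder
)

def meets_case_definition(diagnosis_codes, prefixes=ALGO1_ICD10_PREFIXES):
    """Return True if any coded diagnosis (any position) matches the algorithm."""
    return any(code.startswith(prefixes) for code in diagnosis_codes)

# Example: depression coded in a secondary position still meets the definition.
record = ["I21.4", "E11.9", "F32.1"]    # hypothetical diagnosis fields of one DAD record
print(meets_case_definition(record))     # True
```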

We developed a map of depression-related ICD codes between ICD-9-CM and ICD-10-CA (Table 2) to be used in our algorithms, to enhance comparability and to guide the use of both coding systems. Two psychiatrists (CB, SBP) and a neurologist with expertise in administrative data (NJ) determined the mapping between ICD-9-CM and ICD-10-CA by reviewing clinical definitions and coding descriptions.

Table 2 Included International Classification of Diseases (ICD) codes and roadmap of ICD-10-CA to ICD-9-CM
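Table 2 itself is not reproduced here; the fragment below is a hypothetical illustration of what such a crosswalk looks like, built from the code lists above, and is not a substitute for the authoritative mapping in Table 2:

```python
# Hypothetical fragment of an ICD-9-CM -> ICD-10-CA depression crosswalk, for
# illustration only; the study's authoritative mapping is given in Table 2.
ICD9_TO_ICD10 = {
    "296.2x": ["F32.x"],   # MDD, single episode -> depressive episode
    "296.3x": ["F33.x"],   # MDD, recurrent episode -> recurrent depressive disorder
    "300.4":  ["F34.1"],   # dysthymic disorder -> dysthymia
    "311":    ["F32.9"],   # depressive disorder NEC -> depressive episode, unspecified (one plausible mapping)
}
```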

Statistical analyses

Estimates of sensitivity, specificity, PPV and NPV were calculated for all of the algorithms listed in Table 1. Sensitivity was defined as the proportion of people with a depression diagnosis documented in the chart review who also had an ICD code for depression in the DAD. Specificity was defined as the proportion of people without a depression diagnosis documented in the chart who also did not have an ICD code for depression in the DAD. PPV was defined as the proportion of people with an ICD code for depression who also had a diagnosis of depression in the chart. NPV was defined as the proportion of people without an ICD code for depression who did not have a diagnosis of depression in the chart. All data analyses were conducted in STATA version 11.1 [13].
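These four quantities follow directly from the 2x2 cross-classification of the DAD code against the chart-review reference standard. The study's analyses were run in STATA; the sketch below simply restates the formulas, and the cell counts in the example are hypothetical:

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: depression in chart AND depression ICD code in DAD
    fp: no depression in chart BUT depression ICD code in DAD
    fn: depression in chart BUT no depression ICD code in DAD
    tn: no depression in chart AND no depression ICD code in DAD
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical cell counts, for illustration only.
print(diagnostic_accuracy(tp=30, fp=5, fn=70, tn=895))
```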

Results

Study 1. Systematic review of validated depression case definitions

Identification and description of studies

A total of 2,014 abstracts were identified and 36 articles were reviewed in full text, of which three met all eligibility criteria (Figure 1, Additional file 2). Only one study included ICD-10 codes [14]. The reference standards used to validate the depression diagnosis included electronic medical records, medical chart review, a clinical interview with a depression questionnaire, and a self-report mail questionnaire (including questions on physician diagnosis of a chronic condition, demographic information, smoking status, functional limitations and the veterans version of the Short Form-36 (SF-36)). Characteristics of the three included studies can be found in Table 3.

Figure 1 Study flow diagram.

Table 3 Validation studies and results of articles from the systematic review

The reference standard used to validate the depression diagnosis differed between the studies: Alaghehbandan [14] used electronic medical records and medical chart review, Noyes [15] used a clinical interview and depression questionnaire, while Singh [16] used a self-report mail questionnaire. Sensitivity, specificity, PPV and NPV were reported in all papers: sensitivities ranged from 9.9% to 55.1% [15], 34.0% [16] and 69.6% to 77.5% [14]; specificities ranged from 73.3% to 95.6% in one paper [15], while the Singh [16] paper reported a specificity of 97.0% and the Alaghehbandan [14] paper reported specificities ranging from 93.0% to 93.8%; PPVs ranged from 12.1% to 55.4% [15], 83.0% [16] and 91.2% to 91.7% [14], while NPVs ranged from 65.9% to 95.7% [15], 75.7% to 80.7% [14] and 79.0% [16]. The prevalence of depression was reported in only one study, at 18% [15]. Meta-analysis was not performed because of substantial heterogeneity in the administrative databases and the differing reference standards used to validate the depression diagnoses.

Study quality assessment

Based on Benchimol's quality rating scale [11], the Noyes [15] study had the highest quality score (25/41), followed by Singh [16] with 24 out of 41 points and Alaghehbandan [14] with 22 out of 41 points (Additional file 1). All of the articles identified themselves as studies of diagnostic accuracy using administrative data, reported on disease identification and validation, and described the validation cohort. Every article described the methods of calculating diagnostic accuracy and reported at least four estimates of diagnostic accuracy. Each article also discussed the applicability of its findings to the general population. Points were lost because studies did not provide detailed information on the methodology used in the validation (e.g., inclusion/exclusion criteria, recruitment procedures).

Study 2. Validation of depression case definitions in hospital discharge data

Four hundred and seventy-seven of the 4,008 inpatient charts reviewed indicated the presence of depression, giving a hospital-based prevalence of 11.9% (95% confidence interval (CI): 10.9-12.9%). When applied to the DAD data, the five previously validated case definitions [14]-[16] (Table 1) yielded specificity estimates above 99.5%, PPVs of approximately 90% or higher, and NPVs greater than 88%. Sensitivities ranged from 1.1% to 28.9%.
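As a check on the reported figure, the prevalence and an approximate 95% CI can be reproduced from the counts; the sketch below assumes a normal-approximation (Wald) interval, which the text does not specify:

```python
from math import sqrt

n_depressed, n_total = 477, 4008
p = n_depressed / n_total                     # 0.119 -> 11.9%
se = sqrt(p * (1 - p) / n_total)              # standard error of a proportion
lower, upper = p - 1.96 * se, p + 1.96 * se   # Wald 95% CI
print(f"{p:.1%} (95% CI {lower:.1%}-{upper:.1%})")  # 11.9% (95% CI 10.9%-12.9%)
```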

Three new case definitions for ICD-9-CM codes and three for ICD-10-CA codes were examined using DAD data (Table 1). For ICD-9-CM, all newly developed case definitions resulted in specificities of over 99%, PPVs of at least 89% and NPVs of over 91%. Sensitivities ranged from 28.9% to 32.9%, with the greatest sensitivity obtained with the most inclusive case definition (#3). Measures of validity were similar for ICD-10 coding: specificities were greater than 99%, PPVs were at least 89% and NPVs were well over 91%. Sensitivities were slightly better for the three ICD-10 based case definitions than for ICD-9, ranging from 34.2% to 35.6%, with the greatest sensitivity obtained with the most inclusive case definition (#6).

Thirty-two (6.7%) of the 477 depressed persons identified by chart review had a primary diagnosis of depression at the index visit; the remaining 445 depressed persons had ICD codes for depression in secondary diagnostic positions. False negative cases, rather than being coded by one of the depression-related ICD-9-CM or ICD-10 codes in Table 2, were all found to be coded in the DAD as follows: 290.x (Dementias), 300.00 (Neuroses), 301.x (Personality Disorders), 303.x (Alcohol Dependence) or 304.x (Drug Dependence).

Discussion

Researchers often use a variety of administrative data sources to identify cases in order to answer important health research questions. The most commonly used data sources include hospital discharge data and physician claims data. This study was part of a larger study validating case definitions for a number of conditions in hospital data. Through a systematic review of the literature, two validated ICD-9 based case definitions and one combined ICD-9/ICD-10-CA validation were identified, and the validity of these definitions was further investigated in a new dataset. We tested an additional six new case definitions to determine the best algorithm for identifying depression in our administrative hospitalization data. Although physician claims (including outpatient visit) data were not validated as part of this study, the findings are still important. We demonstrate that if one is interested in studying health outcomes among patients admitted with an administrative data diagnosis of depression, there is a strong chance that those patients truly have depression. The choice of depression case definition depends on the purpose for which the data will be used.

For surveillance purposes, sensitivity and PPV need to be balanced so that both are adequately high. Based on our results, we suggest employing case definition #3 for ICD-9 codes and case definition #6 for ICD-10 codes for surveillance (all definitions are given in Table 1). These two algorithms maximize sensitivity without an accompanying loss of specificity. The sensitivity, however, is still suboptimal for surveillance purposes in inpatient data. In situations where diagnostic certainty is necessary (e.g., the selection of cases in a case-control study, where finding every case is not essential but ensuring that all selected cases have the disease is), we would recommend case definition #1 for ICD-9-CM (similar to the definition of Noyes [15]) and case definition #4 for ICD-10. The Noyes [15] algorithm is similar to our recommended case definition #1, except that it does not capture cases currently in remission; the purpose of the new definition was to identify all cases of depression. Specificities were highest for these two new algorithms (#1 and #4), improving the likelihood that those who are not depressed will be identified correctly. All newly tested algorithms for both ICD-9-CM and ICD-10 had high specificities. High specificities are encouraging; in medical coding, there is a risk that unrelated terms containing the word "depression", such as respiratory depression or QT interval depression, could be miscoded, and our results suggest that this did not occur. As depression is an episodic and recurrent condition, many individuals with well-established diagnoses may be asymptomatic at the time of presentation for another condition (as evidenced by the 93% of depressed persons having their diagnosis in a secondary position). Administrative data may therefore be better at detecting active cases of depression than all individuals with a history of depression.

In 2012, Townsend and colleagues published a systematic review of validated methods for identifying depression using administrative data [17]. The search for the Townsend review was conducted in the summer of 2010; since then, additional eligible papers have been published. The present study searched multiple databases and includes articles published up to January 2013. Only papers validating ICD-9 codes were included in the Townsend review, and the authors further restricted inclusion to the codes 296.2 (major depressive disorder, single episode), 296.3 (major depressive disorder, recurrent episode), 300.4 (dysthymic disorder), 311 (depressive disorder not elsewhere classified), 298.0 (depressive type psychosis) and 309.1 (prolonged depressive reaction). The current review validated combinations of 19 ICD-9-CM codes and 27 ICD-10 codes. Differing inclusion/exclusion criteria (i.e., the use of prescription drug data in the case definition and the validation of coding in specialized subpopulations such as diabetes) between Townsend's review and the current study probably explain the discrepancy in the number of included full-text articles (10 versus 3, respectively). The Townsend review reported that most algorithms resulted in only slight to fair agreement with the reference standard. The authors hypothesized that the inadequate performance of these algorithms may in part be due to a lack of clinical detection of depression in hospitals.

The prevalence of depression in our hospital-based population (11.9%) is consistent with previous inpatient literature [18],[19], though higher than most estimates for individuals with medical conditions in the general population [20]. Patients in our sample rarely had depression coded in the primary diagnostic position, suggesting that they presented to hospital for a different indication and that depression was a co-existing condition. This is not surprising, as persons with chronic health conditions are more likely to have depression than their healthy counterparts [20],[21]. This finding highlights the importance of examining all diagnostic positions when attempting to identify cases of depression in administrative data.

This analysis had several limitations. The literature review was limited to studies published in English in peer-reviewed journals (the grey literature was not searched). It is therefore possible that the search did not identify all potential studies, and no formal assessment of publication bias was conducted. However, we did search two reference databases known to index articles with different characteristics (Medline and Embase). For the validation of administrative data case definitions, only discharge abstract (hospitalization) data were used, as physician claims data were not available; the sickest patients may therefore have been selectively included, which may over-estimate the prevalence of depression. On the other hand, a large number of charts (>4,000) were reviewed to decrease sampling variability in this setting. The validity of the coding may differ in other countries where coding is completed by physicians rather than by trained health technologists. However, our coders are highly skilled (in Canada, all professional coders undergo standardized national training [22]), which is a strength of this study. Even so, coders depend on accurate and complete physician documentation in order to code each chart. As such, limited chart documentation could have resulted in undercoding; however, this is likely to occur in most jurisdictions. In addition, the results may not be generalizable to other jurisdictions where coding rules differ; there may be fewer secondary diagnostic positions available for coding in other regions, and sensitivity may prove to be even lower in these locations. Although the health data used in this study are over 10 years old, there have been no interventions or changes in the process of medical data registration in the study setting since that time. Finally, our study included only ICD codes and did not include other specialized data (such as pharmacy data); we chose this approach specifically to make the results more generalizable to other locations. In spite of these limitations, this study has numerous strengths. We improved upon existing case definitions by enhancing the sensitivity of depression case definitions, and we developed a map between ICD-9-CM and ICD-10 coding for the psychiatric codes tested in our algorithms.

Conclusions

This systematic review and validation study examined new and existing case definitions for depression case ascertainment in administrative data. We suggest that validated case definitions be employed in all research involving depression in administrative data. Standardizing depression case definitions in administrative data is important for comparison between studies; we recommend using and further testing the ICD codes proposed here. Identifying depression in administrative data is challenging: it is an episodic condition that changes over time, so a different observation window may have a profound effect on results. In addition, the reference standards used for diagnosing depression have been highly variable and are not as well defined as those for some other chronic conditions. Case definitions should be selected based upon the objective of the study, as some algorithms maximize case finding and others enhance diagnostic certainty. To be adequate for surveillance purposes, researchers must consider multiple data sources to identify cases of depression, for example inpatient data (validated in this study) and outpatient claims data, to optimize sensitivity. Thus, future studies validating physician claims data are warranted. Improvements to the coding of psychiatric conditions in ICD-11 will assist in the standardization of this approach. Future research should focus on further validating depression case definitions using additional data sources (e.g., linkage with outpatient and pharmacy data, where available), with the eventual goal of increased sensitivity and specificity and consistent application of ICD codes for depression across disciplines.

Additional files