Background

Carpal Tunnel Syndrome (CTS) is the most common compression neuropathy of the upper extremity and results from median nerve entrapment in the carpal canal [1]. Persons with CTS have sensory or motor problems in the area innervated by the median nerve [2]. The prevalence of CTS has been estimated to be 4–5% in the general population, with a higher prevalence in the working population [3].

In its latest guideline, the American Academy of Orthopaedic Surgeons (AAOS) has categorized CTS clinical diagnostic tests into four main categories: 1) provocative maneuvers (e.g. Durkan’s test, Phalen’s test), 2) sensory and motor tests (e.g. heat/cold sensation, thenar muscle atrophy), 3) questionnaires and scales (e.g. Boston carpal tunnel questionnaire, CTS-6 scale), and 4) hand symptom diagrams/maps (such as Katz and Stirrat’s hand symptom diagram) [4]. Clinical diagnostic tests have the advantages of being quick, inexpensive, and painless, and of yielding immediate results.

A systematic review (SR) of the diagnostic accuracy of clinical examination tests was conducted by one of our research team members in 2004 and is currently outdated [5]. Several original studies have been published after 2004 that were not included in any other reviews in the past 16 years [6,7,8,9,10,11]. This paper is one of a series of updated SRs related to the diagnostic accuracy of CTS clinical diagnostic tests categorized by the AAOS. We previously published an SR of scales, questionnaires, and hand symptom diagrams [12]. The focus of this SR is on sensory and motor tests, and we aimed to identify, critically appraise and synthesize the evidence on the diagnostic accuracy of the sensory and motor tests for diagnosing CTS in individuals with suspected CTS.

Methods

We registered the protocol of this SR on December 20, 2018 with the International Prospective Register of Systematic Reviews (PROSPERO), under registration number CRD42018109031 [13]. We followed the Diagnostic Test Accuracy extension of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (DTA-PRISMA) [14] and the Cochrane Collaboration guidelines in developing and reporting this SR [15].

Information sources

We conducted a systematic computerized search of Medline and Embase through Ovid, as well as CINAHL, all from inception until January 20, 2020. We developed our search strategy over two meetings in consultation with a health science research methodologist librarian at McMaster University. We originally developed a search strategy that captured all four categories of clinical diagnostic tests outlined by the AAOS. However, due to the large number of study results and the variety of identified tests, we focused only on sensory and motor tests in this SR to improve readability for the target audience. Our search strategy included search terms for three main concepts: CTS, diagnostic accuracy properties, and names of the diagnostic tests for CTS. The search strategy can be found in Appendix A.

Study selection

Two authors (AD, JY) independently selected studies in two consecutive phases. In the first phase of study selection, titles and abstracts of the included citations were reviewed based on a pre-determined set of eligibility criteria. To enhance the quality of the review process, AD and JY initially reviewed 100 of the citations and resolved their disagreements through discussion. In the second phase, after retrieving the full texts of the included articles, the two authors again independently assessed the eligibility of the articles for inclusion in this SR. The kappa agreement between the authors in the first phase of screening (titles and abstracts) was calculated using STATA statistical analysis software, version 15 [16]. Kappa values below 0.20 suggest poor agreement, and values larger than 0.80 indicate perfect agreement [17]. Any disagreements between AD and JY in the process of study selection were resolved by the most experienced research team member (JM) through discussion.
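As an illustration only, the kappa statistic described above can be sketched in a few lines of Python. The review itself used STATA; the function name `cohen_kappa` and the screening counts below are hypothetical and are not the review's data.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    # Proportion of citations on which the two reviewers agreed.
    observed = (both_include + both_exclude) / n
    # Agreement expected by chance, from each reviewer's marginal include rate.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical screening counts (assumption, for illustration only).
kappa = cohen_kappa(both_include=40, only_a=10, only_b=10, both_exclude=140)
print(round(kappa, 2))
```

Kappa corrects the raw proportion of agreement for the agreement that two reviewers would reach by chance alone, which is why it is preferred over simple percent agreement for screening decisions.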

Eligibility criteria

We did not exclude any studies based on their language, sample size, choice of reference standard, or gender of the included participants. We included studies that met the following inclusion criteria.

Design

We included case-control, cross-sectional, and cohort (both retro- and prospective) study designs that were in a full-report format.

Participants

We included studies on persons older than 18 years who were diagnosed with or suspected of having CTS. The studies must have had a control group of people diagnosed with any type of upper limb musculoskeletal, neurological, or vascular condition, such as cervical radiculopathy or De Quervain’s tenosynovitis. We excluded studies that had healthy control groups, as healthy control groups would falsely inflate the diagnostic accuracy properties and do not reflect actual clinical settings.

Diagnostic test

We included studies that assessed the diagnostic accuracy of at least one sensory or motor test for CTS diagnosis.

Comparison

Since there is no gold standard for CTS diagnosis, we accepted studies with any reference standard, ranging from electrodiagnostic testing to carpal tunnel release surgery and clinical examination tests.

Outcome

We included articles that reported at least one diagnostic accuracy property, such as sensitivity (Sn), specificity (Sp), positive predictive value (PPV), or negative predictive value (NPV), or that provided enough data on their test results to enable us to (re)construct 2 × 2 contingency tables.

Time

We included studies from any time frame that reported the diagnostic accuracy of sensory or motor tests for CTS diagnosis.

Data extraction

Initially, MG and AD extracted data from three of the included studies. Since the agreement was high, MG did the remainder of the extraction independently, and AD cross-checked the information. We used a pre-determined extraction sheet that we had previously developed for an SR of the diagnostic accuracy of scales, questionnaires and hand symptom diagrams for CTS diagnosis. We extracted the following data:

  1) Information about the studies, such as authors, study design, year and country, and conflicts of interest.

  2) Information on the participants, such as sample size, age, gender, inclusion and exclusion criteria, diagnoses, severity and duration of symptoms, and CTS prevalence in the sample.

  3) Information regarding the index test, index test methodology and threshold criteria for positive results, as well as information on the reference standard.

  4) Any information on the diagnostic accuracy properties of the sensory or motor tests, such as Sn, Sp, NPV, PPV.

Data synthesis and analysis

We extracted information on Sn, Sp, PPV, NPV, positive likelihood ratio (+LR), negative likelihood ratio (−LR) and their associated 95% confidence intervals (95%CIs) from the included studies, where possible. When this information was not directly reported in the studies, we tried to calculate them by reconstructing 2 × 2 contingency tables based on the available data on true and false positives and negatives.
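The reconstruction step described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure; the function name `diagnostic_accuracy` and the cell counts are hypothetical, and the standard definitions of each property are assumed.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Diagnostic accuracy properties from the four cells of a 2 x 2 table.

    tp/fn: index test positive/negative among people with the condition.
    fp/tn: index test positive/negative among people without the condition.
    """
    sn = tp / (tp + fn)  # sensitivity
    sp = tn / (tn + fp)  # specificity
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        # Likelihood ratios are undefined when the denominator is zero.
        "+LR": sn / (1 - sp) if sp < 1 else float("inf"),
        "-LR": (1 - sn) / sp if sp > 0 else float("inf"),
    }


# Hypothetical counts (assumption): 100 people with CTS, 100 without.
result = diagnostic_accuracy(tp=51, fp=15, fn=49, tn=85)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV computed this way reflect the prevalence in the table itself (here 50%), whereas Sn, Sp and the likelihood ratios do not.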

PPV and NPV are affected by the prevalence of the condition in the sample; for instance, an increase in the prevalence of a given condition in a sample increases the PPV and decreases the NPV [18]. To overcome this dependence of NPV and PPV on prevalence, we tried to calculate and report +LR and -LR, where possible. Likelihood ratios are independent of the prevalence of the condition in the sample and provide a more accurate clinical judgment [18]. Likelihood ratios are interpreted as follows: +LR > 10 and -LR < 0.1 indicate a great change in the posttest probability and are very valuable in the clinical decision-making process [18]. +LR of 5 to 10 and -LR of 0.1 to 0.2 indicate a moderate change in the posttest probability of having a condition [18]. +LR of 2 to 5 and -LR of 0.2 to 0.5 indicate a slight change in the posttest probability [18]. Lastly, +LR < 2 and -LR > 0.5 have no clinical value in calculating the posttest probability [18].
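The interpretation bands above can be written as a small lookup, shown here as a sketch. The function names are hypothetical, and the handling of values that fall exactly on a boundary (for example +LR = 5 or -LR = 0.2) is an assumption, since the bands as stated do not specify it.

```python
def interpret_positive_lr(lr):
    """Map a positive likelihood ratio to the change in posttest probability."""
    if lr > 10:
        return "great"
    if lr >= 5:  # boundary handling is an assumption
        return "moderate"
    if lr >= 2:
        return "slight"
    return "none"


def interpret_negative_lr(lr):
    """Map a negative likelihood ratio to the change in posttest probability."""
    if lr < 0.1:
        return "great"
    if lr <= 0.2:  # boundary handling is an assumption
        return "moderate"
    if lr <= 0.5:
        return "slight"
    return "none"


# Example values: +LR = 5.33 and -LR = 0.72.
print(interpret_positive_lr(5.33), interpret_negative_lr(0.72))
```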

We categorized and presented the information on the diagnostic accuracy of the sensory and motor tests for CTS diagnosis in separate tables. The results were grouped into ‘sensory tests for CTS diagnosis’ and ‘motor tests for CTS diagnosis’, with each category organized by the frequency of the diagnostic test being assessed. Due to the heterogeneity of the data (different sample characteristics, different index and reference tests methodology and criteria for positive results) we could not conduct a meta-analysis.

Assessment of risk of Bias and applicability concerns

Two authors (AD, JY), independently rated the risk of bias and applicability concerns of the included studies based on the revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2) [19]. In case of any disagreements in rating the quality of the studies, a third research team member (JM) was engaged and the disagreement was resolved through discussion. The QUADAS-2 tool assesses risk of bias in four domains: a) patient selection, b) index test, c) reference standards, and d) study flow and timing [19]. Moreover, QUADAS-2 rates the applicability concerns in three domains addressing patient selection, index test, and reference standard [19].

Results

We identified 5552 citations through the electronic database search. After removing the duplicates, we reviewed the titles and abstracts of 4052 citations. In the second phase of screening, we reviewed the full texts of 161 articles, of which 16 articles were included in this SR (Fig. 1. PRISMA diagram). The reviewers had a kappa agreement of 0.70 (SE: 0.02, 95% CI = [0.66–0.74]) in screening the titles and abstracts. The studies were conducted in the USA, Sweden, France, Canada, Spain, Portland, Italy, and Turkey. Appendix B summarises the reported conflicts of interest of the included studies. The characteristics of the included studies are presented in Table 1. All of the studies had prospective cross-sectional designs, except for two studies that had retrospective designs [6, 23] and one that had a prospective cohort design [11].

Fig. 1 PRISMA diagram

Table 1 Study characteristics

Thirteen sensory or motor tests were assessed in the included studies: Semmes-Weinstein monofilaments (SWMFs) (n = 7), vibrometry (n = 4), hand grip strength (n = 2), pinch grip strength (n = 2), thumb abduction weakness (n = 3), functional dexterity test (n = 1), thenar muscle atrophy (n = 3), hypoesthesia (n = 2), two-point discrimination (n = 4), tactile thresholds (n = 1), Von Frey hairs (n = 1), warm and cold thresholds (n = 1), and graphesthesia (n = 1). A description of these tests, together with their administration methods and criteria for a positive test, is presented in Table 2.

Table 2 Description of Sensory/Motor Tests for Carpal Tunnel Syndrome diagnosis (sorted alphabetically)

Participants’ characteristics are summarized in Table 3, including their age, gender, duration and severity of symptoms, sampling method, process of selection, and eligibility criteria. Overall, 2763 individuals were included in these studies, of whom 1131 had CTS.

Table 3 Participants’ characteristics table

Risk of Bias and applicability concerns of the included studies

All of the studies had a low risk of bias rating in the patient selection domain of the QUADAS-2: they enrolled a consecutive sample of participants and avoided a case-control design and inappropriate exclusions. Six studies had unclear risk of bias ratings in the index test domain, because it was unclear whether the index test results were interpreted without knowledge of the results of the reference standard. Three studies had high, seven studies had unclear, and six studies had low risk of bias in the reference standard domain. The main reason for low ratings was the lack of blinding of the person performing the reference standard test. Eleven studies had unclear ratings on the flow and timing domain, because there was no mention of the interval between administration of the index and reference standard tests.

Regarding the applicability concerns of the included studies, nine studies had low concerns, four had unclear, and three studies had high concerns. In the patient selection domain, three studies had high, one study had unclear, and eleven studies had low applicability concerns. In the index test domain, only three studies had unclear concerns and the rest of the studies (thirteen studies) had no concerns regarding applicability. Lastly, in the reference standard domain, one study had high concerns, two studies had unclear concerns, and thirteen studies had no concerns regarding applicability. The visual demonstration of the risk of bias and applicability concerns of the included studies is presented in Figs. 2 and 3.

Fig. 2 Risk of bias and applicability concerns of the included studies, using QUADAS-2 tool

Fig. 3 The proportion of included studies with low, high, or unclear risk of bias and concerns regarding the applicability, using QUADAS-2 tool

Diagnostic accuracy of sensory tests for CTS diagnosis

The diagnostic accuracies of the SWMFs, two-point discrimination, vibrometry, hypoesthesia, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds were assessed in the included studies. See Tables 4 and 5 for detailed results.

Table 4 Diagnostic Accuracy of the Semmes-Weinstein monofilament test for CTS diagnosis
Table 5 Diagnostic Accuracy of Vibrometry for CTS diagnosis

The Semmes-Weinstein monofilaments (SWMFs) test was assessed in seven of the included studies [6, 8, 21, 25,26,27, 29]. The reported sensitivities and specificities ranged from 13 to 98%, and from 9 to 93%, respectively [6, 8, 21, 25,26,27, 29]. The authors of this SR calculated +LR and -LR, which ranged from 1.6 to 7, and from 0.98 to 0.12, respectively. Different decision rules were tested in the studies, which resulted in different diagnostic accuracies; these are summarized in Table 4. In the study by Szabo et al. 1999, the SWMFs test was performed in two positions, neutral and Phalen’s position (90 degrees of wrist flexion) [29]. The results from this study indicated a better diagnostic accuracy for the SWMFs test when done with wrist flexion (Sn = 83%, Sp = 44%, +LR = 1.48, −LR = 0.38) [29]. Furthermore, Szabo et al. calculated the PPV and NPV based on five hypothetical CTS prevalence values, ranging from 1 to 20% [29]; the details of this analysis are summarized in Table 4.

The two-point discrimination test was assessed in four studies [7, 20, 21, 23]. In the study by Borg & Lindblom, only the Sn was calculated, which was 30% [20]. In the other three studies, the Sn was 6, 32, and 63%, the Sp was 98, 81, and 85%, the +LR was 3, 1.68, and 4.2, and the -LR was 0.95, 0.84, and 0.43 [7, 21, 23]. In the study by Katz et al. 1990, PPV and NPV were calculated based on two CTS prevalence values [23]. In a sample with 40% CTS prevalence (sample 1), the PPV was 54% (95% CI 37 to 70) and the NPV was 63% (95% CI 58 to 68) [23]. In sample 2, with a CTS prevalence of 15%, the PPV was 23% and the NPV was 87% [23].

Vibrometry was assessed in four studies [20,21,22, 26]. In the study by Borg et al. 1988 [20], only Sn was calculated for vibrometry testing, which was 52%. Franzblau et al. incorporated three different reference standards: NCS; NCS + symptoms consistent with CTS; and physical examination findings and symptoms consistent with CTS [22]. The highest diagnostic accuracy values occurred when taking physical examination findings as the reference standard (Sn = 11%, Sp = 93%, +LR = 1.57, −LR = 0.95) [22]. In the study by MacDermid et al. 1997, two testers performed the vibrometry [26], which resulted in different diagnostic accuracies, as summarised in Table 5.

Hypoesthesia was another form of sensory testing for CTS diagnosis assessed in our included studies. In a study by Raudino (2000), only the Sn was calculated, which was 32% [28]. In another study, the following diagnostic accuracy properties were reported: Sn = 51%, Sp = 85%, PPV = 85%, NPV = 51%, +LR = 3.4, and -LR = 0.57 [24].

Lastly, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds were each assessed in only one study [20]. In this study by Borg & Lindblom, only the Sn was calculated, which was 52% for tactile thresholds, 52% for the Von Frey hairs test, 24% for graphesthesia, and 15% for warm and cold thresholds [20]. Borg & Lindblom assessed the diagnostic accuracy of six sensory tests: vibrometry, two-point discrimination, tactile thresholds, Von Frey hairs, graphesthesia, and warm and cold thresholds. They called this combination quantitative sensory testing (QST), and it had a Sn of 82% [20].

Diagnostic accuracy of motor tests for CTS diagnosis

The motor tests assessed in the included studies were thumb abduction weakness, thenar atrophy, hand grip strength, pinch grip strength, and functional dexterity tests. Each test is summarized below, and detailed information can be found in Table 6.

Table 6 Diagnostic Accuracy of Hand Grip Strength, Pinch Grip Strength, Thumb Abduction Weakness, Thenar Atrophy, and Functional dexterity tests for CTS diagnosis

Thumb abduction weakness was assessed in three studies [24, 28, 30]. The reported sensitivities and specificities from these studies ranged from 12.1 to 66%, and from 66 to 73%, respectively [24, 28, 30]. As calculated by the authors of this study, the +LR values were 1.37 and 1.94, and the -LR values were 0.51 and 0.86 for thumb abduction weakness testing [24, 30]. We could only obtain the value for sensitivity from the Raudino (2000) study [28].

Thenar atrophy was assessed by three studies [7, 9, 11]. The Sn of the thenar atrophy test was minimal, with values ranging from 5.5 to 22%, but it was a highly specific test, with Sp ranging from 96 to 100% [9, 11].

Hand grip strength was assessed in two studies. In Franzblau et al.’s study [22], hand grip strength was compared to three different reference standards: 1) electrodiagnosis, 2) electrodiagnosis and symptoms consistent with CTS, and 3) physical examination findings and symptoms consistent with CTS [22]. The highest diagnostic accuracy results came from taking physical examination findings and symptoms consistent with CTS as the reference standard, which yielded a Sn of 32% and a Sp of 94% [22]. As calculated by the authors of this study, hand grip strength testing had a +LR of 5.33 and a -LR of 0.72. In addition, Szabo et al. 1999 found that hand grip strength had the following diagnostic accuracy: Sn = 48% (95% CI 26–70), Sp = 30% (95% CI 14–46) [29]. Positive and negative predictive values were calculated using five hypothetical CTS prevalence values, which are summarized in Table 6. In general, the lowest CTS prevalence (1%) resulted in the worst PPV (1%) and the best NPV (98%) [29].
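The dependence of the predictive values on prevalence can be reproduced with Bayes' rule. The sketch below uses the Sn (48%) and Sp (30%) reported for hand grip strength by Szabo et al.; the function name `predictive_values` is hypothetical, and the prevalence values other than 1% and 20% are assumptions chosen for illustration, not necessarily those used in the original study.

```python
def predictive_values(sn, sp, prevalence):
    """PPV and NPV for a test with given Sn and Sp at a given prevalence."""
    # Expected proportions of the population falling in each 2 x 2 cell.
    tp = sn * prevalence
    fn = (1 - sn) * prevalence
    fp = (1 - sp) * (1 - prevalence)
    tn = sp * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Intermediate prevalence values are assumptions for illustration.
for prevalence in (0.01, 0.05, 0.10, 0.15, 0.20):
    ppv, npv = predictive_values(sn=0.48, sp=0.30, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At 1% prevalence this gives a PPV of roughly 1% and an NPV of roughly 98%, in line with the values quoted above, and the PPV rises while the NPV falls as prevalence increases.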

Pinch grip strength was assessed in two studies. In a study by MacDermid et al., two testers performed the pinch grip strength testing and identified Sn values of 72% and 70% for testers 1 and 2, respectively [26]. The Sp values were 88% for tester 1 and 78% for tester 2 [26]. According to Franzblau et al., when taking physical examination findings and symptoms consistent with CTS as the reference standard, pinch grip strength had a Sn of 21% and a Sp of 95% [22].

The functional dexterity test was assessed only by Sartorio et al. 2017, and was found to have a Sn of 84% (95% CI 72–90%), a Sp of 64% (95% CI 41–82%), a +LR of 2.37 (95% CI, 1.23–4.55), and a -LR of 0.25 (95% CI, 0.13–0.49) [10].

Reference standards for CTS diagnosis

Out of the 16 included studies, 11 studies had nerve conduction studies (NCS) as their reference standard. These studies had different criteria for positive test results, which are summarized in Appendix C. The remaining five studies used the following reference standards. Borg & Lindblom (1988) [20] used a combined battery of tests as the reference standard, which included formal CTS screening, a neurological examination and electrophysiological testing. No further information was provided by the authors. Dale et al. 2012 had three different reference standards: 1) the modified Katz hand diagram, with people suspected of having CTS categorized as having ‘classic, probable, possible, or unlikely’ CTS; 2) NCS; 3) consensus criteria for CTS case definition, requiring a classic or probable CTS rating on the modified Katz hand diagram and abnormal median nerve conduction testing [8]. MacDermid et al. used a clinical diagnosis by a specialist hand surgeon combined with NCS as their reference standard [25, 26]. Finally, Franzblau et al. 1993 had three different reference standards: 1) NCS; 2) NCS + surveillance symptom definitions for CTS; 3) physical examination + surveillance symptom definitions for CTS [22].

Discussion

This study synthesized sixteen clinical studies reporting on thirteen different sensory and motor tests. Among these tests, none had consistent evidence for high diagnostic accuracy. These results suggest that clinicians should not rely on the results of a single sensory or motor test for CTS diagnosis, and should instead use a combination of several sensory and motor tests, or other combinations of tests from different AAOS categories, to rule in or rule out CTS.

In this SR, we found the most specific tests for CTS diagnosis were the hand (palmar) grip strength test [22] (Sp of 94%), pinch grip strength, (Sp from 78% [26] to 95% [22]), thenar atrophy (Sp from 96 to 100%) [7, 9, 30], and 2PD (Sp from 81 to 98%) [7, 21, 23]. Tests with high Sp can detect true negative cases with a great precision and have a low false positive outcome [18]. This feature can assist clinicians in differentiating between CTS and non-CTS cases. Of the included sensory and motor tests, the most sensitive for CTS diagnosis was the SWMF test, using the 3.22 monofilament size in any radial finger as the normal threshold, with Sn values ranging from 49% [6] to 96% [26]. Tests with high sensitivity have low false negative results, which is an important factor for screening purposes [31]. In other words, when the objective of a clinician is to screen people with suspected CTS, they should use a highly sensitive test (low specificity values are tolerable); therefore, the SWMF is potentially a useful screening tool.

Our results confirm the findings of a recent clinical practice guideline by the Academy of Hand and Upper Extremity Physical Therapy and the Academy of Orthopedic Physical Therapy of the American Physical Therapy Association [32]. This guideline recommended using the SWMFs test with either 3.22 or 2.83 as the normal threshold for mild to moderate CTS cases, and for more severe CTS cases, a 3.22 threshold should be considered. The main difference between this updated SR and the previous SR on this topic by MacDermid and Wessel in 2004 [5] is that we focused on samples with no healthy controls. Moreover, MacDermid and Wessel concluded that the most specific (but not sensitive) tests for CTS were the 2PD and testing of thumb abduction weakness [5]. We did find the 2PD to be one of the most specific tests; however, the palmar and pinch grip strength tests and thenar muscle atrophy proved more specific than the thumb abduction weakness test.

Only two studies reported the prevalence of CTS in the underlying population from which they sampled their participants [23, 29]. Prevalence is important when applying the results, since the pretest probability is determined by the prevalence [18]. Settings with a higher prevalence of CTS, such as electrodiagnosis labs and hand therapy clinics, likely have a higher pretest probability of CTS than other screening contexts such as preemployment screening, where the prevalence would be expected to be very low. Except for two studies [8, 22], all of the included studies recruited their participants from neurophysiology/electrodiagnosis and hand clinics. To overcome the effect of CTS pretest probability, we ensured likelihood ratios were reported in this SR. Likelihood ratios report diagnostic accuracy independently of the prevalence of a condition in a given sample, and it is suggested that clinicians consider likelihood ratios in their clinical diagnostic decision making [18].
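To illustrate how a likelihood ratio combines with the pretest probability, the following sketch converts a pretest probability to a posttest probability through odds. The function name `posttest_probability` and the two pretest probabilities (40% for a specialist clinic, 5% for a low-prevalence screening context) are hypothetical assumptions; the +LR of 2.37 is the value reported above for the functional dexterity test.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Posttest probability from a pretest probability and a likelihood ratio."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical settings (assumptions): a clinic and a screening context.
for setting, pretest in (("clinic", 0.40), ("screening", 0.05)):
    posttest = posttest_probability(pretest, likelihood_ratio=2.37)
    print(f"{setting}: pretest {pretest:.0%} -> posttest {posttest:.1%}")
```

The same positive result moves the probability of CTS to about 61% in the higher-prevalence setting but only to about 11% in the low-prevalence one, which is why the recruitment setting matters when applying these results.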

Administration methods of the sensory and motor tests for CTS diagnosis were very diverse across the included studies. For instance, the four studies assessing the diagnostic accuracy of vibrometry used four different testing methods and different decision rules for positive test results. The same applies to the hand grip strength, hypoesthesia, pinch grip strength, and SWMFs tests. We advise clinicians and researchers to carefully consider their ability to replicate test methods (as reported in Table 2) when selecting a sensory or motor test to rule in or rule out CTS.

We did not exclude studies based on the choice for reference standard. Due to the lack of a gold standard for CTS diagnosis [4], and the nature of CTS as a clinical syndrome, there is no universal agreement on a reference standard. The most commonly used reference standard in the included studies was NCS. While some might consider NCS as the most definitive reference standard, it can have false positive and negative results [4]. That is, there can be abnormal results in patients who have no symptoms, and patients with persistent symptoms without positive NCS can show benefit following carpal tunnel release. Similar to our previous SR of the diagnostic accuracy of scales, questionnaires and hand symptom diagrams [12], the highest sensitivities and specificities occurred when taking other clinical tests and history as the reference standard [8, 22, 25, 26]. For instance, in the study done by Dale et al. 2011, among the three reference standards used, the highest diagnostic accuracy values occurred when taking Katz and Stirrat’s hand symptom diagram as the reference standard [8].

Study limitations and future directions

A limitation of the current study was that we did not conduct a meta-analysis. Heterogeneity in the test methods, reference standards, and decision rules for positive test thresholds precluded meta-analysis, so we reported the results narratively. A second limitation is the possibility of publication bias, because we included only published literature and not the gray literature. We chose to include only published literature because we intended to produce a synthesis of the available peer-reviewed evidence. As with any other review, we might have missed some studies. Although we designed the search strategy in consultation with a health science research librarian, it is possible that we did not capture all of the available evidence.

We recommend future studies produce evidence with the highest quality and the lowest risk of bias by adhering strictly to the established guidelines. Moreover, there is a great need for studies assessing the clinical triangulation process of combining several categories of clinical diagnostic tests.

Conclusion

The evidence reported in this study was obtained mostly from studies at risk of bias. Among the included studies, none of the sensory or motor tests had consistently high diagnostic accuracy properties reported by high quality evidence. Confirming the value of a single sensory or motor test for CTS diagnosis is pending future robust research. From the evidence available at present, none of these methods appear promising in helping to make a definitive diagnosis in the individual patient (though they are useful in demonstrating that both sensory and motor function are impaired by CTS when used in cohorts of patients in research studies).