Symptom Validity Assessment in Europe About 10 Years Ago

Almost 10 years ago, Prof. González-Ordi from Complutense University, Madrid, asked the first author to contribute to a special issue on symptom validity assessment (SVA) to be published in the Spanish Elsevier journal Clínica y Salud. He did this at the third European Symposium on SVA held in Wuerzburg, Germany, in 2013, organized by the International Academy of Applied Neuropsychology (led by Gerhard Müller and Herbert König). The result of that request was a historical sketch of symptom validity assessment in Europe, beginning with the pioneering work of Rey (1941) and comprising state-of-the-art reports contributed by colleagues from the four European nations that were most visible in SVA research at that time: the Netherlands, Spain, Great Britain, and Germany.

The current article is an attempt to update that text (Merten et al., 2013) without repeating the information given there (the text is available free of charge from the publisher; see References). In a fast-developing field of research and assessment practice like SVA, a decade is likely to bring about significant changes. The term symptom validity was developed in the 1970s (e.g., Pankratz, 1979), and in 2013 it was still in use as a superordinate concept. Larrabee (2012) had only just proposed to differentiate (verbally) between symptom validity tests (SVTs, from now on relating mostly to self-report validity measures, today also comprising interview methods) and performance validity tests (PVTs, relating to cognitive validity measures).

It was also at that time that the findings of a European survey on symptom validity testing (in the historical, superordinate sense of the term) were published (Dandachi-FitzGerald et al., 2013), following a previous survey from Great Britain (McCarter et al., 2009). Efforts to motivate as many national neuropsychological societies as possible to participate resulted in responses from neuropsychologists in Denmark, Finland, Germany, Italy, the Netherlands, and Norway. It should be noted that half of the national neuropsychological societies that had been contacted and asked to participate in the survey either did not respond to repeated requests or signaled that, in their opinion, the study was not feasible in their countries.

About 10 years ago, there was considerable resistance against SVA even among forensically working neuropsychologists, against the background of an irrational conviction that a professional could easily tell genuine from manipulated symptom presentations without having to resort to special means and methods. Even more resistance was notable among many psychiatrists, who apparently felt that their traditional intuitive approach of relying on subjective symptom report by patients (without thoroughly investigating its authenticity and possible significant response distortions) was threatened by the introduction of empirically based methods and a data-driven approach. These were methods many psychiatrists did not use and did not understand. This was clearly visible in both Germany and Switzerland (e.g., Dressing et al., 2011). For a more detailed account of the forms of resistance against SVA that had emerged in the 1990s and early 2000s, see Green and Merten (2013).

The whole dispute bears some resemblance to the old controversy of clinical versus statistical prediction (Meehl, 1954). The basic problem here is which kind of data is superior for arriving at valid diagnoses and prognoses. We have learned that, under some circumstances, statistical predictions are not automatically superior to clinical judgment. This appears to apply to some contexts, such as the classification of seizure types (e.g., Fargo et al., 2008), and to judgments from persons with special clinical expertise (Ægisdóttir et al., 2006). The same may also apply to some forensic decision-making contexts; a combination of statistical and clinical data may, in fact, turn out to be superior to either of them in isolation. In this vein, the new multidimensional malingering criteria for neuropsychological assessment (Sherman et al., 2020) also comprise specifiers for the clinical presentation of malingering. However, the practitioner should always bear in mind that human judges often overestimate their abilities (Kahneman et al., 2021).

In Britain, the use of the term “malingering” in court-ordered forensic expert reports continued to be largely taboo (see the more detailed report below). The further development of the whiplash crisis in Britain (as described in the Merten et al., 2013, report) and the public perception of fraudulent symptom claims after motor vehicle accidents were dealt with in a series of articles by Cartwright and his co-workers (e.g., Cartwright & Roach, 2015; Cartwright et al., 2019).

Around the turn of the millennium, SVA in Europe apparently lagged about 10 years behind North American developments. For many psychological and medical professionals, even for some forensically working experts, the topic of feigned symptom presentations was largely taboo. The conclusion of the Merten et al. (2013) review was that parts of Central and Western Europe were about to reduce the delay in SVA research and practice significantly, while in other parts of the continent (large parts, to be sure), no major published research was detectable. Yet, available European estimates of invalid responding and uncooperativeness in civil and social-law forensic contexts pointed to base rates similar to those obtained in North America (e.g., Allcott et al., 2014; Merten et al., 2020; Plohmann & Hurter, 2017).

It was a former member of the Executive Committee of the British Psychological Society, Division of Neuropsychology, Dr. Stuart Anderson, who formulated the idea of organizing and convening a European symposium on SVA (Anderson, 2010). The first meeting in Wuerzburg, Germany, was felt to be such an extraordinary success that subsequent symposia were held every other year. As a result, participants could keep track of the latest developments in this fast-evolving field and, most of all, meet and hear a selection of the most important experts; the list of contributors and keynote speakers reads like a Who’s Who of symptom and performance validity research. Six conferences were held in Germany, the Netherlands, Great Britain, and Switzerland before the 2-year rhythm was unexpectedly interrupted by the COVID-19 pandemic.

The program list comprises invited contributors from at least 12 countries. Poster sessions were held at each conference, and poster blitz presentations were perceived to be a powerful way of alerting the audience to new developments, newly conceived tests, unpublished new studies, research projects in their planning phase, single-case studies, etc.

European Developments in Symptom Validity Assessment

The following reports describe the state of the art in a number of European countries, namely those that were most visible in the SVA literature. For Great Britain, the Netherlands, Germany, and Spain, previous reports can be found in Merten et al. (2013), so only updates to those earlier accounts are included here. Full reports were requested from all other contributing countries, that is, Austria, Italy, and Switzerland. Despite multiple efforts, no information could be obtained from a number of other countries, including France and the Scandinavian countries.

Germany

The awareness of possible significant response distortions in forensic assessments has grown further in the past 10 years, with an ongoing debate among psychiatrists about the use of SVTs and PVTs in patients with claimed mental disorders. Despite this debate, many psychiatrists began to use self-report validity measures, in particular the Structured Inventory of Malingered Symptomatology (SIMS; Smith & Burger, 1997), but the problem of correctly handling and interpreting results of this questionnaire and other instruments was visible in many expert reports. As with psychological testing in general, many non-psychologists continue to underestimate both the complexity of psychological assessment (symptom and performance validity testing included, of course) and the sound qualification needed to correctly use tests and interpret their results.

The discovery and publication of huge fraud networks targeting social security and social welfare schemes in Germany (Deutsche Rentenversicherung, 2017; Hoffmann, 2019) should have enabled even the most conservative of all critics of symptom validity research to correct their persistent belief that malingering and fraudulent disability claims were rare phenomena relevant for American, but not for European or German social realities. But old generals and irrational convictions never die, so resistance against SVA will most probably only fade away with time.

Most visible among newly developed validity measures and published research were the Beschwerdenvalidierungstest (BEVA; Walter et al., 2016) and the Self-Report Symptom Inventory (SRSI; Merten et al., 2016). At about the same time, German adaptations of the Structured Interview of Reported Symptoms–2 (SIRS-2; Schmidt et al., 2019) and the Inventory of Problems–29 (IOP-29; Viglione & Giromini, 2020) were tested and made available to German-language users. Among PVTs, the Groningen Effort Test (GET; Fuermaier et al., 2017) was also published in the German language.

Publications on empirical studies performed in Germany were diverse and appeared to concentrate on the use of validity measures in clinical and rehabilitation contexts (e.g., Kobelt-Pönicke et al., 2020; Merten et al., 2020) as well as in forensic patients with psychiatric diagnoses (e.g., Stevens et al., 2018).

Great Britain

The position described for Great Britain (GB) in the 2013 review paper was that the majority of research focused on PVTs, and most studies utilized non-forensic clinical populations. There has been a paucity of test validation research since this time, but again the few studies that have been conducted have used non-forensic clinical populations (Hampson et al., 2014; Suesse et al., 2015).

It would seem that clinicians in GB continue to adopt a softer approach to SVA compared with North America. They are more reluctant to use PVTs and SVTs to identify malingering, as reflected in GB research, which has a paucity of studies using known-group designs with “malingering” groups. Clinicians in GB may be more skeptical about formulating opinions or beliefs about “malingering” because malingering is regarded as a matter for the Courts to decide and not a clinical decision for an expert witness. There is also evidence that GB clinicians still prefer the term effort test to PVT (e.g., Hampson et al., 2014; McGuire et al., 2019).

McWhirter et al. (2020) reviewed PVT failure in clinical populations. The authors hypothesized that PVTs measure a range of factors including attentional deficit. Larrabee et al. (2020) criticized this review, and McWhirter et al. in turn responded to their criticisms. In this exchange, a difference between the GB and US use of terminology and opinion about PVTs was highlighted, particularly with regard to the term effort. Larrabee et al. stated that the term effort test is no longer in use in the US, in part because PVTs require little effort to perform, so that people experiencing significant cognitive impairment can pass them. They highlighted a problem with using this term and stated that “continuing to refer to PVTs as ‘effort tests’ allows mischaracterization of PVTs as sensitive attentional tasks affected by variable ‘effort’ rather than measures of performance validity that are failed due to invalid test performance.” Further discussion about the proper use of terminology in Britain followed; it can be downloaded from the journal website.

In the previous review, Merten et al. (2013) noted that neuropsychologists remained skeptical about the use of PVTs, although the majority of neuropsychologists were using them in medico-legal settings (McCarter et al., 2009). The trend continues, and they are still not widely adopted in clinical settings (Suesse et al., 2015).

Since the last review, no further detailed studies have been published that examine whether neuropsychologists’ practice has changed in GB. However, there have been other reviews that have outlined the frequency with which psychologists and other professionals use SVTs/PVTs. Cartwright et al. (2019) found that only 20% of expert witness psychologists used SVTs in non-cognitive psychological assessments. Allcott et al. (2014) surveyed the practices of a range of GB expert witnesses in the fields of neurology, neuropsychiatry, orthopedics, neuropsychology, clinical psychology, and care. They found that 49% of expert witnesses evaluated symptom validity by making judgments about whether there were marked inconsistencies between complaints and medical history. Thirty-two percent assessed it by determining whether complaints were disproportionate to the severity of the injury. Forty-four percent of respondents did not routinely use any tests/procedures for symptom validation. More than half of those who routinely used some form of symptom validity testing did not specify any peer-reviewed sources that were useful in their practice (i.e., 55% of respondents did not reply when asked to specify peer-reviewed articles or books that they found useful on the subject). In response to the use of these methods, the expert witnesses made comments such as the “validity of such instruments remains questionable”; “I am unaware of any reliable tests or procedures that are of help”; and “I have found personal experience more useful than any of the above (peer-reviewed publications).” The review concluded that the overall impression is that most experts, including very seasoned experts, remain skeptical about the use of SVTs.

GB research shows that there is mixed acceptance that malingering or non-credible presentations are prevalent in litigant populations. Cartwright et al. (2019) found that a group of 37 participating GB psychologists who conducted medicolegal assessments believed that only 9.9% of claimants were malingering. Allcott et al. (2014) described a substantial variation in medicolegal and psychological experts’ prevalence estimates for exaggerated or feigned health complaints. A clear majority of the respondents found that most medico-legal cases (> 75%) were presented as genuine cases, but exact numbers were not given.

In sum, the acceptance of SVA appears to be limited. GB research predominantly focuses on clinical populations, and clinicians tend to be resistant to using validity tests to detect malingering, preferring a softer approach.

Spain

There is a clear continuity with the panorama described for Spain in Merten et al. (2013), both in the current lines of research and in the adaptation and creation of instruments, as well as in the most relevant challenges that remain. Research is ongoing in forensic (e.g., Fariña et al., 2014), neuropsychological (e.g., Daugherty et al., 2020), medico-legal (e.g., Capilla Ramírez et al., 2014), and military (e.g., García Silgo, 2019) settings. The most prevalent field in terms of both research and application is forensic assessment, in particular the assessment of sequelae of psychological injuries subsequent to traumatic events (such as gender-based violence, Marín-Torices et al., 2018, or traffic accidents, Puente-López et al., 2021).

Spanish adaptations of a variety of international validity tests are available (MMPI-2, MMPI-A, MMPI-2-RF, Personality Assessment Inventory [PAI, PAI-A], SIMS, Test of Memory Malingering [TOMM]), and continue to be used in research studies in Spain (e.g., López-Miquel & Pujol-Robinat, 2020; Vilar-López et al., 2021). The Spanish adaptation of the MMPI-A-RF has recently been published, and the MMPI-3 publication is planned for 2023. Specific malingering scales for forensic assessment of posttraumatic stress disorder were developed in Spain, such as the Trauma Impact Questionnaire (CIT; Crespo et al., 2020) and the Posttraumatic Stress Disorder Symptom Severity Scale: Forensic version (EGS-F; Echeburúa et al., 2017). Performance validity tests have also been created in the field of neuropsychology, like the extended version of the Coin-in-the-Hand Test, developed in Spain and later validated at a multicultural level (Daugherty et al., 2021).

Despite the availability of such research and instruments, further basic and applied research is necessary. To date, there appear to be more open questions than answers. Similar to the scenario depicted in 2013, it is still necessary to define adequate protocols for the systematic investigation of possible malingering, based on consensus across the different fields of application and areas of assessment. Unfortunately, this goal appears to be quite distant. Furthermore, similar to the situation depicted above for Germany, the complexity of validity assessment is still underestimated and downplayed; the search for a “magic wand” of simple and fast solutions persists. In this sense, screening tools (like the SIMS) continue to be used inadequately; such instruments are sometimes used for diagnostic purposes with a poor understanding of their scope and limitations. The medical field is facing a special challenge with regard to the assessment of temporary disability due to mental health disorders. This area certainly requires further research and elaboration, more profound professional specialization, and the improvement of assessment protocols.

The Netherlands

In the Netherlands, SVA has attracted steady research attention, although the field has hardly been supported by national or European grant organizations. Since 2013, six doctoral theses on symptom validity have been published (Boskovic, 2019; Dandachi-FitzGerald, 2017; Meyer, 2020; Niesten, 2019; Van der Heide, 2021; Van Impelen, 2018); all of them are accessible (see reference list). In addition to deception detection, Dutch research on SVA is characterized by conceptual studies (e.g., Merckelbach et al., 2019). Experimental studies have examined whether moral primes (Niesten et al., 2017) and feedback (Merckelbach et al., 2015) can deter symptom overreporting tendencies. Also, studies have looked into the consequences of symptom and performance invalidity (e.g., Merckelbach et al., 2014a; Roor et al., 2021).

Two new performance validity measures have been developed: the Groningen Effort Test, an attention-based performance validity test (Fuermaier et al., 2017), and the Visual Association Test–Extended, a memory test with an embedded performance validity index (Meyer et al., 2017). Additionally, Dutch versions of the Assessment of Depression Inventory (ADI-NL; Mogge & LePage, 2004; Van Leeuwen & de Jonghe, 2018) and the Schretlen Malingering Scale (Merckelbach, Otgaar et al., 2014b; Schretlen et al., 1992) have been made available.

In the professional field, the issue of validity assessment has attracted increasing interest among insurance and company doctors, as well as among lawyers, especially those specialized in personal injury claims. As in other countries, the new nomenclature of distinguishing performance and symptom validity tests (measuring underperformance on cognitive tests and overreporting of symptoms, respectively) has been adopted. Terms like “malingering test” and “effort test” are less commonly used, and effort in the context of performance validity is more clearly understood as “applying effort to perform well.” The revised guideline for forensic neuropsychological assessments now explicitly states that “in every forensic neuropsychological assessment, the evaluation of symptom and performance validity must be psychometrically substantiated” (Nederlands Instituut voor Psychologen, sectie Neuropsychologie, 2016, p. 10, quotation translated). This guideline further stipulates that a minimum of two freestanding validity tests should be administered, and that performance and symptom validity should be assessed separately.

In contrast, the idea that clinical impression suffices to assess the validity of self-reported symptoms is still commonly voiced among forensic psychiatrists. According to their guideline, psychiatrists may consider the use of specific instruments as soon as they have doubts about symptom validity based on their clinical impression (Nederlandse Vereniging voor Psychiatrie, 2012). This primacy of clinical judgment flies in the face of what is now becoming an impressive corpus of knowledge (e.g., Dandachi-FitzGerald et al., 2017; Rosen & Phillips, 2004; Zubera et al., 2015). Up until now, studies on how frequently validity tests are used in forensic assessments in the Netherlands are lacking.

To examine the role of symptom validity tests in Dutch courts, Merckelbach and Dandachi-FitzGerald (2021) searched the public database of court decisions with the terms “feigning,” “simulation,” “malingering,” and “exaggeration,” and selected the ten most recent decisions for each term. In 22 of the 36 cases (61%), validity tests were mentioned, showing that these tests have by now acquired a fixed position in the legal system. Still, a close analysis of these legal cases revealed that there is considerable room for improvement, specifically when it comes to interpreting the outcomes of validity tests. For example, in one case, poor performance validity was explained by the psychiatrist as “unconscious exaggeration caused by a conversion disorder.” In another case, the neuropsychologist concluded that the failures on two freestanding performance validity tests could be explained away by cognitive deficits due to mild TBI. In the first case, the dubious explanation was accepted by the court. In the second case, the court rightly ruled that the expert opinion on validity test failure was incoherent, because performance validity test failure casts doubt on the possibility of establishing the presence of cognitive deficits with any acceptable degree of certainty. These cases illustrate how important it is that both experts and judges are well informed about SVA; there is still much work to do here.

To conclude, the state of the art of SVA in the Netherlands appears to be at the forefront in Europe, at least as far as neuropsychological assessments are concerned. Nonetheless, controversies remain and pertain mostly to the interpretation of validity test outcomes. Experts struggle with how to interpret a patient’s symptom presentation when this patient passes some validity tests but fails others. Also, there still seems to be an inclination to ascribe validity test failure to psychopathology or somatic symptoms such as fatigue and pain, which indicates that problematic beliefs about SVA are still circulating.

Austria

The Austrian Federal Ministry of Health has published guidelines for the preparation of clinical-psychological and health-psychological data and reports. According to the Psychologists’ Act, these guidelines are binding for psychologists (Bundesministerium für Soziales, Gesundheit, Pflege, und Konsumentenschutz, 2020). The problem of symptom and performance validity is not specifically addressed in the current guidelines. However, this issue has repeatedly been taken up to varying degrees of detail in recent publications by Austrian authors (Lehrner et al., 2015, 2021; Lettner, 2019; Strubreither, 2021).

The range of advanced training courses addressing SVA has improved significantly in Austria over the past few years. Workshops on the subject of “symptom and performance validation in neuropsychological reports” have been organized and are still being offered by the Austrian Neuropsychological Association (GNPÖ). The content of these workshops covers the interpretation and misinterpretation of results, the possibilities and limits of the use of a specific test to detect malingering, ethical questions raised by the use of SVTs and PVTs, and the presentation and discussion of expert reports. Also, the curriculum for legal psychology of the Professional Association of Austrian Psychologists (BÖP) offers a training module (Module 2) comprising special topics such as symptom exaggeration, malingering, and symptom validation. In May 2021, an online advanced training course on SVA in clinical psychological assessment was held via Zoom as part of the advanced training of the clinical psychology section of the BÖP. It attracted approximately 500 participants.

A questionnaire on SVA in psychological assessment was sent out by the BÖP to all participants. It was also sent to all 5054 members of the clinical psychology section of the BÖP, as well as to 87 expert psychologists listed as court experts of the Federal Ministry of Justice. A total of 99 submitted data sets could be analyzed. These data sets stemmed from 17 psychologists listed as experts at the regional courts and 82 members of the clinical psychology section. The two groups differed in the reported frequency of validity test use. Sixteen percent of the section members and 12% of the court experts reported that they used validity tests in more than 95% of their clinical assessment cases. For independent psychological examinations, 25% of the section members and 29% of the experts stated that they used SVTs and PVTs in more than 95% of their cases. None of the court experts reported never using validity tests. In contrast, 28% of section members reported that they never used PVTs/SVTs for clinical cases, and 13% reported that they never used PVTs/SVTs in court-ordered examinations. The frequency of PVT/SVT use is similar to that reported in other European countries (e.g., Dandachi-FitzGerald et al., 2013), but section members did not use PVTs/SVTs as frequently as psychologists working in forensic contexts.

It is noteworthy that there is a prominent European publisher and distributor of computerized psychological assessment, Schuhfried, which is based in Austria. Among others, they published the Groningen Effort Test (Fuermaier et al., 2017).

Among research activities in the field of SVA, a clinical study by Bodner et al. (2019) investigated the validity of several PVTs (TOMM, Fifteen-Item Test, Reliable Digit Span, and Reliable Spatial Span) in the context of language disorders (aphasia). At the Medical University of Vienna, Czornik et al. (2021) evaluated a range of tests (e.g., Word Memory Test, the SIMS, and SRSI) using a sample of individuals from a memory out-patient clinic. A further study by Czornik et al. (2022, in this issue) investigated a reaction-time-based embedded PVT in a sample of civil forensic patients.

Symptom and performance validation remains a matter of controversy in Austria. With no strict relevant guidelines available, individual professionals approach this topic differently. Yet, with more widespread knowledge about SVA, the use of PVTs and SVTs in both clinical and forensic contexts is increasing.

Italy

A survey of SVA practices and beliefs of Italian psychologists was conducted recently by Giromini et al. (under review). According to that survey, the majority of Italian practitioners (> 60%) are prone to use SVTs and/or PVTs when they believe that their evaluee could have an interest in producing false or grossly exaggerated physical or psychological symptoms. However, only 13.2% reported using one or more stand-alone SVTs or PVTs routinely in their assessments. Accordingly, Giromini and colleagues concluded that, although Italian psychologists do not always question the credibility of presented symptoms, when they do so, they are relatively prone to use SVTs and/or PVTs to assist their decision-making.

With regard to research, a simple literature search found 48 articles potentially focused on SVA in Italy. Of these 48, eleven were not directly relevant to our project, as they focused either on underreporting (e.g., Pompili et al., 2003; Roma et al., 2018) or on other loosely related issues (e.g., on the difficulties in completing a literature review on attention deficit hyperactivity disorder in adulthood, given the influence of multiple factors, including malingering; Mucci et al., 2018). Of the remaining 37 articles, as many as 26 (70%) were published during the past 5 years alone (i.e., between 2016 and 2021), thus highlighting a growing interest in SVA within the Italian context.

These most recent research efforts primarily focused on three major topics. First, a few Italian researchers investigated the potential usefulness of various modern technological advancements. For instance, Orrù et al. (2021) and Pace et al. (2019) applied machine-learning techniques to develop a shorter version of the SIMS (Orrù) and to discriminate credible from noncredible presentations using the b test (Pace). Monaro et al. (2018) analyzed mouse movements of individuals either feigning depression or responding honestly while engaged in a double-choice computerized task, so as to develop a machine-learning-based algorithm aimed at detecting feigned depression. Zago et al. (2019) implemented facial thermography and kinematic analyses, in addition to symptom validity testing, in an effort to aid the detection of feigned amnesia after committing a crime.

A second emerging research area in Italy concerns the investigation of the effectiveness of several SVTs and PVTs. In particular, numerous recent studies examined the psychometric properties of the Italian IOP-29, reporting on its concurrent (Giromini et al., 2018), incremental (Giromini et al., 2019), and ecological (Roma et al., 2020) validity, on its applicability to multiple symptom presentations (Giromini et al., 2020), and on the equivalence of its online and paper-and-pencil formats (Giromini et al., 2021). Additionally, some authors also investigated the effectiveness of the SIMS and MMPI-2 in detecting noncredible presentations (e.g., Mazza et al., 2019), and a recently published article described the development and initial validation of the IOP-M, a new, add-on, PVT module designed to be used in combination with the IOP-29 (see also Banovic et al., 2021; Carvalho et al., 2021; Gegner et al., 2021).

Lastly, a third research area that deserves mention here concerns the detection of feigned crime-related amnesia. An Italian study investigated whether feigning amnesia for a mock crime has an impact on an individual’s ability to later recall the actual details of the mock crime (Mangiulli et al., 2018).

It should be noted, however, that Italian research on SVA actually goes beyond these three research areas. For instance, some relatively recent Italian publications addressed the feigning of specific problems such as second-language deficit subsequent to mild traumatic brain injury (Zago et al., 2013) or elaborated on malingering-related conditions such as factitious disorder (Poloni et al., 2019) or Munchausen syndrome (Callegari et al., 2006). In fact, as noted above, Italian research on SVA-related topics is accumulating rapidly, and one may anticipate that this trend will continue during the coming years.

Switzerland

The situation of SVA in Switzerland in recent years has been characterized by increasing acceptance of SVA as a useful and necessary tool for distinguishing valid symptoms from invalid (exaggerated or feigned) complaints. In 2008, the Swiss Federal Social Insurance Office commissioned and published a study (Kool et al., 2008) with the aim of providing a systematic review of the literature on SVA to promote the development and adoption of medico-legal standards among professionals. Also, in the guidelines for medico-legal neuropsychological assessment of the Swiss Association of Neuropsychologists (SVNP, 2011), testing of effort and determinations about the consistency of test results were described as integral parts of a neuropsychological examination in a medico-legal context. This was confirmed as necessary in the legal literature (Kieser, 2012); accordingly, Swiss courts increasingly emphasized the importance of SVA in relevant judicial decisions. Plohmann and Hurter (2017) published the first study to examine the prevalence of inadequate effort and malingered neurocognitive dysfunction in medico-legal contexts in Switzerland. The authors reported a prevalence of probable or definite malingered neurocognitive dysfunction in medico-legal contexts ranging from 27.5 to 34.3%, depending upon which cut score was used for Reliable Digit Span. Within this group, about one-tenth (10.3–12.8%) presented with below-chance response patterns and qualified as cases of definite malingering. The prevalence rates in Switzerland were in line with those obtained in other countries (e.g., Mittenberg et al., 2002) and demonstrated the necessity of performing a careful SVA in medico-legal evaluations. The fifth European Conference on Symptom Validity Assessment was held in Basel in 2017. With special emphasis on psychosomatic, psychiatric, and pain disorders, it was organized under the auspices of the Swiss Association of Neuropsychologists.
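The notion of a below-chance response pattern mentioned above can be illustrated with a short computation. The following Python sketch is not taken from Plohmann and Hurter (2017) or from any specific test manual; it merely assumes a hypothetical two-alternative forced-choice PVT, where pure guessing yields 50% correct, and computes the one-sided binomial probability of obtaining a given score or lower by guessing alone. The function names, the 50-item example, and the 0.05 significance threshold are illustrative assumptions.

```python
from math import comb


def below_chance_p(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided binomial probability of `correct` or fewer hits under guessing.

    `chance` is the assumed per-item probability of a correct guess
    (0.5 for a two-alternative forced-choice format).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct + 1)
    )


def is_below_chance(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """True if the score is significantly below chance at the assumed alpha."""
    return below_chance_p(correct, trials) < alpha


if __name__ == "__main__":
    # Hypothetical examinee: 18 of 50 forced-choice items correct.
    print(round(below_chance_p(18, 50), 4), is_below_chance(18, 50))
```

A score that is significantly below the guessing level is hard to explain by genuine impairment, because even an examinee with no memory of the items would be expected to score around 50% correct; this is the logic behind treating such patterns as particularly strong evidence in malingering criteria.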

Current efforts are directed at training both experienced and young neuropsychologists in the use and interpretation of neuropsychological tests, taking into account adequate SVA. Postgraduate training for what is called “Eidgenössisch anerkannter Neuropsychologietitel (EAN)” and for a “Master of Advanced Studies in Neuropsychology (MAS)” was established at the University of Zurich in 2020. A specific module is dedicated to SVA. Moreover, SVA is also a central topic in several modules of the Swiss Insurance Medicine (SIM) assessor training leading to the qualification as a “certified neuropsychological assessor SIM.” A “SIM specialist group in neuropsychology” was founded in 2020. In the coming years, further efforts are needed to establish high quality standards in SVA in both medico-legal and clinical fields.

Major Challenges for Future Developments

It may be stated that research into and forensic practice of symptom and performance validity testing in Europe, seen as a whole, have developed to a level comparable to that known from the U.S. and Canada. Yet, a closer look reveals continued gross heterogeneity across the continent. Another comprehensive survey on SVT/PVT use, following the Dandachi-FitzGerald et al. (2013) study and including as many national neuropsychological societies as possible, appears to be indicated for the years to come, in order to capture the state of the art reached in the course of the 2020s. In some countries, forensic and, in part, clinical practice underwent a significant change with the more widespread use of validity measures, but there is little or no information about other parts of the continent.

In comparison to the situation about 10 years ago, not only has a significant body of empirical studies accumulated, but conceptual and practical aspects of SVA have also undergone significant modifications. Practitioners face the challenge of keeping abreast of methodological and conceptual developments at the highest level of current knowledge. As described in some of the national reports above, the conceptual shift from “malingering research” and “malingering detection” to “validity research” and “validity assessment” is not readily embraced by all researchers and all practitioners, and outstanding position papers like Sherman et al. (2020) and Sweet et al. (2021) will not be absorbed quickly and smoothly in all corners of the continent. Between overt neurological disease/brain damage and frank malingering, there are many other conditions, including the exaggeration of minor neurological injury and psychiatric conditions such as factitious disorder, somatoform conditions, and what is now called functional neurological symptom disorder (cf. Stone & Sharpe, 2020, for a recent appraisal of the latter). Thus, related to conceptual developments, it is necessary to further explore what validity failures actually mean in different contexts and what the legal and treatment implications are. In interdisciplinary settings, this may require exploration by a range of disciplines. This is particularly relevant for psychiatric conditions such as post-traumatic stress, somatoform disorders, and pain-related disabilities (e.g., Greve et al., 2012; Howe, 2012; Merten & Merckelbach, 2013). In some conditions where diagnostic categories overlap (e.g., somatoform and conversion disorders, factitious disorder, and malingering), pigeonholing patients into one of them can be problematic, as individuals may fit equally well into more than one diagnostic category (e.g., Merten & Merckelbach, 2013; Sherman et al., 2020). Different conditions may co-occur, with no clear boundaries but rather smooth transitions between them.

On the methodological level, validity research will have to move further away from the easy-to-do analog studies into real-world settings, in particular with well-defined clinical patient groups. However, the primary challenge in such settings is that it will be difficult, if not impossible, to reliably distinguish true-positive from false-positive SVT or PVT results in some constellations (e.g., Dandachi-FitzGerald et al., 2016; Merten et al., 2020). On the level of test development, professionals in only a few European countries with non-English national languages appear to have at their disposal a sufficient number of well-validated SVTs and PVTs, most of them adaptations of North American tests. For some nations, the availability of tests is a major problem (e.g., Janaviciute et al., 2021). Also, equivalence studies comparing different language versions are rare. A focus on European tests (e.g., Meyer et al., 2017; Walter et al., 2016) will certainly not solve the basic problems posed by the diverse range of languages and cultures that are present across the continent. The continuing influx of immigrants from Asia and Africa is another factor aggravating intercultural problems of validity assessment. There is a clear need for multi-language versions of common validity measures. On the level of test administration, modernization and recent restrictions due to the COVID-19 pandemic have fostered online presentation modes of tests (remote assessment), with as yet unknown consequences for the interpretation/interpretability of SVTs and PVTs outside their standard conditions of use. The validity of these tests has not yet been systematically researched outside of their normal use. Also, before psychologists can use a test remotely, the copyright holder of the instrument must agree to the test being used in this manner. To our knowledge, only one European study has addressed the question of paper–pencil versus online presentation to date (Giromini et al., 2021). It is, therefore, necessary to conduct more systematic research into these problems.

Another special challenge that will continue into the future is to further educate practitioners to correctly use and interpret the results of validity testing and, in particular, to resist temptations to explain away uncomfortable results of validity assessment (e.g., Dandachi-FitzGerald et al., 2015; Merten, 2017). In most countries, proper in-depth routine training in methods of SVA is often omitted both for neuropsychologists and for forensic psychologists.

Professional guidelines for forensic assessment and for independent medical and psychological evaluations appear increasingly to include statements about symptom and performance validation, but special guidelines are rare. Those published in Britain (McMillan et al., 2009) have recently been updated (Moore et al., 2021). Another important issue is research and guidelines on how to handle clinical patients who produce invalid test profiles or report noncredible symptoms (e.g., Carone & Bush, 2018; Martin & Schroeder, 2021).

A large number of open questions and important problems can easily be identified; consequently, with the continued relevance of the topic, research activities are not likely to slow down in the foreseeable future. The study of both professionals’ and laypersons’ attitudes and expectations with regard to factitious symptom presentations, malingering, fraudulent health claims, etc. will be another problem of interest, not least with respect to social and intercultural factors (e.g., Cartwright & Roach, 2015; Dandachi-FitzGerald et al., 2020; Merten & Giger, 2018; Schlicht & Merten, 2014). Also, embedded PVTs are clearly underresearched in Europe, contrary to their apparent significance in validity research and practice. Similarly, the use of multiple validity measures and its consequences for diagnostic decision-making are underresearched. In contrast to research activities in other parts of the world, validity assessment with personality inventories, in particular the MMPI family and the Personality Assessment Inventory, appears to play a minor role in Europe (with some exceptions, e.g., García Silgo, 2019; Giromini et al., 2019; Vossler-Thies et al., 2013). Remote assessment and its consequences for validity are certainly another topic of interest, in particular if the COVID-19 crisis continues to affect professional activities as much as it did in 2020 and 2021 (Corey & Ben-Porath, 2020). In another 10 years’ time, we will certainly know more about these topics, and others not even mentioned in this review will have emerged.