A diagnosis of attention-deficit/hyperactivity disorder (ADHD) can have tangible benefits for college students. One primary benefit is access to psychostimulants, which can be used to increase attention and inhibition, be misused as a study aid or recreational drug, or be diverted to others (Benson et al., 2015; Ramachandran et al., 2020). Students diagnosed with ADHD may also be eligible for academic accommodations and modifications such as additional time on high-stakes exams, access to notes or formulas during tests, and substitutions for curricular requirements in math or foreign language (Harrison et al., 2022; Lovett, 2021). In some cases, students with ADHD and other disabilities are eligible for scholarships, bursary funds, and loan forgiveness programs (Harrison, 2017; Harrison & McCarron, 2023).

Although less often acknowledged, the secondary benefits of an ADHD diagnosis are no less important. Completing college-level coursework can be difficult for all students, regardless of their diagnostic status. Students who experience academic challenges may interpret their struggles as signs of ADHD rather than as difficulties experienced by most of their classmates (Suhr & Wei, 2017). Students who attribute their struggles to ADHD may selectively attend to behaviors that are consistent with the disorder, such as needing to spend more time and effort than their peers when studying, taking tests, or writing essays. At the same time, they may ignore signs that they do not have a disability, such as the fact that their academic achievement exceeds that of most people in the general population. This illness identity protects their self-esteem by allowing them to attribute their academic difficulties to a diagnostic label rather than to other factors such as poor sleep hygiene, substance use problems, low motivation, or underdeveloped study skills (Suhr & Johnson, 2022).

College students with and without ADHD recognize these benefits. A meta-analysis indicates that 17% of college students without ADHD misuse stimulant medication, primarily to improve academic performance (Benson et al., 2015). Between 56 and 59% of college students who have been prescribed psychostimulants have been asked by a classmate or friend to share their medication (Rabiner et al., 2009; Ramachandran et al., 2020). The prevalence of diversion among college students prescribed stimulant medication ranges from 18 to 36% (Ramachandran et al., 2020; Sepúlveda et al., 2011). Lewandowski et al. (2014) found that 88% of students with disabilities and 87% of students without disabilities believe that additional exam time would improve their grades. Most students with and without ADHD also believe that certain instructional accommodations (e.g., access to a computer/tablet during class, recorded lectures) and curricular modifications (e.g., course substitutions, preferential registration) would improve their academic functioning (Weis & Stromsoe DeLorenzo, 2024).

These tangible and psychological benefits incentivize students to exaggerate ADHD symptoms during diagnostic evaluations (Harrison, 2022). Moreover, adults can provide noncredible data for other reasons, such as cognitive impairment, low motivation, or a desire to signal a need for help (Hirsch et al., 2022; Young, 2022). The base rate for noncredible responding among adults referred for ADHD evaluations ranges from 9 to 27% when failure on multiple response validity tests is used as the criterion for noncredible data (Abramson et al., 2023; Finley et al., 2024; Marshall et al., 2016; Phillips et al., 2023). For example, Ovsiew et al. (2023) found that 22% of adults consecutively referred for ADHD evaluations at an American medical center failed at least one symptom validity test (SVT), whereas 16% failed at least two performance validity tests (PVTs) during the evaluation. Hirsch et al. (2022) examined two samples of adults referred for ADHD evaluations at a German outpatient clinic; approximately 27.3% of participants failed multiple PVTs, whereas 9.7% failed an SVT and multiple PVTs. Two recent studies have also estimated the frequency of noncredible responding among college students participating in disability evaluations. Harrison et al. (2023) found that 33.6% of students failed at least one SVT, whereas 13.6% failed at least two SVTs. Similarly, Mascarenhas et al. (2023) found that 21.3% of students failed at least one PVT and 9.4% failed at least two PVTs during their evaluations. These findings are generally consistent with a recent survey of neuropsychologists, who estimated that 20% of adults participating in ADHD evaluations provide invalid data (Martin & Schroeder, 2020).

Students without disabilities are also able to convincingly feign ADHD. Harrison et al. (2021) reviewed the results of simulation studies in which students with and without ADHD completed self-report measures. Students asked to simulate ADHD reported elevated symptoms comparable to the reports of students with actual ADHD. In aggregate, simulators often earned higher scores than students with actual ADHD, especially on scales assessing hyperactive-impulsive symptoms. At the individual level, however, simulators’ ratings were usually indistinguishable from the ratings of students with the disorder. Moreover, students can convincingly report ADHD symptoms without coaching or preparation (Edmundson et al., 2017; Fuermaier et al., 2016; Rios & Morey, 2013; Tucha et al., 2009).

The American Academy of Clinical Neuropsychology warns clinicians that self-report measures should not be used in isolation to make diagnostic decisions (Sweet et al., 2021). Instead, psychologists should administer multiple symptom and performance validity indicators as part of the assessment process and integrate their results with objective information about the examinee’s history and current functioning. Several studies have shown low correspondence between SVTs and PVTs, suggesting that these instruments reflect dissociable constructs that should be measured independently (Hirsch & Christiansen, 2018; Hirsch et al., 2022; Ovsiew et al., 2023). Ideally, self-report data should be corroborated by domain-specific SVTs that appear to measure the same construct as the self-report measure. Harrison et al. (2021) demonstrated the importance of SVTs in college student ADHD evaluations. Students who failed at least one SVT earned significantly higher scores on the Conners’ Adult ADHD Rating Scales (CAARS) than students who passed all SVTs. SVT failure was also domain specific. Students who failed an ADHD-specific SVT earned the highest ratings on self-report measures of ADHD, but they did not show higher scores on self-report measures of other mental health problems. In contrast, students who failed SVTs assessing other mental health problems, but who passed ADHD-specific SVTs, did not show elevations on ADHD rating scales. Ovsiew et al. (2023) reported similar findings. Patients who failed an ADHD-specific SVT endorsed significantly higher ADHD symptoms than those who passed the SVT, whereas there were negligible to small differences in self-reported ADHD symptoms between patients who passed or failed PVTs.

Unfortunately, there is little evidence that psychologists who conduct adult ADHD evaluations routinely administer response validity tests of any kind. Nelson et al. (2019) reviewed the psychological assessment reports of college students seeking accommodations for ADHD-related limitations; only 1% of reports mentioned the administration of a validity test. Similarly, Weis et al. (2019) reviewed the psychological evaluation reports of students already diagnosed with ADHD and receiving accommodations; only 3% of psychologists included a validity test in their assessment. One barrier to symptom validity testing in adult ADHD evaluations is the lack of concern that many diagnosticians show regarding the possibility of symptom exaggeration (Ranseen & Allen, 2019). Another obstacle is the mistaken belief among clinicians who conduct disability evaluations that they can identify noncredible responding without the use of validity tests (Harrison et al., 2013). Although researchers have repeatedly emphasized the importance of adopting a forensic approach to disability evaluations, validity tests have yet to be widely adopted (Lovett & Davis, 2017; Lovett & Harrison, 2019; Suhr et al., 2021).

On a more practical level, there is not yet agreement as to which SVTs best differentiate students who exaggerate symptoms from students with genuine ADHD. Several SVTs assess the exaggeration of psychological symptoms or cognitive problems (e.g., Merten et al., 2022; Slick et al., 1997), but they are not specific to ADHD. The Clinical Assessment of Attention Deficit-Adult contains embedded scales designed to detect infrequently endorsed symptoms and negative response bias; however, these scales are discordant with other symptom and performance validity tests (Leib et al., 2022) and may not discriminate patients with valid and invalid neuropsychological test profiles (Finley et al., 2024). Other SVTs are embedded into existing tests of personality, such as the Minnesota Multiphasic Personality Inventory (MMPI; e.g., Robinson & Rogers, 2018; Young & Gross, 2011) or the Personality Assessment Inventory (PAI; e.g., Rios & Morey, 2013; Smith et al., 2017). These lengthy and expensive tests are typically not part of an adult ADHD evaluation, and many clinicians who diagnose ADHD, especially physicians and other healthcare providers, do not have access to them. Several brief, ADHD-specific SVTs have been developed, but data supporting their validity are often limited to the samples from which they were developed; they may not perform as well in independent samples. Moreover, no study has compared these SVTs using the same sample or examined how the results of multiple SVTs might be combined to render judgments regarding symptom exaggeration.

The purpose of our study was to examine the ability of several brief, ADHD-specific symptom validity tests to identify noncredible responding in college students: the CAARS elevated scale scores and Inconsistency Index (Conners et al., 1999); the CAARS Infrequency Index (CII; Suhr et al., 2011); the Dissociative Experiences Scale (DES) and Exaggeration Index (EI; Harrison & Armstrong, 2016); and the Subtle ADHD Malingering Screener (SAMS; Ramachandran et al., 2019). We selected these tests because they are domain-specific, free and accessible to nearly all clinicians, and easy to administer and score, and because they have been shown to identify noncredible responding in college students.

We investigated the utility of these SVTs using a simulation design (Rogers, 2018). Specifically, we compared the CAARS ADHD symptom scores and SVT scores of three groups of undergraduates: students with self-reported ADHD, students without ADHD or other disabilities, and students without ADHD or other disabilities who simulated ADHD. First, we wanted to replicate the results of previous studies showing that college students can convincingly feign ADHD without coaching. Second, we expected these SVTs to differentiate the groups in a large, independent sample. This was especially important for the DES, EI, and SAMS, since the cut scores for these measures were identified using the same samples from which the tests were originally developed. Third, we wanted to extend our knowledge of these SVTs by examining their agreement in identifying noncredible respondents. Moderate to high inter-classification agreement would support the concurrent validity of these instruments as indicators of symptom exaggeration. Finally, we wanted to extend previous research by identifying which SVT, or combination of tests, most effectively identified simulators while minimizing positive cases among honest participants in the control and ADHD conditions. If successful, our findings might facilitate the use of SVTs among clinicians who assess college students for ADHD.

Method

Participants

Participants were recruited through Prolific, an online crowdsourcing platform for behavioral research. Prolific uses a multi-step recruitment process that includes identity verification, digital fingerprinting technology, and IP address checks to ensure data integrity. When volunteering for Prolific, panelists complete a battery of demographic and behavioral health questionnaires that include age, gender, education, diagnostic status, medication use, and mental health history. Onboarding checks are also performed when participants complete these measures to assess attention, comprehension, and consistency in responding. Only participants who meet inclusionary criteria are recruited for panels. Participants are paid $15/h, the rate recommended for fair and ethical compensation. Prolific panels yield higher-quality data than traditional undergraduate participant pools or similar online platforms (Douglas et al., 2023).

Participants were 763 degree-seeking undergraduates enrolled in US colleges or universities. Participants’ ages ranged from 18 to 74 years (M = 23.36, Mdn = 22.00, SD = 5.69). Participants identified as men (47.3%), women (47.8%), or nonbinary (4.8%). Racial and ethnic identities included White (63.4%), Asian American (21.4%), Black or African American (14.4%), Hispanic or Latino (14.2%), American Indian or Alaska Native (1.0%), Native Hawaiian or Pacific Islander (0.3%), and other (5.4%); participants could select multiple racial and ethnic categories. Undergraduate year in school included 1st (11.1%), 2nd (19.5%), 3rd (31.5%), 4th (31.2%), and 5th (6.7%).

Our study used a between-subjects simulation design with three conditions (Rogers, 2018). Students with self-reported ADHD (n = 231) comprised the first condition. These students were recruited because they reported, on the Prolific demographic screening, a diagnosis of ADHD by a mental health professional, current medication use, and undergraduate enrollment. All participants had been diagnosed with ADHD before age 18 years. ADHD presentations included predominantly inattentive (33.8%), predominantly hyperactive-impulsive (0.4%), and combined (65.8%). Approximately 16.5% received an individualized education program, Section 504 plan, or special education services in primary or secondary school; 13.0% received accommodations on college entrance exams; and 32.9% received formal academic accommodations in college. Students without ADHD or other disabilities (n = 532) were randomly assigned to two additional conditions: honest controls and simulators. These students were recruited because they reported no history of ADHD, other mental health diagnoses, or disabilities; no medication for any mental health problem; and undergraduate enrollment on the Prolific demographic screening.

Prior to data collection, we administered additional items to verify that participants met inclusionary criteria for their condition. Response bias was minimized by the fact that participants were not told that the study concerned ADHD assessment and did not know the inclusionary criteria for their condition (i.e., ADHD, non-ADHD). The first item asked participants to report their educational status. The second item asked participants to report their history of mental health diagnoses and disabilities. The third item asked participants to report their current prescribed medications. To be included in the study, all participants had to verify their current undergraduate status. Participants recruited for the ADHD condition had to select “ADHD” from the list of disabilities and “medication for ADHD” from the list of medications. Participants in the other conditions had to select the “no diagnosis or disability” and “no medication” options from the list of disabilities and medications, respectively. Participants who selected other options were excluded from the study. Similar allocation to conditions based on self-reported history has been used in previous simulation studies (Cook et al., 2018; Edmundson et al., 2017; Erhardt et al., 2024; Ramachandran et al., 2019; Sollman et al., 2010; Suhr et al., 2011; Walls et al., 2017).

Participants who discontinued the study before completion were excluded from our analyses. Participants in the ADHD and honest control conditions with CAARS total symptom scores ≥ 90 were also excluded because of likely invalid data. The survey included attention check items that directed participants to select a specific response option. At the end of the survey, simulators were also asked to describe their instructions as a manipulation check. To pass the manipulation check, simulators were required to mention that their task was to feign or exaggerate symptoms and to include the term “ADHD” or “ADD” in their response. Seven control participants (2.6%), two participants with ADHD (0.9%), and 26 simulators (9.8%) were removed from the sample for failing to meet these criteria.

The final sample included 260 controls, 229 participants in the ADHD condition, and 239 simulators (Table 1). There was no difference across conditions in participants’ age, F(2, 725) = 2.96, p = 0.053, or year in school, χ2(8) = 8.63, p = 0.375. There was a significant relationship between White racial identity and condition, χ2(2) = 8.61, p = 0.013, Cramer’s V = 0.109, with more participants identifying as White in the ADHD condition than in the Simulator condition. There was also a significant relationship between gender and condition, χ2(4) = 9.78, p = 0.044, Cramer’s V = 0.082, with more participants identifying as nonbinary in the ADHD condition than in the other conditions. All associations were small in magnitude.

Table 1 Demographic characteristics of participants across conditions

Measures

CAARS, Self-Report, Long Form, and Validity Indicators

The CAARS, Self-Report, Long Form (Conners et al., 1999) is a 66-item rating scale used to assess ADHD and related symptoms in adults. Respondents rate the frequency and intensity of each item using a 4-point scale ranging from 0 (not at all, never) to 3 (very much, very frequently). The test yields T scores on several scales based on gender- and age-specific norms. T scores ≥ 65 are considered clinically significant. In our study, participants who reported nonbinary gender identity were evaluated using the more conservative norms for females, since nonbinary norms are not available. Three scales were used in our study: the inattentive symptoms scale, the hyperactive-impulsive symptoms scale, and the total symptoms scale. Items on each scale reflect DSM diagnostic criteria for ADHD. Internal consistency coefficients range from 0.66–0.67 (men, hyperactive-impulsive scale) to 0.82 (women, total symptoms scale). Content validity is supported by correspondence with the diagnostic criteria for ADHD. Evidence of convergent validity comes from correlations with observer-reported symptoms ranging from 0.86 to 0.93. CAARS scales discriminate between adults with and without ADHD with 85% accuracy.

The CAARS Inconsistency Index is a validity scale embedded within the test that consists of eight item pairs with similar content. The index is scored by calculating the absolute difference between the responses to each pair of items and then summing these values. A cut score ≥ 8 differentiated examinees in the standardization sample from randomly generated response protocols with a sensitivity and specificity of 96% (Conners et al., 1999). According to the manual, scores ≥ 8 “should be interpreted with caution” (p. 9). Independent studies have shown that the index differentiates random responders from adults who complete the test consistently; however, it does not appear to differentiate students who simulate ADHD from students with the actual condition (Edmundson et al., 2017; Sollman et al., 2010; Walls et al., 2017).
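As an illustration, the index arithmetic can be sketched in a few lines of Python. The item pairings below are hypothetical placeholders, since the actual eight CAARS item pairs are specified only in the proprietary test manual.

```python
# Hedged sketch of the CAARS Inconsistency Index arithmetic. The eight item
# pairings below are HYPOTHETICAL placeholders: the actual pairings appear
# only in the proprietary CAARS manual (Conners et al., 1999).
ITEM_PAIRS = [(1, 12), (5, 22), (9, 30), (14, 41),
              (18, 47), (25, 52), (33, 58), (40, 63)]

def inconsistency_index(responses, pairs=ITEM_PAIRS, cutoff=8):
    """responses: dict mapping item number -> rating on the 0-3 CAARS scale.
    Returns (index, flagged); flagged is True when index >= cutoff."""
    index = sum(abs(responses[a] - responses[b]) for a, b in pairs)
    return index, index >= cutoff

# A respondent who answers every paired item identically yields an index of 0.
consistent = {i: 2 for pair in ITEM_PAIRS for i in pair}
print(inconsistency_index(consistent))  # (0, False)
```

Random responding inflates the index because paired items are unlikely to receive matching ratings by chance, which is why the cut score flags random protocols but not deliberate, internally consistent simulation.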

A second set of validity indicators embedded within the CAARS comprises scales with T scores ≥ 80. According to the manual, CAARS scale scores ≥ 80 exceed those of 98.2% of men and 97.7% of women in the standardization sample and “should always be suspect” for response bias (Conners et al., 1999, p. 21). Harrison and Armstrong (2016) found that simulators and students with ADHD could be differentiated by elevated scores on the CAARS inattentive scale (sensitivity 43%, specificity 75%), hyperactive-impulsive scale (sensitivity 33%, specificity 96%), and total scale (sensitivity 55%, specificity 79%). Similarly, Fuermaier et al. (2016) found the highest accuracy in differentiating simulators and adults with ADHD for the CAARS hyperactive-impulsive scale (AUC = 82%) compared to the CAARS inattentive scale (AUC = 60%) or the CAARS Inconsistency Index (AUC = 55%).

CAARS Infrequency Index

The CAARS Infrequency Index (CII) was developed by identifying 12 CAARS items endorsed by ≤ 10% of college students with and without ADHD. After the examinee completes the CAARS, the examiner sums the CII items, yielding a total score ranging from 0 to 36. An initial validation study involved a sample of college students seeking diagnostic assessments at a university clinic (Suhr et al., 2011). A cut score ≥ 21 differentiated credible respondents from participants with CAARS inattentive scale scores ≥ 80 (30% sensitivity, 100% specificity), participants with CAARS hyperactive-impulsive scale scores ≥ 80 (80% sensitivity, 93% specificity), and participants who failed a portion of the Word Memory Test (24% sensitivity, 95% specificity). A second study using clinic-referred students showed that the CII differentiated credible examinees from participants who had CAARS scores ≥ 80 (52% sensitivity, 97% specificity), provided invalid responses on the MMPI-2 (20–36% sensitivity, 90% specificity), or failed the Word Memory Test (18% sensitivity, 88% specificity; Cook et al., 2016). A third study compared college students with ADHD to students asked to simulate ADHD to gain access to medication or accommodations (Cook et al., 2018). The CII detected 17% of accommodation malingerers and 33% of medication malingerers, with a specificity of 84%.
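The CII computation itself is simple summation, which can be sketched as follows. The item numbers are hypothetical placeholders; the actual 12 infrequency items are listed in Suhr et al. (2011).

```python
# Hedged sketch of CII scoring. The 12 item numbers below are HYPOTHETICAL
# placeholders; the actual infrequency items are listed in Suhr et al. (2011).
CII_ITEMS = [3, 7, 11, 19, 24, 28, 35, 42, 48, 55, 60, 64]

def cii_score(responses, items=CII_ITEMS, cutoff=21):
    """responses: dict mapping CAARS item number -> 0-3 rating.
    Returns (total, flagged); total ranges from 0 (no items endorsed) to 36."""
    total = sum(responses[i] for i in items)
    return total, total >= cutoff
```

Because each item is rarely endorsed by genuine respondents, a high total requires endorsing many implausible symptoms at once, which is what the ≥ 21 cut score is designed to detect.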

Three independent studies have examined the CII’s ability to differentiate simulators from college students with ADHD. Results yielded sensitivities ranging from 22 to 51% and specificities ranging from 86 to 95% (Edmundson et al., 2017; Robinson & Rogers, 2018; Walls et al., 2017). Fuermaier et al. (2016) compared German adults recently diagnosed with ADHD to simulators. Although the CII detected 32% of simulators, its specificity was only 65%. These divergent results may be attributable to the fact that the participants with ADHD were recently diagnosed, were not college students, and had an average age of 29 years. A second study, comparing Dutch adults with ADHD to simulators, yielded higher sensitivity (46%) and specificity (95%; Becke et al., 2021).

Dissociative Experiences Scale and Exaggeration Index

The Dissociative Experiences Scale (DES) SVT (Harrison & Armstrong, 2016) consists of 17 items derived from the original Dissociative Experiences Scale. Items describe amnestic states, identity alteration, identity confusion, depersonalization, and derealization. These items were selected because they are not characteristic of ADHD and are infrequently endorsed by adults in the general population. The scale also contains one additional item describing respondents’ dissatisfaction with their grades in school. The items are integrated into the CAARS and scored using the CAARS response scale; total scores are computed by summing responses. The development sample consisted of two groups of exaggerators (i.e., simulators and clinic-referred participants suspected of exaggeration) and three groups of honest respondents (i.e., students with ADHD, clinical controls, and nonclinical controls). A cut score ≥ 20 (85th percentile) differentiated students with ADHD from exaggerators (sensitivity 43%, specificity 90%). A cut score ≥ 24 (90th percentile) yielded 35% sensitivity and 95% specificity. We used the more conservative cut score in our study.

The Exaggeration Index (EI) was developed from the same sample as the DES SVT (Harrison & Armstrong, 2016). The EI consists of eight criteria that differentiated exaggerators from honest examinees in the original sample: endorsement of five specific items on the DES SVT, a total DES SVT score ≥ 20, a CAARS hyperactive-impulsive score ≥ 80, and a CAARS total symptoms score ≥ 80. Scores range from 0 to 8 depending on the number of criteria met. A cut score of ≥ 3 differentiated exaggerators from all honest examinees (sensitivity 34%, specificity 94%). A cut score of ≥ 4 yielded 24% sensitivity and 97% specificity.
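The EI decision rule described above amounts to counting how many of eight binary criteria are met. A minimal sketch, which takes the five DES SVT item endorsements as booleans because the specific items are identified only in Harrison and Armstrong (2016):

```python
# Sketch of the Exaggeration Index decision rule. Which five DES SVT items
# count as criteria is specified only in Harrison and Armstrong (2016), so
# their endorsement status is passed in as booleans here.
def exaggeration_index(des_items_endorsed, des_total, hi_t, total_t, cutoff=4):
    """des_items_endorsed: five booleans, one per specific DES SVT item.
    hi_t / total_t: CAARS hyperactive-impulsive and total symptom T scores.
    Returns (score, flagged); score counts the criteria met (0-8)."""
    criteria = list(des_items_endorsed) + [des_total >= 20,
                                           hi_t >= 80,
                                           total_t >= 80]
    score = sum(criteria)  # True counts as 1, False as 0
    return score, score >= cutoff
```

The cutoff of 4 corresponds to the more conservative rule (24% sensitivity, 97% specificity); lowering it to 3 reproduces the original recommended rule.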

Subtle ADHD Malingering Screener

The Subtle ADHD Malingering Screener (SAMS) is a 10-item, freestanding self-report measure designed to detect ADHD malingering in adults (Ramachandran et al., 2019). The test is based on accuracy-of-knowledge theory, which posits that malingerers can be differentiated from honest examinees by their lack of knowledge of the disorder, their endorsement of erroneous stereotypes, or their over-endorsement of symptoms (Lanyon, 1997). Items describe symptoms that may be endorsed frequently by malingerers but not by adults with ADHD. Respondents rate each item from 1 (always false) to 7 (always true). Confirmatory factor analysis supported a two-factor structure reflecting academic and psychological problems. The SAMS yields a total score as well as scores for each factor. A total SAMS cut score ≥ 28 differentiated simulators from honest examinees with and without ADHD (sensitivity 89%, specificity 78%). Classification was improved by requiring a score ≥ 8 on the academic subscale and ≥ 16 on the psychological subscale (sensitivity 90%, specificity 80%). A second study indicated that 55% of students prescribed psychostimulant medication were classified as malingering using the SAMS; however, classification did not differ based on students’ history of nonmedical stimulant use or diversion (Ramachandran et al., 2020).
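The two SAMS decision rules can be sketched as follows. The sketch assumes subscale scores are already computed as sums of the 1–7 item ratings; how the 10 items divide between the academic and psychological subscales is left to the published measure.

```python
# Sketch of the two SAMS decision rules described above (Ramachandran et al.,
# 2019). Subscale scores are assumed to be sums of 1-7 item ratings; the
# item-to-subscale assignment is an assumption left to the published measure.
def sams_flag(academic, psychological, use_subscale_rule=True):
    """Returns True when the response pattern is classified as noncredible."""
    if use_subscale_rule:  # improved rule: both subscale thresholds must be met
        return academic >= 8 and psychological >= 16
    return academic + psychological >= 28  # original total-score rule
```

Note that the subscale rule is a conjunction: an examinee who elevates only one factor is not flagged, which is how it improves on the total-score rule.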

Procedure

Procedures were approved by the Institutional Review Board of the first author’s institution. Panelists meeting inclusionary criteria were recruited by Prolific to participate in a research study examining academic problems in college students. After granting consent, participants completed items to verify their diagnostic status, medication status, and disability history. Participants who met these criteria completed the CAARS with embedded validity indices and the SAMS. Participants in the ADHD and honest control conditions completed all measures with respect to their current functioning. Students with ADHD were asked to rate their behavior when not taking medication for ADHD. Simulators were provided with the following instructions, modified from previous studies (Quinn, 2003; Ramachandran et al., 2019; Robinson & Rogers, 2018):

In this study, you are going to complete several questionnaires that are like those given to college students who have concerns that they might have ADHD. As you complete the questionnaires, we want you to answer as if you were the following student: Since beginning college, you have struggled academically. Although you earned good grades in high school with very little effort, your grades in college are lower than you expected. You also find yourself working much longer and harder than your classmates to earn these grades. Some of your classmates have told you that they get academic accommodations because they were diagnosed with ADHD. For example, they receive additional time to complete exams, access to a note-taker or recorded lectures, and permission to use notes and formulas during tests. You think that these accommodations might also help you, so you decide to be evaluated for ADHD. As you complete the ADHD questionnaires, try to convince the examiner that you have ADHD and should be granted accommodations. But be careful. Some questions are designed to catch people who exaggerate. Complete the items as if you really have ADHD.

After completing the rating scales, simulators were asked to stop feigning ADHD and to complete the manipulation check. Then, all participants answered demographic items and were debriefed.

Results

Differences Across Conditions

We conducted three analyses of variance (ANOVAs) examining differences in CAARS scores as a function of condition. The independent variable in each analysis was condition with three levels: controls, ADHD, and simulators. The dependent variables were participants’ CAARS hyperactive-impulsive, inattentive, and total symptom scores, respectively. Preliminary tests of the assumptions of ANOVA revealed several significant Shapiro-Wilk tests, but Q-Q plots and inspection of skewness and kurtosis indicated reasonable normality. Moreover, ANOVA is robust to violations of this assumption given the sample size of each condition (Gamst et al., 2008). Results showed significant differences for hyperactive-impulsive, F(2, 725) = 306.20, p < 0.001, η2 = 0.458; inattentive, F(2, 725) = 248.97, p < 0.001, η2 = 0.407; and total symptoms, F(2, 725) = 324.69, p < 0.001, η2 = 0.472. We conducted Games-Howell follow-up tests for each analysis because of significant heterogeneity of variances (Sauder & DeMars, 2019). Results, evaluated at p ≤ 0.017 to control for familywise error, showed significant differences across all conditions for each of the analyses (Table 2). Specifically, simulators earned higher scores than participants in the ADHD condition, and participants in the ADHD condition earned higher scores than controls. Effect sizes ranged from medium to large (Cohen, 2016).

Table 2 Descriptive statistics for participants across conditions

We conducted an ANOVA analyzing differences in CAARS Inconsistency Index scores as a function of condition. Results showed a significant overall difference F(2, 725) = 12.98, p < 0.001, η2 = 0.035. Contrary to expectations, follow-up tests indicated that participants in the ADHD condition earned significantly higher scores than participants in the other two conditions (Table 2).

We conducted three additional ANOVAs analyzing differences in CII, EI, and DES scores as a function of condition. Results showed a significant overall difference for CII, F(2, 725) = 228.39, p < 0.001, η2 = 0.387; EI, F(2, 725) = 144.48, p < 0.001, η2 = 0.285; and DES scores, F(2, 725) = 126.31, p < 0.001, η2 = 0.258. Follow-up tests (Table 2) showed significant differences across all three conditions for each analysis. Simulators earned higher scores than participants in the ADHD condition, and participants in the ADHD condition earned higher scores than controls. Effect sizes ranged from medium to large.

Finally, we conducted three ANOVAs analyzing differences in SAMS scores as a function of condition; the dependent variables were SAMS academic subscale, psychological subscale, and total scores, respectively. The results showed significant overall differences for SAMS academic, F(2, 725) = 211.09, p < 0.001, η2 = 0.368; psychological, F(2, 725) = 291.67, p < 0.001, η2 = 0.446; and total scores, F(2, 725) = 337.03, p < 0.001, η2 = 0.482. Follow-up tests (Table 2) showed significant differences across all three conditions for each analysis, with simulators earning higher scores than participants in the ADHD condition, and participants in the ADHD condition earning higher scores than controls. Effect sizes were large.

Classification Analyses

We identified the percentage of participants in each condition with elevated SVT scores using the recommended cutoffs. Then, for each SVT, we used these percentages to calculate specificity toward controls, sensitivity toward simulators, and the percentage of positive cases among participants in the ADHD condition. Sensitivity, specificity, and positive cases were calculated in this manner because we could not verify that participants allocated to the ADHD condition met DSM-5 criteria at the time of the study and were not exaggerating symptoms. Results are shown in Table 3.

Table 3 Percent of positive cases for symptom validity tests across conditions

Specificity toward controls was highest for elevated CAARS hyperactive-impulsive scores (100%), CAARS total scores (99.6%), and the CII (99.6%). The recommended (≥ 3) cut score of the EI yielded low specificity toward controls. However, a more conservative cut score (≥ 4) yielded a specificity toward controls of 98.5%. Specificity toward controls was somewhat lower for elevated CAARS inattentive scores (97.7%) and much lower for all the SAMS scales (< 80%). In contrast, sensitivity toward simulators was highest for the SAMS scales (> 90%) and lowest for elevated CAARS hyperactive-impulsive scores (35.6%), the CII (39.7%), and EI scores ≥ 4 (42.7%). The percent of positive cases among participants in the ADHD condition was lowest for elevated CAARS hyperactive-impulsive scores (1.3%), the CII (7%), and EI scores ≥ 4 (12.2%). The SAMS scales yielded high rates of positive cases among participants in the ADHD condition (> 60%).

Table 3 also shows the results of contingency analyses examining differences in the number of participants with positive scores on each SVT as a function of condition. Results were evaluated at p ≤ 0.001 to control for error. For nearly all the SVTs, simulators were more likely to be identified as noncredible than participants in the ADHD condition, and participants in the ADHD condition were more likely to be identified as noncredible than controls. Effect sizes ranged from φ = 0.18 to 0.71. However, there was no difference in the number of participants identified as noncredible using the CAARS hyperactive-impulsive scale across the control and ADHD conditions. Moreover, the number of participants identified as noncredible using the CAARS Inconsistency Index did not differ across all three conditions.

Because current practice involves evaluating response validity using multiple indicators, we also calculated classification statistics for elevations on multiple SVTs. We focused on elevated CAARS hyperactive-impulsive scores, the CII, and EI scores ≥ 4, because these SVTs yielded the highest specificity toward controls and the lowest percent of positive cases among participants in the ADHD condition. Results are shown in Table 3. Requiring elevations on multiple SVTs yielded positive cases among honest participants in the control and ADHD conditions ranging from 0 to 3.1%. Sensitivity to simulators ranged from 27.2 to 31.8%. As before, contingency analyses showed that simulators were more likely than participants in the control and ADHD conditions to be identified as noncredible based on elevations on multiple SVTs. Effect sizes ranged from φ = 0.38 to 0.44. The number of participants in the control and ADHD conditions who were identified as noncredible based on elevations on multiple SVTs did not differ.
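Requiring elevations on multiple SVTs amounts to a logical AND of the individual flags, which lowers sensitivity while raising specificity. The sketch below uses hypothetical, independently drawn flags with made-up failure rates.

```python
# Sketch: requiring elevations on two SVTs is a logical AND of the flags.
# Flags are simulated independently with hypothetical failure rates.
import numpy as np

rng = np.random.default_rng(2)
n_sim, n_ctl = 242, 243
svt_a_sim = rng.random(n_sim) < 0.40  # hypothetical sensitivity of test A
svt_b_sim = rng.random(n_sim) < 0.43  # hypothetical sensitivity of test B
svt_a_ctl = rng.random(n_ctl) < 0.01  # hypothetical false-positive rates
svt_b_ctl = rng.random(n_ctl) < 0.02

combined_sensitivity = np.mean(svt_a_sim & svt_b_sim)
combined_false_positive_rate = np.mean(svt_a_ctl & svt_b_ctl)
```

Note that under independence, combined sensitivity would be roughly 0.40 × 0.43 ≈ 0.17; the 27.2 to 31.8% observed in the study is consistent with positive correlation among these SVTs.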

Associations Among SVTs

We used kappa to determine the inter-classification agreement among the SVTs for simulators. Results (Table 4), evaluated at p ≤ 0.001 to control for error, showed associations in the moderate to substantial range between elevated scores on the CAARS hyperactive-impulsive scale, the CII, and EI scores ≥ 4 (Cohen, 1960; κ = 0.50 to 0.63). Associations between the SAMS scales and nearly all the other SVTs were low (κ = 0.05 to 0.19). The CAARS Inconsistency Index was not associated with any of the other SVTs.
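Cohen's kappa corrects the observed classification agreement for the agreement expected by chance from each test's marginal flag rate. A minimal sketch, using made-up pass/fail flags rather than the study's data:

```python
# Minimal Cohen's kappa for two dichotomous (pass/fail) SVT classifications.
# The flag vectors below are made up for illustration.
import numpy as np

def cohen_kappa(a, b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    p_obs = np.mean(a == b)
    # Chance agreement from each test's marginal flag rate
    p_chance = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_obs - p_chance) / (1 - p_chance)

svt_a = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0], dtype=bool)
svt_b = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 1], dtype=bool)
kappa = cohen_kappa(svt_a, svt_b)  # moderate agreement for these toy flags
```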

Table 4 Classification agreement (kappa) between symptom validity tests for simulators

Discussion

Professional organizations and assessment researchers have repeatedly urged clinicians to assess response validity when conducting adult ADHD evaluations (see Lovett & Harrison, 2019; Sweet et al., 2021). Nevertheless, the frequency with which psychologists administer SVTs to college students seeking an ADHD diagnosis is disappointingly low. Moreover, the use of SVTs among medical professionals who diagnose this condition is unknown. One barrier to the widespread use of validity testing in adult ADHD evaluations is the limited availability of these instruments and the time and cost of administering and interpreting them. The purpose of our study was to compare several brief, domain-specific SVTs using an independent sample of college students with and without this condition. We hoped that our findings might provide practical guidance for clinicians who want to rely on both clinical judgment and actuarial data when rendering decisions regarding response validity.

Group Differences in ADHD Symptoms

As expected, the mean standard scores for control participants in our study were within normal limits for all the CAARS scales, and significantly lower than those of participants in the other two conditions. Also as expected, participants in the ADHD condition scored within the clinically significant range on both the CAARS inattentive and total symptoms scales, but within the average range on the CAARS hyperactive-impulsive scale. This pattern of scores is similar to the results of previous studies involving college students (Cook et al., 2018; Harrison & Armstrong, 2016; Sollman et al., 2010; Walls et al., 2017). It is likely that adults with genuine ADHD do not experience many of the DSM-5 hyperactive-impulsive symptoms because these symptoms largely describe young children with the condition. Adults who exaggerate ADHD might overreport hyperactive-impulsive symptoms because of the mistaken belief that these symptoms manifest the same way from early childhood to adulthood.

In fact, simulators’ mean CAARS scores fell in the clinically significant range on all three scales, with scores significantly higher than those of participants in the ADHD condition. These findings add to a growing body of empirical data showing that college students can convincingly feign ADHD without coaching or preparation (Harrison et al., 2021). However, the large discrepancy (d = 1.07) between participants in the simulator and ADHD conditions on the CAARS hyperactive-impulsive scale suggests that this scale may differentiate these two types of respondents.

Group Differences in SVT Scores

The CAARS Inconsistency Index did not differentiate simulators from honest examinees. Although there was a significant overall difference in inconsistency scores as a function of condition, participants in the ADHD condition showed higher scores than participants in the other two conditions. It is likely that ADHD-related impairments in attention and concentration contributed to the slightly higher inconsistency scores shown by participants with ADHD. Walls et al. (2017) did not find significantly higher inconsistency scores among students with ADHD compared to controls or simulators. However, they did find that the Inconsistency Index differentiated students instructed to respond randomly from honest examinees. Taken together, these findings indicate that the Inconsistency Index may be a valid indicator of random responding, but it is not useful in detecting symptom exaggeration.

Scores on all the other SVTs in our study differed significantly across the three conditions. Specifically, simulators earned significantly higher scores than students in the ADHD condition, whereas students in the ADHD condition earned significantly higher scores than controls. As expected, the difference between controls and students with actual or simulated ADHD was large in magnitude. More importantly, the difference between simulators and students in the ADHD condition was also large for all the validity scales except the DES. These findings suggest that these SVTs may differentiate simulators from students with actual ADHD.

Classification

Specificity toward controls was high for all three CAARS scales, ranging from 97.7 to 100%. However, elevations on the CAARS inattentive and CAARS total scales identified 23.6% and 17.0% of participants in the ADHD condition as noncredible, respectively. It is possible that some participants with self-reported ADHD exaggerated symptoms; however, there were no obvious external incentives for doing so because responses had no bearing on access to medication, accommodations, or other services. Instead, it is likely that many of these positive cases reflect participants with genuine ADHD who were misidentified as noncredible. If so, these percents are unacceptably high, given that false-positive cases risk denying treatment and support to students who are legally and ethically entitled to receive them (Keenan et al., 2019). In contrast, elevations on the CAARS hyperactive-impulsive scale identified only 1.3% of participants in the ADHD condition and 35.6% of simulators. As previously mentioned, it appears that college students who exaggerate symptoms overestimate the degree to which adults with ADHD continue to squirm, leave their seat, and experience other hyperactive-impulsive symptoms. Altogether, these findings indicate that a CAARS hyperactive-impulsive score ≥ 80 may indicate symptom exaggeration, but elevations on the other CAARS scales may be less useful in identifying noncredible responding.

The CII also identified a low percent of honest participants in the control (0.4%) and ADHD (7.0%) conditions as noncredible. Moreover, its sensitivity to simulators was 39.7%. These percents are generally within the range of values obtained in previous studies involving students with ADHD, simulators, or suspected exaggerators. Our study, therefore, adds to emerging data indicating that the CII is an effective SVT for college students. The CII has an advantage over other validity tests because it does not require the administration of additional test items, lengthy or costly personality tests that are usually not needed to assess ADHD, or performance validity tests that are not available to many clinicians.

The DES identified a relatively high percent of honest responders as noncredible in the control (5.8%) and ADHD (27.5%) conditions. Harrison and Armstrong (2016) also noted that the DES resulted in a high rate of false positives among honest controls, clinical controls, and participants with ADHD. In our study, the EI identified similarly high rates of noncredible responding among honest examinees using the recommended ≥ 3 cut score. Our findings contrast with the results obtained from the development sample, which yielded a 24% sensitivity and 97% specificity (Harrison & Armstrong, 2016). Our divergent findings may be partially explained by the fact that the EI was developed from the same sample upon which its sensitivity and specificity were calculated. Items on the EI may reflect idiosyncrasies of the original sample. However, we found that a more conservative cut score of ≥ 4 yielded a specificity toward controls of 98.5% and a sensitivity toward simulators of 42.7%, with a relatively low rate of positive cases among participants in the ADHD condition (12.2%). These data indicate that the EI may be an effective SVT when using this more conservative cut score.

The SAMS scales yielded the highest sensitivity to simulators of any of the SVTs that we studied. Indeed, the sensitivity seen in our study (> 90%) was nearly identical to the sensitivity reported in the original validation study using either the SAMS total score (89%) or elevated scores on both the SAMS academic and psychological subscales (90%; Ramachandran et al., 2019). Similarly, in our study, specificity to controls for the SAMS total (76.9%) and the SAMS academic and psychological subscales (79.2%) was only slightly lower than the specificities reported in the original validation study (78–80%; Ramachandran et al., 2019). Unfortunately, in our study, the SAMS scales identified most participants in the ADHD condition as noncredible. This divergent finding may be attributable to two methodological differences. First, the authors of the original study calculated specificity by comparing simulators to all honest examinees (i.e., examinees with and without ADHD). Second, most (72%) honest examinees in the original study did not have ADHD. These findings indicate that the SAMS may differentiate simulators from adults without ADHD, but it is less effective in differentiating simulators from adults with ADHD. Consequently, it may not be helpful in many diagnostic settings.

Classification Agreement and Integrating SVTs

Agreement among elevations on the CAARS hyperactive-impulsive scale, the CII, and the EI ≥ 4 fell in the moderate to substantial range (Cohen, 1960; κ = 0.50 to 0.63). In contrast, agreement between the SAMS scales and the other SVTs was slight in magnitude (κ < 0.20), and the CAARS Inconsistency Index was not associated with any of the other SVTs.

The agreement among the CAARS hyperactive-impulsive scale, the CII, and the EI ≥ 4 is attributable to at least two factors. First, it is likely that scores on these three SVTs reflect an underlying construct related to symptom exaggeration. Because the agreement was not high (i.e., κ > 0.80), the SVTs may reflect somewhat different facets of this construct. For example, elevations on the CAARS hyperactive-impulsive scale and the CII appear to reflect exaggeration through the endorsement of ADHD symptoms that are infrequently reported by adults with this condition, whereas elevations on the EI appear also to reflect exaggeration through the endorsement of dissociative symptoms that are not characteristic of ADHD. Second, it is important to note that these three SVTs have overlapping items. For example, 25% of the CII items also appear on the CAARS hyperactive-impulsive scale. Similarly, 20% of the EI items reflect elevations on the CAARS scales. Therefore, support for the concurrent validity of these measures as indicators of symptom exaggeration is attenuated by item overlap.

Given only moderate agreement among these three SVTs, it is reasonable to examine classification statistics generated by positive identification on multiple scales. Requiring elevations on multiple scales yielded high specificity toward participants in the control (100%) and ADHD conditions (96.9–100%). Sensitivity to simulators ranged from 27.2 to 31.8%. Although this sensitivity to simulators is low, students who are identified as noncredible based on the results of two SVTs appear to have a very low likelihood of being misclassified.

Recommendations for Clinicians

We suggest a hierarchical approach to response validity assessment in adult ADHD evaluations. First, we believe that all adult ADHD evaluations should include multiple, domain-specific SVTs and, if practical, at least one performance validity test (Sweet et al., 2021). We recognize that in many cases, it is not practical to administer performance validity tests or symptom validity scales that are part of longer personality tests (e.g., MMPI, PAI) because of the availability, time, and cost of these instruments. However, several brief, ADHD-specific SVTs are now available that are accessible to clinicians and can detect noncredible responding. Given the ease with which students can exaggerate symptoms, failing to assess response validity with at least one of these instruments is not professionally justifiable (Suhr & Berry, 2017).

Second, clinicians may wish to initially examine response validity with either the CII or EI (using the more conservative ≥ 4 cut score). In our study, these scales showed sensitivities to simulators of 40 and 43%, respectively, with relatively low percentages of honest responders classified as noncredible. A significant elevation on one of these measures should prompt the clinician to look for evidence of significant symptoms and functional impairment from objective sources, such as childhood school or medical records, a current transcript, or ratings provided by other informants (see Lovett & Harrison, 2021; Nelson & Lovett, 2019; Weis et al., 2021). A significant elevation on one of these measures should also prompt the clinician to more carefully explore alternative reasons for the student’s motivation to participate in an ADHD evaluation. For example, the student may be experiencing academic difficulties caused by anxiety or mood disorders, poor sleep hygiene or substance use problems, or poor study skills. Students may believe they have ADHD simply because they perceive themselves as working longer and harder than their peers to achieve the same level of academic achievement. They may also observe peers misuse psychostimulants or accommodations to achieve higher test scores or grades. Assigning an inappropriate diagnostic label and providing medication and/or accommodations to these students overlooks the root cause of their academic concerns and fails to provide them with evidence-based interventions that might help in life after college.

Third, if a student fails the initial symptom validity test and lacks objective evidence to support an ADHD diagnosis, the clinician should look for an elevated (≥ 80) score on the CAARS hyperactive-impulsive scale. Elevations on both the CII or EI and the CAARS hyperactive-impulsive scale would provide strong evidence of noncredible responding given the high specificity of the combination of these tests. Clinicians can be fairly confident that the responses provided by students who fail both tests are not credible. Such data can support clinicians who decide to forgo an ADHD diagnosis.
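The hierarchical approach described in this section can be summarized as a simple decision rule. The sketch below is a hypothetical illustration of the logic only, not a validated clinical protocol; the function name and return strings are invented, and the thresholds mirror those discussed above (EI ≥ 4, CAARS hyperactive-impulsive score ≥ 80).

```python
# Hypothetical sketch of the hierarchical decision rule described above.
# An illustration of the logic, not a validated clinical protocol; the
# function name and return strings are invented for this example.
def response_validity_flag(cii_elevated: bool, ei_score: int,
                           caars_hi_score: float) -> str:
    """Coarse validity judgment from three brief SVT results."""
    initial_fail = cii_elevated or ei_score >= 4  # screen with CII or EI
    if not initial_fail:
        return "no evidence of exaggeration"
    if caars_hi_score >= 80:  # corroborating hyperactive-impulsive elevation
        return "strong evidence of noncredible responding"
    return "possible exaggeration; seek objective records"
```

As the text emphasizes, such flags supplement, rather than replace, objective records and clinical judgment.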

Limitations and Future Directions

The primary threat to our study’s internal validity is that we were not able to independently confirm the accuracy of each participant’s diagnostic status. Our study relied on self-reported diagnostic history with verification prior to participation, similar to the methods used in other simulation studies (Cook et al., 2018; Erhardt et al., 2024; Ramachandran et al., 2019; Sollman et al., 2010; Suhr et al., 2011; Walls et al., 2017). However, we could not confirm that all participants in the ADHD condition met DSM-5 diagnostic criteria at the time of the study or that they did not exaggerate symptoms. Similarly, we could not confirm the exact age of participants’ first ADHD diagnosis or gather other details about their diagnostic histories. We addressed these limitations in several ways. First, potential participants completed Prolific demographic measures prior to recruitment; therefore, they could not report an ADHD diagnosis and medication use simply to be eligible for the study. Second, participants were blind to the purpose of the study and the inclusionary criteria; therefore, they were unlikely to misreport their diagnostic status because of demand characteristics. Third, participation was anonymous and assessment occurred outside of a diagnostic, medical, or educational context; therefore, there were no obvious incentives for participants to exaggerate symptoms to acquire medication, accommodations, or services, or to express a need for help. Fourth, participants with very elevated CAARS total scores were excluded from the study to further reduce the likelihood of symptom exaggeration. Finally, we calculated specificity toward controls, sensitivity toward simulators, and positive cases among participants in the ADHD condition, separately.
The validity of our allocation method is supported by score patterns in each condition that resembled the results of previous studies, including students in the ADHD condition showing clinically significant inattentive and total ADHD symptoms and controls earning scores within normal limits on all ADHD scales. Moreover, the classification statistics for the SVTs in our study were similar to those reported in previous research. However, allocation could be improved by having licensed professionals assess each participant immediately prior to the study to verify the appropriateness of their assignment to each condition.

A second threat to our study’s internal validity concerns how carefully simulators tried to avoid detection. Although we warned simulators that symptom exaggeration might be detected, the consequences of detection were low. In academic or clinical settings, the deliberate feigning of ADHD symptoms would likely have serious consequences. In our study, however, simulators did not risk adverse academic, legal, or social outcomes if their exaggeration was detected. It is possible that students who malinger in real-world settings do so more subtly or carefully than the simulators in our study. For example, they may endorse only as many symptoms as are required for a diagnosis, refrain from endorsing hyperactive-impulsive symptoms, or avoid uniformly high responses on rating scales. More careful responding would make them more difficult to detect, thereby lowering the sensitivity estimates obtained in our study.

A primary threat to the external validity of our study rests in the representativeness of our sample. To our knowledge, ours is the largest study examining the sensitivity and specificity of multiple ADHD-specific SVTs in college students. Moreover, unlike most previous studies, we recruited participants from many different postsecondary schools and geographic regions. However, our sample included only US undergraduates. We do not know if our findings generalize to other populations, such as high-school students seeking accommodations on college entrance exams, graduate or professional students seeking accommodations on high-stakes licensing exams, or adult non-students seeking psychostimulants. We also do not know if our results generalize to students in other countries. For example, Fuermaier et al. (2016) noted that their divergent findings regarding the CII might be attributable to the fact that they recruited an older, nonstudent sample of European adults. Future research should continue to investigate the utility of these tests with other samples given the risks associated with misidentification.

Future research should also compare the results of these SVTs with the newly-developed Negative Impression Index (NII) of the CAARS-2 (Erhardt et al., 2024). Items for the NII describe symptoms that were infrequently endorsed by honest respondents with and without ADHD and more frequently endorsed by simulators. The test developers selected six items for the NII that best differentiated simulators (n = 102) from adults with ADHD (n = 225). The NII was validated by comparing a new sample of simulators (n = 111) to the same sample of adults with ADHD, yielding an overall classification accuracy of 84.7%. The NII is a useful addition to the CAARS-2; however, it was validated using the same sample of ADHD participants from which it was developed. It is possible that NII items capitalize on idiosyncrasies within this sample of participants who were recruited from clinicians in the community and through social media. Moreover, because most participants were not college students, they may have different motivations and incentives than college students seeking an ADHD evaluation. Finally, participants in the ADHD sample were predominantly White (86%) and likely older (M = 36 years) than the traditional US college student population. It would be helpful to validate the NII with a new, representative sample of college students with ADHD. Comparing the NII with the CII and EI could be important to this validation process.

Our findings demonstrate the ease with which college students can feign ADHD and identify several brief, ADHD-specific SVTs that most clinicians can use to detect noncredible responding in this population. The CII and EI, in particular, can be useful to screen students with a sensitivity to simulators of approximately 40%. If students’ self-reports of ADHD symptoms and impairment are not clearly corroborated by objective data, then CAARS hyperactive-impulsive scores ≥ 80 can be used to verify symptom exaggeration with high specificity in honest respondents. We hope that these findings encourage clinicians to routinely administer SVTs as part of the adult ADHD assessment process.