The Iowa Gambling Task (IGT; Bechara et al., 1994) is frequently used to assess reward learning within the Positive Valence Systems of the Research Domain Criteria (PVS Work Group, 2011). During the task, participants choose among four decks of cards, two of which have advantageous outcomes (i.e., net monetary wins) and two of which have disadvantageous outcomes (i.e., net monetary losses), on average, across trials. Based on trial-by-trial feedback, participants can learn to select advantageous decks and avoid disadvantageous decks. Learning within the task is thought to reflect aspects of affective processing such that “somatic markers” (i.e., the emotional signals associated with wins and losses) guide individuals to choose the advantageous decks and avoid the disadvantageous decks. The IGT was initially developed to identify decision-making deficits among individuals with damage to the ventromedial prefrontal cortex (VMPFC; Bechara et al., 1994, 1997), an area of the brain implicated in emotion regulation (Winecoff et al., 2013). Subsequent studies have extended these findings by showing that impairments in other areas of the prefrontal cortex (e.g., medial PFC), as well as regions outside of the prefrontal cortex (e.g., amygdala), are also related to decision-making deficits on the IGT (Aram et al., 2019). Based on these relations between IGT performance and neurological functioning, use of the IGT has been extended to other clinical populations (Bechara, 2007; Buelow & Suhr, 2009), including individuals with depression (Cella et al., 2010; Must et al., 2006; Siqueira et al., 2018) and substance use problems (Solowij et al., 2012). Despite its popularity, the test-retest reliability of IGT performance (Buelow & Barnhart, 2018; Schmitz et al., 2020; Sullivan-Toole et al., 2022) and associations between IGT performance and individual-difference characteristics vary considerably across studies (Baeza-Velasco et al., 2020; Byrne et al., 2016; Case & Olino, 2020; Jollant, 2016; McGovern et al., 2014; Mueller et al., 2010; Smoski et al., 2008). Thus, there are outstanding concerns regarding the reliability and validity of the IGT that must be addressed before it can be used as a clinical assessment tool.

We focused on two aspects of the IGT that could influence its reliability and validity: how the task is structured and how behavioral performance is quantified. First, the original IGT is structured such that participants can choose among the four decks simultaneously, and choices between the different decks are mutually exclusive. Therefore, participants’ performance conflates approach toward the advantageous decks, reflecting reward learning, with avoidance of the disadvantageous decks, reflecting punishment learning. Given the variability in performance on the task, later studies altered the IGT such that participants are presented with a card from a specific deck on each trial and have the opportunity to “play” or “pass” on that card (Peters & Slovic, 2000). Thus, in the revised play-or-pass version of the IGT, reward and punishment learning are dissociable. This is an attractive feature of the play-or-pass IGT because previous research indicates that there are both behavioral and neuropsychological differences between reward and punishment learning, which could be more readily captured by this version of the IGT (Christakou et al., 2013; Frank et al., 2004; Gershman, 2015). This revised version of the IGT shows some promise, with expected associations between task performance and pubertal development (Icenogle et al., 2017); however, associations with clinical outcomes are inconsistent (Case & Olino, 2020). To our knowledge, the reliability of the play-or-pass IGT has yet to be tested; thus, a major goal of this study was to assess the reliability of measures from this version of the IGT.

The second aspect of the IGT that could influence reliability and validity is how behavior on the task is characterized. Traditionally, the proportions of selections from “good” (i.e., advantageous) and “bad” (i.e., disadvantageous) decks have been used as measures of reward and punishment learning, respectively; however, such measures are gross characterizations of behavior, which can be problematic because they do not capture learning within the task. Some studies have used changes in choice proportions across early versus late trials (or blocks) as measures of within-session learning (Brand et al., 2007); however, these approaches still do not capture the specific processes that may influence trial-by-trial learning within the task. An alternative to these summary measures comes from advances in computational modeling. Computational models mathematically delineate the theoretically hypothesized processes that give rise to the observed data (i.e., choices within the task). This allows us to characterize individual differences in task performance that may otherwise be obscured by summary measures.

Given the issues associated with using summary measures of behavior, a second major goal of this study was to extend the Outcome-Representation Learning (ORL) model to this version of the IGT. The ORL model is a trial-level reinforcement learning model that was developed for the original IGT (Haines et al., 2018) and builds on previous computational models of the IGT (Ahn et al., 2008; Busemeyer & Stout, 2002; Worthy et al., 2013). The ORL model performs well in predicting participants’ earnings and trial-to-trial choices, and it decomposes task behavior into distinct processes, yielding five parameters: reward learning rate, punishment learning rate, win frequency sensitivity, perseveration tendency, and memory decay. These parameters inform the computation of the subjective value of each deck, which is updated on a trial-by-trial basis depending on the outcome received. Importantly, Haines et al. showed that individual differences in ORL parameters were related to socially significant individual differences that may be of interest to clinicians (e.g., substance use; see also Kildahl et al., 2020). More recently, Sullivan-Toole et al. (2022) found that individual differences in ORL parameters were related to internalizing symptoms (e.g., depressive symptoms). Thus, parameters from a computational model such as the ORL may be useful for characterizing facets of decision-making related to human health (e.g., depression).

In their study, Sullivan-Toole et al. showed that the most reliable ORL parameters were obtained by using joint modeling, an approach in which data from multiple sessions are fit within a single model so that parameters and their test-retest reliabilities are estimated simultaneously.

Traditional scoring approach

Traditional scoring involved calculating the proportion of plays on good (i.e., advantageous) decks and bad (i.e., disadvantageous) decks, separately. Session-wide play proportions on good and bad decks represent gross measures of reward and punishment learning, respectively. Specifically, more plays on the good decks reflect higher reward learning (i.e., approach of reward) and fewer plays on the bad decks reflect higher punishment learning (i.e., avoidance of punishment). To evaluate mean-level stability, we used paired-samples t-tests to compare the proportion of plays between sessions 1 and 2, separately for good and bad decks. To evaluate rank-order stability, we calculated correlations between the proportion of plays during sessions 1 and 2, separately for good and bad decks. Next, we correlated the proportion of plays on good decks with the proportion of plays on bad decks to determine whether these measures could dissociate reward and punishment learning. Finally, we evaluated construct validity by correlating the proportion of plays on good and bad decks with scores on each of the self-report measures.
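
To make this scoring concrete, the sketch below (Python; a minimal illustration with hypothetical column names such as subject, session, deck_type, and played, not the analysis code used in this study) computes session-wide play proportions per deck type and the corresponding mean-level and rank-order stability statistics.

```python
# Minimal sketch of the traditional scoring approach (hypothetical column
# names; not the analysis code used in the study). Requires pandas and SciPy.
import pandas as pd
from scipy import stats

def play_proportions(trials: pd.DataFrame) -> pd.DataFrame:
    """Proportion of 'play' responses per subject, session, and deck type."""
    return (trials
            .groupby(["subject", "session", "deck_type"])["played"]
            .mean()
            .unstack(["session", "deck_type"]))  # wide: one row per subject

def stability_stats(props: pd.DataFrame, deck_type: str) -> dict:
    """Mean-level (paired t-test) and rank-order (Pearson r) stability."""
    s1, s2 = props[(1, deck_type)], props[(2, deck_type)]
    t, p_t = stats.ttest_rel(s1, s2)   # mean-level stability across sessions
    r, p_r = stats.pearsonr(s1, s2)    # rank-order stability across sessions
    return {"t": t, "p_t": p_t, "r": r, "p_r": p_r}

# Example usage, assuming `trials` has columns: subject, session (1 or 2),
# deck_type ("good" or "bad"), and played (1 = play, 0 = pass):
# props = play_proportions(trials)
# good_stability = stability_stats(props, "good")
# bad_stability = stability_stats(props, "bad")
```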

Computational modeling approach

For the computational model, we fit a modified version of the Outcome-Representation Learning (ORL) model. The ORL is a generative model that was developed to analyze choices in the original IGT, in which all four decks are presented simultaneously and participants play on one of the four decks on each trial (Haines et al., 2018; Sullivan-Toole et al., 2022). In the present study, participants were presented with decks one at a time and could either play or pass on the presented deck on each trial. Thus, we modified the original ORL to accommodate the play-or-pass nature of this version of the IGT. Specifically, choices to play or pass were modeled as a function of the value of playing on the presented deck using a logistic function:

$${Y}_{j}(t)\sim bernoulli\left(\frac{1}{1+exp(-{V}_{j}\left(t\right))}\right)$$
(1)

where \({Y}_{j}\left(t\right)\) indicates whether the participant played (\({Y}_{j}\left(t\right)=1\)) versus passed (\({Y}_{j}\left(t\right)=0\)) when presented with deck j on trial t, and Vj(t) is the value of playing when presented with deck j on trial t. This choice rule implies that the value of passing is always held constant at 0—only the value of playing is updated on a trial-by-trial basis. Specifically, after each choice, Vj is updated according to the following equation:

$${V}_{j}\left(t+1\right)= {EV}_{j}\left(t+1\right)+{EF}_{j}\left(t+1\right)\bullet {\beta }_{f}+{\beta }_{b}$$
(2)

where Vj(t + 1) is the value of playing on deck j on the next trial (i.e., t + 1), EVj(t + 1) is the expected outcome value associated with playing or passing on deck j in the next trial, EFj(t + 1) is the expected win frequency of playing or passing on deck j in the next trial, βf is a free parameter describing sensitivity to win frequency, and βb is a free parameter describing bias towards playing (when positive) or passing (when negative), regardless of the deck.
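
As a minimal illustration of Eqs. 1 and 2 (Python; the parameter values below are arbitrary placeholders, not estimates from our data), the value of playing and the implied play probability for a presented deck can be computed as follows.

```python
# Sketch of Eqs. 1-2 for a single presented deck on a single trial.
import numpy as np

def play_value(ev_j: float, ef_j: float, beta_f: float, beta_b: float) -> float:
    """Eq. 2: V_j = EV_j + EF_j * beta_f + beta_b."""
    return ev_j + ef_j * beta_f + beta_b

def play_probability(v_j: float) -> float:
    """Eq. 1: P(play) = logistic(V_j); the value of passing is fixed at 0."""
    return 1.0 / (1.0 + np.exp(-v_j))

# Arbitrary illustration: a deck with positive expected value and win frequency,
# evaluated by a participant who is win-frequency sensitive and biased to play.
v = play_value(ev_j=0.5, ef_j=0.2, beta_f=1.5, beta_b=0.3)
print(play_probability(v))  # probability of playing on this trial (~0.75)
```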

The expected outcome value (EVj) and expected win frequency (EFj) are updated from trial to trial based on the outcome received after playing on deck j. Specifically, the expected outcome value for the next trial (i.e., t + 1) is calculated by

$${EV}_{j}\left(t+1\right)=\left\{\begin{array}{l}{EV}_{j}\left(t\right)+{A}_{rew}\bullet \left(x\left(t\right)-{EV}_{j}\left(t\right)\right)\ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=1 \text{and}\ x\left(t\right)\ge 0\\ {EV}_{j}\left(t\right)+{A}_{pun}\bullet \left(x\left(t\right)-{EV}_{j}\left(t\right)\right) \ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=1\ \text{and}\ x\left(t\right)<0\\ {EV}_{j}\left(t\right)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=0\end{array}\right.$$
(3)

where EVj(t) is the expected outcome value of playing on deck j in the current trial (i.e., t), x(t) is the amount of the gain or loss, Arew is a free parameter describing learning rate for gains (i.e., when x(t) > 0), and Apun is a free parameter describing learning rate for losses (i.e., when x(t) < 0). The expected win frequency for the next trial is calculated by

$${EF}_{j}\left(t+1\right)=\left\{\begin{array}{l}{EF}_{j}\left(t\right)+{A}_{rew}\bullet \left(sgn\left(x\left(t\right)\right)-{EF}_{j}\left(t\right)\right)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=1\ \text{and}\ x\left(t\right)\ge 0\\ {EF}_{j}\left(t\right)+{A}_{pun}\bullet \left(sgn\left(x\left(t\right)\right)-{EF}_{j}\left(t\right)\right)\ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ \ {Y}_{j}\left(t\right)=1\ \text{and}\ x\left(t\right)<0\\ {EF}_{j}\left(t\right)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=0\end{array}\right.$$
(4)

where EFj(t) is the expected win frequency of playing on deck j in the current trial, Arew and Apun are as described above, and the sgn(x(t)) term refers to the sign of the outcome (using the signum function). In addition to updating the expected win frequency of the current deck, the expected win frequencies for playing on the other decks are updated according to the following fictive updating rule:

$${EF}_{j^{\prime}}\left(t+1\right)=\left\{\begin{array}{l}{EF}_{j^{\prime}}\left(t\right)+{A}_{pun}\bullet \left(\frac{-sgn\left(x\left(t\right)\right)}{C}-{EF}_{j^{\prime}}\left(t\right)\right)\ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=1\ \text{and}\ x\left(t\right)\ge 0\\ {EF}_{j^{\prime}}\left(t\right)+{A}_{rew}\bullet \left(\frac{-sgn\left(x\left(t\right)\right)}{C}-{EF}_{j^{\prime}}\left(t\right)\right)\ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=1\ \text{and}\ x\left(t\right)<0\\ {EF}_{j^{\prime}}\left(t\right)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ if\ {Y}_{j}\left(t\right)=0\end{array}\right.$$
(5)

where EFj′(t) is the expected win frequency of playing on each of the other decks during the current trial, C is the number of other decks available (i.e., 3), and all other terms are as described above. The parameterization of this version of the ORL model is the same as the original model, except that we removed memory decay (K) and replaced βp (perseverance) with βb, a play/pass bias. We chose this parameterization because, when we fit the original ORL model to these data, βp functioned similarly to a bias parameter (i.e., it was associated with generally playing or passing more frequently), and memory decay showed poor reliability as well as poor recovery when we conducted parameter recovery diagnostics. Details of fitting the original ORL model and the parameter recovery diagnostics are in the supplemental file. In summary, the ORL model for the play-or-pass IGT has four free parameters, Arew, Apun, βf, and βb, that allow us to capture individual differences related to task performance.
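
Pulling Eqs. 1-5 together, the following sketch (Python) simulates a single agent's trial-by-trial updating under our reading of the modified ORL; the deck sequence, outcome values, and parameter values are arbitrary placeholders rather than the task's actual payoff schedule.

```python
# Simulation sketch of the play-or-pass ORL (Eqs. 1-5). Payoffs, deck order,
# and parameter values are arbitrary; this is not the fitted model code.
import numpy as np

rng = np.random.default_rng(0)
n_decks, n_trials = 4, 120
A_rew, A_pun, beta_f, beta_b = 0.10, 0.05, 1.5, 0.2   # the four free parameters

EV = np.zeros(n_decks)   # expected outcome value of playing each deck
EF = np.zeros(n_decks)   # expected win frequency of playing each deck

for t in range(n_trials):
    j = t % n_decks                          # deck presented on this trial
    V = EV[j] + EF[j] * beta_f + beta_b      # Eq. 2: value of playing
    p_play = 1.0 / (1.0 + np.exp(-V))        # Eq. 1: probability of playing
    played = rng.random() < p_play

    if not played:
        continue                             # pass: EV and EF are not updated

    x = float(rng.choice([1.0, -2.5, 0.5, -0.5]))  # arbitrary outcome (scaled units)
    A = A_rew if x >= 0 else A_pun                 # learning rate by outcome sign

    EV[j] += A * (x - EV[j])                       # Eq. 3: value update
    EF[j] += A * (np.sign(x) - EF[j])              # Eq. 4: win-frequency update

    # Eq. 5: fictive update of the other decks' win frequencies, with the
    # opposite learning rate and the sign-flipped outcome divided by C = 3.
    A_fictive = A_pun if x >= 0 else A_rew
    others = [k for k in range(n_decks) if k != j]
    for k in others:
        EF[k] += A_fictive * (-np.sign(x) / len(others) - EF[k])
```

Model fitting proceeds in the reverse direction: given each participant's observed play/pass choices and outcomes, the four free parameters are estimated, here hierarchically and jointly across the two sessions as described next.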

As described above, we used a joint modeling approach to model choice data from both sessions simultaneously in a single model. We estimated Arew, Apun, βf, and βb hierarchically such that parameters were allowed to vary for each participant within each session (i.e., random effects for participants). Fitting the model hierarchically in this fashion allowed information to be pooled across individuals and sessions, which causes person-level estimates to regress toward the group-level mean. For each free parameter (e.g., βf), person-level parameter estimates were assumed to follow a multivariate normal distribution, given by the following:

$$\left[\begin{array}{c}{\theta }_{i,1}\\ {\theta }_{i,2}\end{array}\right]\sim MVNormal\left(\left[\begin{array}{c}{\mu }_{\theta ,1}\\ {\mu }_{\theta ,2}\end{array}\right] , {\mathbf{S}}_{\theta }\right)$$
(6)

where θi,1 and θi,2 are the person-level parameters for participant i on sessions 1 and 2, respectively; θ refers to either A′rew, A′pun, βf, or βb; μθ,1 and μθ,2 are the group-level parameter means for sessions 1 and 2, respectively; and Sθ is the covariance matrix for session 1 and 2 person-level parameters. In the ORL, Arew and Apun are bounded between 0 and 1; thus, A′rew and A′pun were estimated assuming a multivariate normal distribution (Eq. 6) and then transformed with the cumulative normal distribution function to obtain Arew and Apun (Haines et al., 2018). The covariance matrix, Sθ, can be decomposed into a 2 × 2 diagonal matrix of the group-level standard deviations for each session (σθ,1 and σθ,2) and a 2 × 2 correlation matrix (Rθ):

$${\mathbf{S}}_{{\varvec{\theta}}}=\left[\begin{array}{cc}{\sigma }_{\theta ,1}& 0\\ 0& {\sigma }_{\theta ,2}\end{array}\right]{\mathbf{R}}_{{\varvec{\theta}}}\left[\begin{array}{cc}{\sigma }_{\theta ,1}& 0\\ 0& {\sigma }_{\theta ,2}\end{array}\right]$$
(7)

where

$${\mathbf{R}}_{{\varvec{\theta}}}=\left[\begin{array}{cc}1& {\rho }_{\theta 12}\\ {\rho }_{\theta 12}& 1\end{array}\right]$$
(8)

and ρθ12 is the correlation between θ on session 1 and θ on session 2 (i.e., the test-retest reliability estimate).
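
For intuition about Eqs. 6-8, the sketch below (Python; the group-level values are arbitrary illustrations, not estimates from our model) assembles the covariance matrix from the session standard deviations and the test-retest correlation and draws correlated person-level parameters for the two sessions.

```python
# Sketch of Eqs. 6-8 with arbitrary group-level values (not estimated values).
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.0, 0.1])        # group-level means, sessions 1 and 2
sigma = np.array([0.4, 0.5])     # group-level SDs, sessions 1 and 2
rho_12 = 0.6                     # test-retest correlation for this parameter

R = np.array([[1.0, rho_12],     # Eq. 8: 2 x 2 correlation matrix
              [rho_12, 1.0]])
S = np.diag(sigma) @ R @ np.diag(sigma)    # Eq. 7: covariance matrix

# Eq. 6: person-level parameters (theta_i1, theta_i2) for 200 participants.
theta = rng.multivariate_normal(mu, S, size=200)

# The correlation between the two columns recovers rho_12 (up to sampling
# noise), i.e., the test-retest reliability estimate for this parameter.
print(np.corrcoef(theta[:, 0], theta[:, 1])[0, 1])
```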

For each parameter, we used weakly informative priors. Priors for group-level means on sessions 1 and 2 in Eq. 6 were specified as

$${\mu }_{\theta ,1}\sim Normal\left(0, 1\right)$$

$${\mu }_{\theta ,2}\sim Normal\left(0, 1\right)$$

for A′rew, A′pun, βf, and βb. Priors for group-level standard deviations on sessions 1 and 2 in Eq. 7 were specified as

$${\sigma }_{\theta ,1}\sim \text{Half-Normal}\left(0.2\right)$$

$${\sigma }_{\theta ,2}\sim \text{Half-Normal}\left(0.2\right)$$

for A′rew and A′pun and as

$${\sigma }_{\theta ,1}\sim \text{Half-Cauchy}\left(1\right)$$

$${\sigma }_{\theta ,2}\sim \text{Half-Cauchy}\left(1\right)$$

for βf and βb. We chose stronger, half-normal priors on the σs for A′rew and A′pun to avoid extreme values of these parameters after transforming. The priors for the correlation matrices were specified as

$${\mathbf{R}}_{{\varvec{\theta}}}\sim LKJcorr\left(1\right)$$

for each parameter. Finally, the session 1 and session 2 group-level means and standard deviations served as the priors for the person-level parameters on sessions 1 and 2, respectively, such that

$${\theta }_{i,1}\sim Normal\left({\mu }_{\theta ,1}, {\sigma }_{\theta ,1}\right)$$

$${\theta }_{i,2}\sim Normal\left({\mu }_{\theta ,2}, {\sigma }_{\theta ,2}\right)$$

for βf and βb, and as

$${\Phi }^{-1}\left({\theta }_{i,1}/scale\right)\sim Normal\left({\mu }_{\theta ,1}, {\sigma }_{\theta ,1}\right)$$

$${\Phi }^{-1}\left({\theta }_{i,2}/scale\right)\sim Normal\left({\mu }_{\theta ,2}, {\sigma }_{\theta ,2}\right)$$

for Arew and Apun. Here, Φ–1 is the inverse of the cumulative distribution function of the standard normal distribution, and scale is a scaling factor applied to the parameter to ensure that it meets the appropriate parameter bound. For Arew and Apun, scale = 1, resulting in the learning rates being bounded between 0 and 1.
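
As a brief illustration of this transform (Python, using SciPy's standard normal CDF and its inverse; the value shown is arbitrary), an unconstrained, normally distributed parameter maps into the (0, scale) interval as follows.

```python
# Sketch of the probit-style transform used for A_rew and A_pun (scale = 1).
from scipy.stats import norm

scale = 1.0
a_unconstrained = 0.25                 # arbitrary value on the normal scale
a = scale * norm.cdf(a_unconstrained)  # bounded learning rate in (0, scale)
a_back = norm.ppf(a / scale)           # inverse transform, as written in the prior
print(a, a_back)                       # a is approximately 0.599; a_back == 0.25
```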

For clarity, we write the centered parameterizations above; however, we implemented the model by using noncentered parameterizations to improve convergence and estimation efficiency. Similarly, to improve sampling from the multivariate normal distributions, we used Cholesky decompositions of the correlation matrices (Haines et al., 2018). Details of the noncentered parameterizations are in the supplement. We sampled the model by using four chains, each with 5,000 iterations, the first 1,000 of which were discarded as warmup (Sullivan-Toole et al., 2022). After fitting the model, we checked convergence visually with trace plots and numerically with \(\widehat{R}\) values for each parameter (Gelman & Rubin, 1992). \(\widehat{R}\) values were all < 1.1, indicating that the variance between chains did not outweigh the variance within chains (i.e., convergence). We performed posterior predictive checks by simulating data from the fitted model and visually comparing the simulated data with the observed data. Finally, we performed parameter recovery diagnostics, which showed adequate recovery of all parameters. Parameter recovery diagnostics are displayed in the supplement.
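
The noncentered parameterization mentioned above can be sketched as follows (Python; illustrative values only, since the actual estimation was done with MCMC rather than by direct simulation): standard-normal "raw" person-level effects are transformed with the group-level means, standard deviations, and the Cholesky factor of the correlation matrix.

```python
# Sketch of a noncentered parameterization using a Cholesky factor
# (arbitrary group-level values; not the fitted model code).
import numpy as np

rng = np.random.default_rng(2)
n_subjects = 200

mu = np.array([0.0, 0.1])                # group-level means (sessions 1, 2)
sigma = np.array([0.4, 0.5])             # group-level SDs (sessions 1, 2)
R = np.array([[1.0, 0.6], [0.6, 1.0]])   # session 1-2 correlation matrix
L = np.linalg.cholesky(R)                # lower-triangular Cholesky factor

# Centered form: theta_i ~ MVNormal(mu, diag(sigma) @ R @ diag(sigma)).
# Noncentered form: draw independent standard-normal "raw" effects and map
# them deterministically, which typically improves MCMC sampling efficiency.
z = rng.standard_normal((n_subjects, 2))     # raw person-level effects
theta = mu + (z @ L.T) * sigma               # implied person-level parameters
```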

Because we used a Bayesian approach, we based our inferences on evaluations of the posterior distributions of the parameters. We assessed mean-level stability by calculating the differences between the session 1 and session 2 group-level posterior distributions for each parameter. We assessed rank-order stability by examining the posterior distributions of the ρθ12 values from Eq. 8 for each parameter. We computed correlations between the person-level posterior means of the reward and punishment learning rates to determine whether these measures could dissociate reward and punishment learning. Finally, to evaluate construct validity, we correlated the posterior means of each person-level parameter with scores on each of the self-report measures. As described above, we calculated bootstrapped correlations with bias-corrected and accelerated bootstrapped confidence intervals for each correlation, except for the assessment of rank-order stability, which was estimated within the joint model.
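
For the validity correlations, a bias-corrected and accelerated (BCa) bootstrap interval can be sketched with SciPy's bootstrap routine (Python; the variable names and data below are placeholders, and this assumes a SciPy version in which stats.bootstrap supports BCa intervals with paired resampling).

```python
# Sketch of a BCa-bootstrapped correlation between person-level posterior means
# (e.g., A_rew) and a self-report score. Data and names are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
posterior_means = rng.normal(size=100)                        # stand-in values
self_report = 0.3 * posterior_means + rng.normal(size=100)    # stand-in scores

def pearson_r(x, y):
    """Pearson correlation as the bootstrapped statistic."""
    return stats.pearsonr(x, y)[0]

res = stats.bootstrap((posterior_means, self_report), pearson_r,
                      paired=True, vectorized=False, method="BCa",
                      n_resamples=5000, random_state=rng)
print(pearson_r(posterior_means, self_report), res.confidence_interval)
```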