Elsevier

NeuroImage

Volume 31, Issue 2, June 2006, Pages 790-795

Rapid Communication
Prediction error as a linear function of reward probability is coded in human nucleus accumbens

https://doi.org/10.1016/j.neuroimage.2006.01.001

Abstract

Reward probability has been shown to be coded by dopamine neurons in monkeys. Phasic neuronal activation not only increased linearly with reward probability upon expectation of reward, but also varied monotonically across the range of probabilities upon omission or receipt of rewards, therefore modeling discrepancies between expected and received rewards. Such a discrete coding of prediction error has been suggested to be one of the basic principles of learning. We used functional magnetic resonance imaging (fMRI) to show that the human dopamine system codes reward probability and prediction error in a similar way. We used a simple delayed incentive task with a discrete range of reward probabilities from 0% to 100%. Activity in the nucleus accumbens of human subjects strongly resembled the phasic responses found in monkey neurons. First, during the expectation period of the task, the fMRI signal in the human nucleus accumbens (NAc) increased linearly with the probability of the reward. Second, during the outcome phase, activity in the NAc coded the prediction error as a linear function of reward probabilities. Third, we found that the NAc signal was correlated with individual differences in sensation seeking and novelty seeking, indicating a link between individual fMRI activation of the dopamine system in a probabilistic paradigm and personality traits previously suggested to be linked with reward processing. We therefore identify two different covariates that model activity in the NAc: specific properties of a psychological task and individual character traits.

Introduction

Human decision making is guided by predictions about future events and comparisons of those predictions with actual outcomes. These comparisons have been used in various models of learning to identify what should be learned and to speed up learning. Experiments with nonhuman primates (Schultz, 2000) have shown that midbrain dopamine neurons fire according to this prediction error signal, i.e., firing rates were elevated when the animals received a more valuable reward than expected (positive prediction error) and decreased when they fell short of expectations (negative prediction error). Rewards received after the presentation of well-conditioned stimuli that were exactly as valuable as expected did not elicit any changes in firing rates.
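The prediction error logic described above can be illustrated with a simple delta-rule learner. This is a minimal sketch and not a model taken from the cited studies: the function name `simulate_delta_rule`, the learning rate of 0.1, and the coding of a reward as 1.0 and an omission as 0.0 are assumptions made for illustration only.

```python
def simulate_delta_rule(rewards, learning_rate=0.1, initial_value=0.0):
    """Return the prediction error on every trial of a reward sequence.

    Assumption: the expectation is a single scalar that is nudged
    towards each outcome by a fixed fraction of the prediction error.
    """
    value = initial_value
    errors = []
    for reward in rewards:
        delta = reward - value           # received minus expected reward
        errors.append(delta)
        value += learning_rate * delta   # update the expectation
    return errors


# 200 fully predicted rewards followed by one unexpected omission.
errors = simulate_delta_rule([1.0] * 200 + [0.0])

print(f"first rewarded trial:  {errors[0]:+.3f}")    # large positive error
print(f"well-conditioned trial: {errors[199]:+.3f}")  # error close to zero
print(f"omitted reward:         {errors[200]:+.3f}")  # large negative error
```

Under these assumptions, the error is positive while the reward is still unexpected, shrinks towards zero once the stimulus is well conditioned, and becomes negative when an expected reward is omitted, which is the qualitative pattern reported for the dopamine neurons.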

The actual value of a reward is influenced by at least three properties: magnitude, probability, and timing (immediate or delayed). As regards magnitude, the fMRI signal in the human ventral striatum has been shown to code the quality (positive or negative) of the prediction error, i.e., a positive prediction error signal following unexpected rewards (Berns et al., 2001) and a negative signal following the omission of expected rewards (Abler et al., 2005; Knutson et al., 2001b). As regards probability, Fiorillo et al. (2003) showed a linear relationship between the prediction error and phasic responses of dopamine neurons in a conditioning paradigm in nonhuman primates. In their study, they held the magnitude and timing of the reward constant and defined prediction error as the discrepancy between the probability at which a reward is expected and the actual outcome. In particular, they found that the signal of dopaminergic neurons was a linear function of the extent to which expectations were not fulfilled, thereby rendering the dopamine signal a much more precise learning signal. Prior fMRI studies have shown that dopaminergic midbrain activation is modulated by reward probabilities (Aron et al., 2004; Dreher et al., 2005), but the design of these studies did not allow an investigation of whether activation in dopaminergic brain areas scales linearly with probability and along the amplitude of prediction errors, as suggested by animal research. Finally, one study (McClure et al., 2004) found an influence of the timing of the reward on human NAc activation.

Furthermore, Knutson et al. (2005) studied a combination of two reward properties: the expected value of a reward, defined as the product of its magnitude and probability. O'Doherty et al. (2003) extended the findings for ventral striatal activity from simple prediction error models to more refined models of temporal difference learning, taking reward timing into account. They showed that activity in the ventral striatum follows the rules of temporal difference models not only during learning of reward contingencies but also upon both positive and negative errors in prediction following a violation of expectations once learning has occurred. Moreover, they showed (O'Doherty et al., 2004) that ventral striatal activity in a reward task displays characteristics that fit the critic component of the two-component actor-critic model (reviewed in Dayan and Balleine, 2002). In this reinforcement learning model, the critic records and evaluates actual performance, e.g., reward outcome, and passes information to the actor, which changes its policy accordingly.

The aim of our study was to show that fMRI activity in human dopaminergic brain areas is a function of reward probability, as has been shown for firing rates in monkey midbrain dopamine neurons. Since these neurons project to the NAc and since activation of this area has been shown to represent the prediction error for reward magnitudes, we expected to find activity in this region for reward probabilities as well. We predicted that the signal in the NAc scales with probability and along the amplitude of prediction error, as required by models of reinforcement learning and as shown for monkey dopamine neurons.

We used a simple monetary reward task with five different conditioned stimuli predicting the reward at 0%, 25%, 50%, 75% and 100% probability.

Based upon the findings of Fiorillo et al. (2003), we expected human NAc activation to follow the same characteristics as phasic activation of monkey dopamine neurons. During the expectation of increasing reward probabilities (expectation phase), we expected activation in the NAc to increase following a linear trend.

Activation in the NAc during the outcome phase (upon receipt of reward) was expected to be higher when predictions were surpassed (positive prediction error) than when outcomes fell short of expectations (negative prediction error). We expected a linear relationship between brain activity and prediction error as coded by reward probability, i.e., we predicted the NAc signal to be highest with the most positive prediction error (subject expects to win at a probability of only 25% and wins), and lowest with the most negative prediction error (subject expects to win at a probability of 75% but does not win), and to decrease linearly in between. Zero prediction error was expected with reward probabilities of 100% and 0%.
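The predicted ordering of outcome conditions can be made explicit with a short calculation. The sketch below assumes that a win is coded as 1 and a non-win as 0, so that the prediction error is the outcome minus the cued reward probability. This coding and the helper names `prediction_error` and `possible_conditions` are illustrative assumptions and are not taken from the study's analysis.

```python
# Cued reward probabilities used in the task.
PROBABILITIES = [0.0, 0.25, 0.5, 0.75, 1.0]


def prediction_error(probability, rewarded):
    """Outcome (assumed coding: win = 1, no win = 0) minus expected probability."""
    outcome = 1.0 if rewarded else 0.0
    return outcome - probability


def possible_conditions(probabilities=PROBABILITIES):
    """List every outcome that can occur, sorted from most positive to
    most negative prediction error."""
    conditions = []
    for p in probabilities:
        if p > 0.0:   # a win is impossible at 0%
            conditions.append((p, True, prediction_error(p, True)))
        if p < 1.0:   # an omission is impossible at 100%
            conditions.append((p, False, prediction_error(p, False)))
    return sorted(conditions, key=lambda c: c[2], reverse=True)


for p, rewarded, error in possible_conditions():
    label = "win" if rewarded else "no win"
    print(f"{p:>4.0%}  {label:<6}  prediction error = {error:+.2f}")
```

With this coding, the most positive error (+0.75) arises from a win at 25%, the most negative (-0.75) from a non-win at 75%, and the 0% and 100% conditions yield an error of zero, matching the hypotheses stated above.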

To further characterize the fMRI activation in dopaminergic brain areas, we applied two questionnaires, the Sensation Seeking Scales, Form V (SSS-V, German Version, Beauducel et al., 2003) and the Novelty Seeking subscale of Cloninger's Temperament and Character Inventory (German Version, Cloninger et al., 1999). Both scales are closely related and describe slightly distinct but associated personality traits (Zuckerman and Cloninger, 1996). While novelty seeking is defined as a tendency towards exploratory activity and intense excitement in response to novelty and active avoidance of monotony or frustration (Cloninger, 1987), sensation seeking involves the pursuit of varied, novel, and intense sensations and experiences and the willingness to take physical, social, legal and financial risks in order to have such experiences (Zuckerman, 1994). Based on behavioral and pharmacological observations in humans and animals, animal lesion experiments, and self-stimulation paradigms in animals, Cloninger proposed a relationship between novelty seeking personality traits and function of the dopaminergic system (Cloninger, 1987). While a possible relationship between dopamine receptor genes and novelty seeking is still a matter of debate (Paterson et al., 1999), two recent investigations with positron emission tomography in humans (Leyton et al., 2002) and animals (Lind et al., 2005) support the notion of a relationship between novelty seeking and dopamine system functioning. Both studies found that greater amphetamine-induced dopamine release in the ventral striatum is predicted by higher novelty seeking behavior in the studied minipigs and by higher scores of novelty seeking personality traits in the human subjects, respectively. Based on these findings, we expected to find positive correlations between the fMRI activation of the ventral striatum and the Sensation Seeking and Novelty Seeking scales.

Methods

Eleven healthy male subjects (1 left-handed, aged 22–36) with no history of psychiatric or neurological disease gave written informed consent. The study was approved by the local ethics committee of the University of Ulm. Before scanning, all subjects completed a practice version of the task. After scanning, participants filled in the Novelty Seeking and Sensation Seeking questionnaires.

Behavioral responding

Subjects responded correctly in 99% of the trials, i.e., they pressed the correct button within the required time. Reaction times were significantly faster (P < 0.02 or less for each comparison) in trials with 100% reward expectation (329 ms) than in trials with 75% (349 ms), 50% (363 ms), 25% (370 ms) or 0% (360 ms) and followed a linear trend over probabilities, as confirmed by regression analyses (F = 12.8; P < 0.005).
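The direction and rough size of this linear trend can be checked against the five mean reaction times quoted above. The following is a minimal least-squares sketch over the condition means only. It is not the regression analysis reported here, which presumably used subject-level data, so it cannot reproduce the F statistic. The helper name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Reward probability and the mean reaction time (ms) reported for it.
probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_rt_ms = [360.0, 370.0, 363.0, 349.0, 329.0]

slope, intercept = fit_line(probabilities, mean_rt_ms)
print(f"slope: {slope:.1f} ms per unit probability, intercept: {intercept:.1f} ms")
```

On these means the fitted slope is negative, about 33 ms faster across the full range from 0% to 100%, which is consistent with faster responding at higher reward probabilities.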

Subjects reached average median scores on the Novelty Seeking (NS) Scale in

Discussion

To our knowledge, this is the first report to demonstrate that human NAc activity can be described as a discrete function of reward prediction error implemented by reward probability. As hypothesized, we found linearly decreasing activity upon increasing reward probabilities, with highest activity in trials with highest positive prediction error and lowest activity in trials with greatest negative prediction error. Analyses confirmed that fMRI activity modeling positive prediction error was

References (27)

  • A. Beauducel et al. Psychometrische Eigenschaften und Normen einer deutschsprachigen Fassung der Sensation Seeking-Skalen, Form V. Diagnostica (2003)
  • G.S. Berns et al. Predictability modulates human brain response to reward. J. Neurosci. (2001)
  • C.R. Cloninger. A systematic method for clinical description and classification of personality variants. A proposal. Arch. Gen. Psychiatry (1987)