Adaptive choice relies on valuation of commodities and behaviors in terms of their rewarding properties and cost of acquisition. The ability to monitor electrical and chemical signals within the brain while animals behave and respond to their environment offers a unique opportunity for understanding how the brain assesses value in terms of cost and expected return during decision-making processes. Indeed, extensive research monitoring the electrical activity of dopamine (DA) neurons in awake monkeys during Pavlovian conditioning tasks implicates mesolimbic DA transmission in generating teaching signals involved in reward prediction (Schultz, 2010). Specifically, the firing of midbrain DA neurons appears to encode reward prediction errors: a better-than-expected reward (positive prediction error) increases the firing rate, a fully predicted reward elicits no change, and a worse-than-expected reward (negative prediction error) reduces the firing rate (Schultz et al., 1997). Moreover, DA neurons respond proportionally to cues that indicate the probability, magnitude, and delay of future rewards, with expected, large, and immediate rewards eliciting the greatest response (Schultz, 2010). Thus, DA neurons appear to encode the subjective value of anticipated rewards in response to a predictive stimulus.
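The prediction-error logic described above is often summarized as δ = r − V, where V is the learned expectation of reward. A minimal sketch (an illustrative Rescorla–Wagner-style model, not the authors' own) shows how repeated identical rewards drive the error toward zero, mirroring the diminishing DA response to fully predicted rewards:

```python
def prediction_error(reward, expected):
    """Positive when reward exceeds expectation, zero when the reward
    is fully predicted, negative when it is worse than expected."""
    return reward - expected

def update_expectation(expected, reward, alpha=0.1):
    """Rescorla-Wagner-style update: move the expectation toward the
    observed reward in proportion to the prediction error."""
    return expected + alpha * prediction_error(reward, expected)

# Over repeated identical rewards the prediction error shrinks toward
# zero, as DA firing does once a reward becomes fully predicted.
v = 0.0
errors = []
for _ in range(50):
    errors.append(prediction_error(1.0, v))
    v = update_expectation(v, 1.0)
```

Here the learning rate `alpha` is an arbitrary illustrative choice; the qualitative point is only the sign and decay of the error term.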
In addition to reward prediction, DA acting in the nucleus accumbens is implicated in assessing the cost of seeking rewards. More specifically, interfering with accumbens DA transmission through pharmacological or surgical manipulations makes rats appear less willing to trade high levels of work (i.e., cost) for reward (Salamone et al., 2009). However, these techniques generally lack the temporal precision to elucidate how DA neurons dynamically signal expected reward magnitude and future costs, both of which influence the subjective valuation of rewards and behavior. Real-time measurements of DA neurotransmission, on the other hand, offer the ability to relate DA transmission to these specific aspects of the decision-making process.
In a recent issue of The Journal of Neuroscience, Wanat et al. (2010) investigated how the DA system encodes rewards associated with escalating costs by monitoring DA concentrations in rat nucleus accumbens with fast-scan cyclic voltammetry while the animals performed instrumental tasks. Rats were trained to press a lever to receive a reward on both a fixed ratio 4 (FR4) and a progressive ratio (PR) schedule. The response cost of the reward was always four lever presses on the FR4 schedule, whereas it gradually increased throughout each session on the PR schedule. For both schedules, the same cue light signaled the availability of reward, which was always a single food pellet. Thus, response cost was the only difference between groups. Nucleus accumbens DA concentrations were assessed during the periods immediately following cue onset and reward delivery.
As expected, rats completed all trials in the low-effort FR4 sessions. An analysis of the last, uncompleted (breakpoint) trial in PR sessions showed rats were, on average, willing to complete ∼100 lever presses for a single food pellet. Interestingly, this breakpoint level appeared consistent over several sessions, suggesting stable valuation of the single food pellet reward. Voltammetric results revealed that DA released immediately following cue onset in both FR4 and PR sessions displayed a rapid, but similar, decrease in amplitude over the first two trials and remained at a comparable, stable level for the rest of the session. Thus, the amplitude of cue-evoked DA signals appeared to scale relative to expected reward magnitude, which was the same for all FR and PR trials and was not influenced by escalating effort and delay costs in the PR sessions. In contrast, whereas DA release immediately following reward delivery was negligible throughout low-effort FR sessions, increased DA release was observed following reward after rats endured greater response costs in the PR sessions.
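The two schedules and the breakpoint measure can be sketched as follows. Note that the exact PR increment rule used by Wanat et al. (2010) is not stated here, so the arithmetic step below is purely an illustrative assumption:

```python
def fr4_requirement(trial):
    """Fixed ratio 4: every trial costs exactly four lever presses."""
    return 4

def pr_requirement(trial, step=3):
    """Progressive ratio (hypothetical rule): the response cost grows
    by `step` presses each trial, starting at one press. The real
    increment schedule in the study may differ."""
    return 1 + step * trial

def breakpoint_trial(max_presses, step=3):
    """First trial whose requirement exceeds the maximum number of
    presses the animal is willing to pay, i.e., its breakpoint."""
    trial = 0
    while pr_requirement(trial, step) <= max_presses:
        trial += 1
    return trial
```

Under this toy rule, an animal willing to emit ∼100 presses quits once the per-trial requirement climbs past that ceiling, while the FR4 requirement never approaches it, which is why FR4 sessions were always completed.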
These findings suggest that overcoming escalating costs in PR instrumental tasks elicited more DA release upon receipt of reward. However, the number of lever presses required and the delay to reward were highly correlated, making it impossible to dissociate the unique effects of effort and delay on DA signaling. Thus, a low-effort yoked control group was used; for these animals, a single lever press led to delivery of a single food pellet, but the duration between lever press and reward delivery was progressively increased to match the delays observed in the PR sessions. Similar to the results in PR sessions, reward-evoked DA release increased with delay to reward delivery. A comparison of the PR results to the low-effort yoked control group revealed a significant effect of trial duration, but not session type, and no significant interaction between session type and trial duration on reward-evoked DA release. This suggests that increasing reward-evoked DA release in PR sessions was a function of time costs and not costs resulting from increased effort.
Cue-evoked DA signals have been interpreted as coding the value of future rewards (Schultz, 2010), which may set a threshold for the maximum cost that should be overcome to obtain rewards (Phillips et al., 2007). In the Wanat et al. (2010) study, cue-evoked phasic DA signals tracked expected reward magnitude, but not response costs associated with effort or delay, despite rats' known preferences for rewards requiring minimal effort and delay (Day et al., 2010; Gan et al., 2010). This finding contrasts with recent work showing cue-evoked accumbens DA release reflected both expected reward magnitude and response cost in rats making cost-benefit analyses (Day et al., 2010). Day et al. (2010), however, used distinct cue lights to signal different reward values, whereas Wanat et al. (2010) used a single cue. Therefore, the single cue predicted reward magnitude alone, and offered no information relevant to response cost. Thus, the cue-evoked DA signal can only be interpreted to signal reward magnitude, which may explain why the cue-evoked signal remained stable for all but the first few trials, as the magnitude of reward never changed. These findings imply that the predicted value of the future reward and, in turn, the threshold to work, remained constant over trials, an interpretation supported by stable breakpoints over several sessions in the PR tasks.
More difficult to explain is the finding that increasing delay cost resulted in greater relative DA concentration after animals earned the reward. Because nucleus accumbens DA signaling has been posited to reflect incentive salience attributed to the value of stimuli (Berridge, 2007), it may be that increased DA release to longer delayed rewards reflects the finding that rewards are perceived as more valuable after a greater cost has been endured to obtain them (Alessandri et al., 2008). However, this interpretation does not explain why the DA signal to reward dynamically responded only to increases in delay, not to increases in effort. On the other hand, since prediction error signals are always generated in DA neurons when reward outcomes are unexpected (Schultz et al., 1997), and temporal uncertainty of reward delivery generates larger prediction errors, larger DA signals would be expected following receipt of a delayed reward, as has been previously described (Kobayashi and Schultz, 2008). Given the nature of the PR task, unless the rat learns the rate at which lever-press requirements increase, successive rewards will be delivered with some degree of temporal uncertainty. Thus, the most parsimonious explanation appears to be that increased reward-evoked signals were due to reward prediction errors and not increased cost conferred by the delay.
Overall, the results of Wanat et al. (2010) confirm that cue-evoked transient dopamine-release events reflect the magnitude of expected rewards and show prediction error signals when the timing of reward delivery is unknown, as has been described for phasic dopamine cell body responses (Kobayashi and Schultz, 2008). However, it appears that rapid dopamine signals, traditionally implicated in unexpected rewards and reward predictions, may not signal effort associated with predicted reward or reward delivery. This coincides with a report that genetically altered mice lacking the ability to elicit phasic DA signals show breakpoint levels in a PR lever-pressing task similar to those of control mice, suggesting motivation to work is not signaled by phasic DA (Zweifel et al., 2009). It remains possible that the role nucleus accumbens DA plays in modulating effortful behaviors (Salamone et al., 2009) is signaled by alternative mechanisms that are not readily detectable by the measurement techniques used in the current study. Optogenetic techniques that permit rapid control and observation of genetically specified neural populations (Airan et al., 2007) may be a logical next step for ascribing a causal role to DA in decision making in general, and cost-benefit analyses in particular.
Footnotes
- Editor's Note: These short, critical reviews of recent papers in the Journal, written exclusively by graduate students or postdoctoral fellows, are intended to summarize the important findings of the paper and provide additional insight and commentary. For more information on the format and purpose of the Journal Club, please see http://www.jneurosci.org/misc/ifa_features.shtml.
- Correspondence should be addressed to Dan P. Covey, School of Biological Sciences, Illinois State University, Campus Box 4120, Normal, IL 61790-4120. dpcovey{at}ilstu.edu