2002 Special issue

Dopamine: generalization and bonuses
Introduction
Much evidence, reviewed by Schultz (1998), suggests that dopamine (DA) cells in the primate midbrain play an important role in reward and action learning. Electrophysiological studies in both instrumental (Schultz, 1992, Schultz, 1998) and classical (Waelti, Dickinson, & Schultz, 2001) conditioning tasks support a theory that DA cells signal a global prediction error for summed future reward in appetitive conditioning tasks (Montague et al., 1996, Schultz et al., 1997), in the form of a temporal difference (TD) prediction error term. One use of this term is training the predictions themselves, a standard interpretation for the preparatory aspects of classical conditioning; another is finding the actions that maximize reward, as in a two-factor learning theory for the interaction of classical and instrumental conditioning. Storage of the predictions involves at least the basolateral nuclei of the amygdala (Hatfield et al., 1996, Holland and Gallagher, 1999, Whitelaw et al., 1996) and the orbitofrontal cortex (Gallagher et al., 1999, O'Doherty et al., 2001, Rolls, 2000, Schoenbaum et al., 1998, Schoenbaum et al., 1999, Schultz et al., 2000, Tremblay and Schultz, 2000a, Tremblay and Schultz, 2000b). The neural substrate for the dopaminergic control over action is rather less clear (Dayan, 2000, Dickinson and Balleine, 2001, Houk et al., 1995, Montague et al., 1996).
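In standard TD notation (a sketch in our own symbols, not equations reproduced from the sources cited above), the prediction error term takes the form:

```latex
\delta(t) \;=\; r(t) \;+\; \gamma \hat V(t+1) \;-\; \hat V(t),
\qquad
\hat V(t) \;\approx\; \mathbb{E}\!\left[\sum_{k \ge 0} \gamma^{k}\, r(t+k)\right],
```

where $r(t)$ is the reward at time $t$, $\gamma$ is a discount factor, and the predictions $\hat V$ are trained by the update $\hat V(t) \leftarrow \hat V(t) + \alpha\,\delta(t)$; the same $\delta(t)$ can then be used to evaluate actions, as in the two-factor account.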
The computational role of dopamine in reward learning is controversial for various reasons (Gray et al., 1997, Ikemoto and Panksepp, 1999, Redgrave et al., 1999). First, stimuli that are not associated with reward prediction are known to activate the dopamine system in a non-trivial manner, including stimuli that are novel and salient, or that physically resemble other stimuli that do predict reward (Schultz, 1998). In both cases, an important aspect of the dopamine response is that it sometimes consists of a short-term increase above baseline followed by a short-term decrease below baseline. Second, dopamine release is associated with a set of motor effects, such as species- and stimulus-specific approach behaviors, that seem either irrelevant or detrimental to the delivery of reward. We call these motor effects mechanistic because of their apparent independence from prediction or action.
In this paper (see also Suri and Schultz, 1999, Suri, 2002), we study several of these apparently anomalous activations of dopamine cells. We interpret the short-term increase and decrease in the light of generalization as an example of partial information: the response is exactly what would be expected were the animal initially not completely certain as to whether or not the presented stimulus was the one associated with food. We interpret the short-term effects after new stimuli as suggesting that the DA system multiplexes information about bonuses on top of information about rewards. Bonuses are fictitious quantities added to rewards (Dayan and Sejnowski, 1996, Sutton, 1990) or values (Ng, Harada, & Russell, 1999) to ensure appropriate exploration in new or changing environments.
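The two kinds of bonus can be written out explicitly (a sketch in standard notation; the symbols $\kappa$, $\tau$ and $\Phi$ are ours). Sutton's (1990) Dyna-Q+ adds an exploration bonus directly to the reward for a state that has not been visited for $\tau(s)$ time steps, while Ng et al.'s (1999) shaping adds a potential-based term to the values:

```latex
r'(s) \;=\; r(s) \;+\; \kappa \sqrt{\tau(s)}
\quad\text{(reward bonus)},
\qquad
F(s, s') \;=\; \gamma\,\Phi(s') \;-\; \Phi(s)
\quad\text{(value/shaping bonus)}.
```

Potential-based shaping of this form provably leaves the optimal policy unchanged, whereas reward bonuses alter what is optimal and so must decay as the environment becomes familiar.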
In Section 2, we describe the TD model of dopamine activity. In Section 3 we discuss generalization; in Section 4 we discuss novelty responses and bonuses.
Section snippets
Temporal difference and dopamine activity
Fig. 1 shows three aspects of the activity of dopamine cells, together with the associated TD model. The electrophysiological data in Fig. 1(A) and (B) are based on a set of reaction-time operant conditioning trials, in which monkeys are learning the relationship between an auditory conditioned stimulus (the CS) and the delivery of a juice reward (the unconditioned stimulus or US). The monkeys had to keep their hands on a resting key until the sound was played, and then they had to depress a …
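A minimal simulation makes the TD account of this paradigm concrete (a sketch with made-up task parameters, not the authors' implementation): before learning, the prediction error, and hence the modeled dopamine response, occurs at the juice reward; after repeated pairings it transfers back to the conditioned stimulus.

```python
import numpy as np

# Minimal TD(0) simulation of the CS -> US trial (illustrative
# parameters): T time steps per trial, CS onset at t_cs, reward at
# t_us. With a complete serial compound representation, each time
# step after CS onset carries its own prediction weight.
T, t_cs, t_us = 20, 5, 15
gamma, alpha = 1.0, 0.1
w = np.zeros(T)  # one prediction weight per post-CS time step

def run_trial(w, learn=True):
    """Run one trial; return the TD error (modeled DA response) per step."""
    r = np.zeros(T)
    r[t_us] = 1.0  # juice reward at the US
    # Predictions are zero before the CS has been seen.
    V = np.where(np.arange(T) >= t_cs, w, 0.0)
    delta = np.zeros(T)
    for t in range(T - 1):
        delta[t] = r[t] + gamma * V[t + 1] - V[t]
        if learn and t >= t_cs:
            w[t] += alpha * delta[t]
    return delta

early = run_trial(w, learn=False)  # naive animal: error sits at the US
for _ in range(500):               # training trials
    run_trial(w)
late = run_trial(w, learn=False)   # trained: error has moved to the CS

print(early[t_us], late[t_cs - 1])
```

After training, the error at the (now predicted) US is near zero, while the unpredicted CS itself elicits the positive error, matching the transfer of the phasic response in Fig. 1.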
Generalization and uncertainty
Fig. 3 shows two aspects of the behavior of dopamine cells that are not obviously in accord with the temporal difference model. These come from two related tasks (Schultz & Romo, 1990) in which there are two boxes in front of a monkey, one of which always contains food (door+) and one of which never contains food (door−). On a trial, the monkey keeps its hand on a resting key until one of the doors opens (usually accompanied by both visual and auditory cues). If door+ opens, the monkey has to …
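The partial-information account of these responses can be sketched numerically (our illustration, not the authors' model; the parameter `p_plus` is a free quantity standing for generalization-driven confusion between the doors):

```python
# When a door opens, generalization leaves the monkey briefly unsure
# which door it is, so its value prediction is a probability-weighted
# average; identifying the door then corrects that prediction.
def door_responses(p_plus, v_plus=1.0, v_minus=0.0):
    """TD errors at door opening and at identification.

    p_plus: probability initially assigned to the opened door being
    door+ (the food door) -- a free, illustrative parameter.
    """
    v_initial = p_plus * v_plus + (1 - p_plus) * v_minus
    onset = v_initial - 0.0              # burst at door opening
    resolve_plus = v_plus - v_initial    # correction once seen to be door+
    resolve_minus = v_minus - v_initial  # correction once seen to be door-
    return onset, resolve_plus, resolve_minus

onset, d_plus, d_minus = door_responses(p_plus=0.3)
# door-: a brief increase above baseline (onset > 0) followed by a dip
# below baseline (d_minus < 0), the biphasic pattern in Fig. 3.
```

The biphasic response to door− thus falls out of ordinary TD prediction error, provided the animal's initial stimulus identification is uncertain.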
Novelty responses
Another main difference between the temporal difference model of the activity of dopamine cells and their actual behavior has to do with novelty. Salient, novel stimuli are reported to activate dopamine cells for between a few and many trials. One example of this may be the small response at the time of the stimulus in the top line of Fig. 1(A). Here, there is a slight increase in the response locked to the stimulus, with no subsequent decrement below baseline. In this case, the activity could …
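One way to capture novelty responses that fade over trials is a bonus that falls off with familiarity (an illustrative sketch with arbitrary scale and decay parameters, not a fitted model):

```python
import math

# A novelty bonus multiplexed onto the reward signal: a new stimulus
# contributes a fictitious reward that decays as the stimulus becomes
# familiar. scale and decay are arbitrary illustrative parameters.
def novelty_bonus(n_exposures, scale=0.5, decay=0.3):
    """Bonus after n prior exposures; decays exponentially with familiarity."""
    return scale * math.exp(-decay * n_exposures)

responses = [novelty_bonus(n) for n in range(10)]
# The phasic response to the neutral stimulus shrinks monotonically
# over trials as the bonus decays toward zero.
```

Because the bonus is added to the reward rather than to the predictions, it can produce a stimulus-locked increase with no subsequent decrement below baseline, in line with the response described above.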
Discussion
We have suggested a set of interpretations for the activity of the DA system to complement that of reporting prediction error for reward. First, we considered activating and depressing generalization responses, arguing that they come from short-term ambiguity about the predictive stimuli presented. Second, we considered novelty responses, showing that they are exactly what would be expected were the dopamine cells reporting a prediction error for reward in a sophisticated reinforcement …
Acknowledgements
Funding is from the NSF and the Gatsby Charitable Foundation. We are very grateful to Nathaniel Daw, Jon Horvitz, Peter Redgrave, Roland Suri, Rich Sutton, and an anonymous reviewer for helpful comments. This paper is based on Kakade and Dayan (2000).
References (67)
- et al. (1996). Psychobiology of novelty seeking and drug seeking behavior. Behavioural Brain Research.
- et al. (1999). Cognition and control in schizophrenia: A computational model of dopamine and prefrontal function. Biological Psychiatry.
- et al. (2000). Behavioral considerations suggest an average reward TD model of the dopamine system. Neurocomputing.
- et al. (1997). Toward a neurobiology of temporal cognition: Advances and challenges. Current Opinion in Neurobiology.
- et al. (1989). Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Networks.
- et al. (1999). An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit. Behavioural Brain Research.
- et al. (1999). Amygdala circuitry in attentional and representational processes. Trends in Cognitive Sciences.
- et al. (1997). Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Research.
- et al. (1999). Brain Research Reviews.
- et al. (1999). Dopamine D4 receptor gene: Novelty or nonsense? Neuropsychopharmacology.