List of participants:
Paul Apicella, Christelle Baunez, Jesus Bertran Gonzalez, Jocelyne Caboche, Peter Dayan, Jean-Michel Deniau, Gaetano Di Chiara, Kenji Doya, Jean-Antoine Girault (Organizer), Paul Greengard, Denis Hervé, Bruce T. Hope, Brian Knutson, Jeanette Kotaleski, Angus Nairn, Takashi Nakano, Peter Redgrave, Miriam Sanchez Matamales, Emmanuel Valjent, Louk J. Vanderschuren, Robert Mark Wightman.
by Jean-Antoine Girault
May 7 to 12, 2007
Summary (translated from French)
Reward-based learning has been studied in detail from a behavioral standpoint following the pioneering work of Pavlov, Thorndike and others at the beginning of the 20th century. The existence of brain reward circuits was then established by the intracranial self-stimulation experiments of Olds and Milner, and a role for dopamine (DA) in these circuits was quickly suspected. However, only recently have the neuronal mechanisms responsible for these complex processes been brought to light. Recent advances are such that one can reasonably hope that the reward system will be the first behavioral and cognitive process whose molecular and cellular bases are fully elucidated. DA neurons in the substantia nigra and the ventral tegmental area signal reward prediction errors: they fire in response to unexpected rewards or, after associative learning, in response to conditioned stimuli associated with the primary reward rather than to the reward itself. Conversely, their firing is inhibited by the absence of an expected reward. DA release thus appears to be a sophisticated signal that controls learning according to reward-related events. Schematically, current hypotheses hold that DA released in response to unexpected rewards (or reward-associated stimuli) induces synaptic plasticity in forebrain circuits and thereby strengthens associations between a particular behavior and a specific environmental context.
The experimental and theoretical work devoted to elucidating the role of DA, together with its implications for human and animal behavior and pathology, was discussed during the meeting by some of the leading specialists in the field. The questions debated focused in particular on the control of DA neuron activity and the mechanisms by which DA exerts its effects on postsynaptic neurons.
Report (in English)
Reward-controlled learning has been extensively studied from a behavioral standpoint since the pioneering work of Pavlov, Thorndike and others at the beginning of the 20th century. The existence of brain reward circuits was then demonstrated by the intracranial self-stimulation experiments of Olds and Milner, and a role for dopamine (DA) in these circuits was rapidly suspected. However, it is only recently that the neural mechanisms underlying these complex processes have been clarified. Recent progress has been such that the reward system may arguably be the first behavioral and cognitive process for which the molecular and cellular basis will be worked out. DA neurons in the substantia nigra (SN) and the ventral tegmental area (VTA) code for errors in reward prediction: they fire in response to unexpected rewards or, following associative learning, in response to the conditioned stimulus associated with the primary reward rather than to the reward itself. Conversely, the absence of an expected reward inhibits their firing. Thus, the release of DA appears to be a sophisticated signal for controlling learning in relation to reward events. Schematically, current hypotheses hold that DA released in response to unexpected rewards (or reward-conditioned stimuli) facilitates synaptic plasticity in forebrain circuits and, thus, reinforces associations between a particular behavior and a specific environmental context.
Experimental and theoretical work devoted to elucidating the role of DA, as well as its implications for animal and human behavior and pathology, was discussed during the meeting by some of the leading specialists in this area. Questions that were debated included the control of DA neuron activity and the mechanisms by which DA exerts its effects on post-synaptic neurons.
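The reward-prediction-error account summarized above is often formalized with temporal-difference (TD) learning. The following is a minimal sketch of that idea, not a model presented at the meeting: the trial layout (CS_TIME, REWARD_TIME, N_STEPS), the learning rate alpha, the discount gamma and the one-state-per-time-step representation are all illustrative assumptions.

```python
# Toy TD(0) simulation of a reward prediction error (all parameters hypothetical).
CS_TIME, REWARD_TIME, N_STEPS = 5, 10, 15  # assumed trial layout, in time steps


def run_trial(w, rewarded=True, alpha=0.1, gamma=1.0, learn=True):
    """Simulate one trial and return the prediction error at each time step."""

    def value(t):
        # One learned value per time step between CS onset and reward time.
        # Outside that window (before the unpredictable CS, after the reward)
        # the predicted value is fixed at 0.
        k = t - CS_TIME
        return w[k] if 0 <= k < len(w) else 0.0

    delta = [0.0] * N_STEPS
    for t in range(N_STEPS - 1):
        r = 1.0 if (rewarded and t + 1 == REWARD_TIME) else 0.0
        # TD error registered on arrival at step t + 1.
        delta[t + 1] = r + gamma * value(t + 1) - value(t)
        k = t - CS_TIME
        if learn and 0 <= k < len(w):
            w[k] += alpha * delta[t + 1]  # update the value of the state just left
    return delta


w = [0.0] * (REWARD_TIME - CS_TIME)
first = run_trial(w)                                  # naive: reward is unexpected
for _ in range(500):
    run_trial(w)                                      # associative learning
trained = run_trial(w, learn=False)                   # CS now predicts the reward
omitted = run_trial(w, rewarded=False, learn=False)   # expected reward withheld

print("first trial, error at reward:", round(first[REWARD_TIME], 3))
print("trained, error at CS / reward:",
      round(trained[CS_TIME], 3), round(trained[REWARD_TIME], 3))
print("omission, error at reward time:", round(omitted[REWARD_TIME], 3))
```

In this sketch the error is positive at the time of an unexpected reward, moves to the conditioned stimulus once the association is learned, and becomes negative when a predicted reward is omitted, which mirrors the three firing patterns described above.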
Gaetano Di Chiara (University of Cagliari) compared the effects of conventional and addictive rewards on DA release in the nucleus accumbens shell, a striatal structure particularly implicated in drug addiction. Di Chiara provided evidence that drugs of abuse, in contrast to natural rewards, bypass and usurp the adaptive mechanisms (i.e. habituation) that constrain the responsiveness of dopamine in the nucleus accumbens shell, thus allowing their behavioral consequences to persist over time. Mark Wightman (University of North Carolina), using combined electrochemical detection and iontophoresis in behaving rats, provided information about the rapid timing of DA release in reward situations. The precise timing of DA release with respect to rewarding stimuli was hotly debated during the conference. Peter Redgrave (University of Sheffield) argued against the ability of midbrain DA neurons to code for reward prediction errors. He argued that the activation of DA neurons following presentation of a conditioned stimulus is too rapid for the identification (and coding) of unpredicted reward. Jean-Michel Deniau (Inserm, Collège de France, Paris) presented the anatomy and physiology of striatal circuitry. He proposed that short-term intrinsic plasticity could participate in inducing long-term modifications in cortico-striatal synapses. Paul Apicella (CNRS, Provence University, Marseille) showed the importance of tonically active neurons (TANs, cholinergic interneurons that make up a small proportion of striatal neurons) in the transmission of signals for reward-related stimuli. His studies in monkeys revealed that factors beyond motivation, such as stimulus detection, movement control and context recognition, can affect the responsiveness of these neurons. He concluded that local neuronal circuits contribute to the computations used in the learning and action functions of the striatum.
Brian Knutson (Stanford University) presented how functional magnetic resonance imaging (fMRI) performed in humans demonstrates that reward anticipation can increase the BOLD (blood oxygen level dependent) signal in the nucleus accumbens, providing a measurable signal that may serve as a proxy for DA release in humans.
Molecular mechanisms underlying DA-controlled changes in synaptic efficacy, protein expression, and synaptic morphology were extensively discussed. Medium-sized spiny GABAergic neurons (MSNs), which form the largest population of striatal neurons, are major targets of DA innervation. In recent decades, DA-induced signaling pathways have been extensively studied in these neurons. Denis Hervé (Inserm, Pierre & Marie Curie University, Institut du Fer à Moulin, Paris) showed that Gαolf is a G protein subunit essential for D1 receptor (D1R) signaling in the striatum. He also provided evidence that the amount of Gαolf is a limiting factor in striatal cells, determining the intensity of biochemical and behavioral responses linked to D1R stimulation. In animal models and Parkinsonian patients, increased levels of Gαolf can account for the hypersensitivity of D1R signaling and promote the development of abnormal involuntary movements (dyskinesia), a complication frequently encountered after long-term L-DOPA treatment. Protein phosphorylation cascades downstream from D1R have been extensively characterized by Paul Greengard (The Rockefeller University, New York) and his colleagues. MSNs express high levels of DARPP-32 (dopamine- and cAMP-regulated phosphoprotein, Mr = 32,000), an inhibitor of protein phosphatase-1 (PP1) which plays a critical role in DA signaling. Through the regulation of its phosphorylation sites by neurotransmitters, and its ability to modulate the activity of PP1 and protein kinase A (PKA), DARPP-32 plays a key role in integrating a variety of biochemical, electrophysiological, and behavioral responses. DARPP-32-dependent signaling mediates the actions of multiple drugs of abuse, including cocaine, amphetamine, nicotine and caffeine. Jocelyne Caboche (CNRS, Pierre & Marie Curie University, Paris) described the role of the MAPK/ERK signaling pathway in drug-induced gene regulation after stimulation of DA and NMDA receptors.
She presented the role of the transcription factor Elk-1 in long-term neuronal adaptations induced by cocaine. In addition, she defined the role of the mitogen- and stress-activated protein kinase 1 (MSK1) as a major nuclear striatal kinase, downstream from ERK, responsible for the phosphorylation of the transcription factor CREB and of histone H3. Jean-Antoine Girault (Inserm, Pierre & Marie Curie University, Institut du Fer à Moulin, Paris) showed that ERK pathway activation requires the stimulation of both D1R and glutamate NMDA receptors. This mechanism of biochemical “coincidence detection” of environmental cues (glutamate) and positive reward prediction error signals (DA) is critical for the long-term behavioral effects of drugs of abuse. He also described a mechanism by which DA signals are transduced to the nucleus, showing that after D1R stimulation, DARPP-32 translocates to the nucleus, where it is essential for histone H3 phosphorylation, an important step in chromatin remodeling, presumably by regulating the activity of nuclear PP1. Emmanuel Valjent (Inserm, Pierre & Marie Curie University, Institut du Fer à Moulin, Paris) showed that administration of various drugs of abuse activates ERK in a subset of brain regions involved in addiction. He also discussed how the endocannabinoid system modulates cocaine-induced ERK activation in the striatum and contributes to its long-term behavioral effects. In the nucleus, ERK directly or indirectly phosphorylates several transcription factors and thereby induces the expression of a number of immediate early genes (IEGs), such as zif268, c-fos, Nur77 and many others. Fos immunohistochemistry and c-fos in situ hybridization have been used to assess sensitization-related changes in drug-induced neuronal activation. Bruce Hope (NIDA, Baltimore) studied the role of context (home cage vs novel environment) in the relationship between amphetamine-induced psychomotor activity and Fos expression in the nucleus accumbens and caudate-putamen.
Long-term exposure to cocaine or amphetamine has been reported to increase the number of dendritic branch points and spines of MSNs in the nucleus accumbens. Angus C. Nairn (Yale University, New Haven) showed how cocaine induced structural alterations of dendritic spines in subpopulations of accumbal MSNs that express either D1 or D2 receptors. His results indicated that, although spine density initially increases in both populations, the altered spine density is stable only in D1R-containing neurons. In addition, he described the mechanisms by which Cdk5, cAMP and other signaling pathways act on the F-actin cytoskeleton to regulate spine morphogenesis and synaptic function in the developing and adult brain.
A third set of communications considered the behavioral analysis of animal models that mimic different aspects of drug addiction. Emmanuel Valjent linked intracellular signaling pathways induced by psychostimulants to behavioral responses, showing how their interruption at various levels had important consequences for behavior. Louk J. Vanderschuren (Netherlands Institute for Neuroscience, Amsterdam) used sophisticated behavioral paradigms to investigate how the alteration of reward-controlled learning by drugs of abuse leads to pathological consequences. He concluded that drug-associated conditioned stimuli and incentive sensitization play an important role in the early phases of addiction. Furthermore, repeated pairing of the subjective effects of drugs with environmental stimuli (drug paraphernalia, contextual and social cues) causes these conditioned stimuli to gain control over behavior. All the above-mentioned speakers considered the dorsal or ventral parts of the striatum as the most relevant structures for reward-controlled learning and motivation. However, Christelle Baunez (CNRS, Aix-Marseille University) introduced another important nucleus of the basal ganglia that is also involved in motivational functions: the subthalamic nucleus (STN). She showed that STN inactivation induced motivational exacerbation, as rats with STN disruption showed enhanced motivation in visual attention and alcohol preference tests. Other studies in rats with bilateral STN lesions showed an increased motivation for food and a decreased motivation for cocaine.
Several talks presented mathematical and computational models that combined the incentive properties of DA release and its capacity to code reward prediction errors. Peter Dayan (University College London) focused his talk on reinforcement learning and unsupervised learning, addressing how animals choose appropriate actions in the face of rewards and punishments, and how they form neural representations of the world. He described how the serotonin and dopamine systems may act in opposition in reward learning. Kenji Doya (ATR Computational Neuroscience Laboratories, Kyoto) combined robotics with electrophysiological recordings. Recent advances in machine learning and artificial neural networks have made it possible to build robots and virtual agents that can learn a variety of behaviors. Based on a large body of neurobiological data and modeling, he presented a computational theory of decisions under the control of delayed rewards. Doya proposed that serotonin controls the time scale of reward prediction by regulating neural activity in the basal ganglia. Other talks addressed how computational modeling can also help in understanding post-synaptic signaling pathways initiated by DA. Jeanette H. Kotaleski (Karolinska Institute, Stockholm) presented a model of biochemical signaling pathways in striatal cells important for reinforcement learning. She studied the effects of transient glutamate and DA stimuli on signaling molecules and phosphorylation of DARPP-32. Her study showed that when brief reward and contextual signals are paired, a stronger intracellular signaling response occurs. Furthermore, the model predicted that the biochemical responses differ after brief stimulation compared with prolonged stimulation. Takashi Nakano (ATR Computational Neuroscience Laboratories, Kyoto) developed a kinetic model of the molecular cascade implicated in striatal synaptic potentiation and depression.
Based on previous biochemical data from the literature, he showed how PKA signal amplification by DARPP-32 is important for calcium-dependent plasticity. Since it is very difficult to visualize the dynamics of intracellular signaling pathways, modeling could be very useful for formulating hypotheses about critical steps in those pathways.
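The notion of a "time scale of reward prediction" in Doya's proposal above can be illustrated with the discount factor used in reinforcement learning. The sketch below is only an illustration under stated assumptions: the function names, the reward sizes and delays, and the two discount values are hypothetical, and nothing here is taken from the model presented at the meeting.

```python
# Toy illustration of temporal discounting (all names and numbers hypothetical).

def discounted_value(reward, delay, gamma):
    """Present value of a reward received `delay` steps in the future."""
    return (gamma ** delay) * reward


def preferred_option(gamma, small=(1.0, 1), large=(4.0, 10)):
    """Choose between a small early reward and a large late reward.

    `small` and `large` are (reward, delay) pairs chosen for illustration.
    """
    v_small = discounted_value(small[0], small[1], gamma)
    v_large = discounted_value(large[0], large[1], gamma)
    return "large-delayed" if v_large > v_small else "small-immediate"


for gamma in (0.6, 0.95):
    horizon = 1.0 / (1.0 - gamma)  # rough effective look-ahead, in time steps
    print(f"gamma={gamma}: horizon ~ {horizon:.1f} steps, "
          f"prefers {preferred_option(gamma)}")
```

With a low discount factor the agent looks only a few steps ahead and takes the small early reward; with a high one it waits for the larger delayed reward. Under the hypothesis reported above, a parameter of this kind is what serotonin would be regulating.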
Altogether, this symposium allowed open communication between scientists from various parts of the world who study the same questions with very different approaches and intellectual backgrounds. It contributed to a wider knowledge of the problems, experimental results and theoretical models. Participants unanimously praised the unique scope of the conference, which gave them the opportunity to become acquainted with very different approaches and concepts and to exchange exciting information and ideas about the reward systems. This multidisciplinary approach provided a chance to better understand the basis of striatal plasticity and how it is altered by massive changes in DA transmission in Parkinson's disease or drug abuse.