
An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

Overview of attention for article published in PLoS Computational Biology, May 2011

Mentioned by

10 X users
1 Google+ user
1 Q&A thread

Citations

51 Dimensions

Readers on

134 Mendeley
4 CiteULike

Title
An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning
Published in
PLoS Computational Biology, May 2011
DOI
10.1371/journal.pcbi.1001133
Pubmed ID
Authors

Wiebke Potjans, Markus Diesmann, Abigail Morrison

Abstract

An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
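
To make the learning scheme concrete, here is a minimal tabular actor-critic TD(0) sketch in Python. It is not the authors' spiking network: the five-state chain task, the softmax policy, the clipping floor `delta_min` (a crude stand-in for the limited negative range of phasic dopamine firing), and all parameter values are illustrative assumptions.

```python
import numpy as np

# Minimal tabular actor-critic TD(0) sketch, NOT the authors' spiking model.
# The chain task, the softmax policy, the clipping floor `delta_min`, and all
# parameter values are illustrative assumptions.

rng = np.random.default_rng(0)

n_states = 5          # linear chain; reward for stepping off the right end
n_actions = 2         # 0 = move left, 1 = move right
alpha_critic = 0.1    # critic (state-value) learning rate
alpha_actor = 0.1     # actor (policy) learning rate
gamma = 0.9           # discount factor
delta_min = -0.1      # asymmetric floor: deep negative TD errors are truncated

V = np.zeros(n_states)               # critic: state-value estimates
H = np.zeros((n_states, n_actions))  # actor: action preferences

def sample_action(s):
    """Softmax policy over the actor's action preferences."""
    p = np.exp(H[s] - H[s].max())
    p /= p.sum()
    return rng.choice(n_actions, p=p)

for episode in range(500):
    s = 0
    while True:
        a = sample_action(s)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = (s == n_states - 1 and a == 1)  # goal: move right off the last state
        r = 1.0 if done else 0.0

        # TD error, rectified from below to mimic the dopaminergic asymmetry
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
        delta = max(delta, delta_min)

        V[s] += alpha_critic * delta    # critic update (value learning)
        H[s, a] += alpha_actor * delta  # actor update (policy learning)
        if done:
            break
        s = s_next

print("Learned state values:", np.round(V, 2))
```

With only positive rewards the floor rarely binds and the values converge, consistent with the abstract's claim that learning driven by positive rewards is unimpaired. Replacing the terminal reward with a punishment (r = -1.0) truncates every informative error at the floor, illustrating why learning from purely negative rewards breaks down in the authors' model.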

X Demographics

The data shown below were collected from the profiles of the 10 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 134 Mendeley readers of this research output.

Geographical breakdown

Country          Count      %
United Kingdom       7     5%
Germany              5     4%
United States        4     3%
Japan                3     2%
Netherlands          1    <1%
Singapore            1    <1%
Sweden               1    <1%
Switzerland          1    <1%
China                1    <1%
Other                0     0%
Unknown            110    82%

Demographic breakdown

Readers by professional status          Count      %
Student > Ph.D. Student                    39    29%
Researcher                                 33    25%
Student > Master                           13    10%
Professor                                   8     6%
Student > Bachelor                          7     5%
Other                                      21    16%
Unknown                                    13    10%

Readers by discipline                   Count      %
Agricultural and Biological Sciences       39    29%
Computer Science                           30    22%
Neuroscience                               16    12%
Psychology                                 12     9%
Physics and Astronomy                       7     5%
Other                                      16    12%
Unknown                                    14    10%