Knowing your grandmother

There is a spectrum of ways in which the brain may hold concepts, ranging from very localized to very distributed, and there is little agreement on where along that spectrum various concepts are held. At one end is the ultimate local storage: a single ‘grandmother’ neuron that recognizes your grandmother no matter how she is presented (words, images, sounds, actions etc.). This single cell, if it exists, would literally be the concept of grandmother. At the other end of the spectrum is completely distributed storage, where a concept is a unique pattern of activity across the whole cortex and every cell is involved in the patterns of many concepts. Both of these extremes have problems. At the local extreme, our concept of grandmother does not disappear if part of the cortex is destroyed – no extremely small area has ever been found whose loss obliterates grandma. On the other hand, groups of cells have been found that are relatively tuned to one concept. At the distributed extreme, there is the problem of localized specialties such as the fusiform face area. More telling is the problem of a global pattern being destroyed if multiple concepts are activated at the same time. Each neuron would be involved in a significant fraction of all the concepts, so there would be confusion if a dozen or more concepts were part of a thought/memory/process. As we leave the extremes, the local storage becomes a larger group of neurons with more distribution, and the distributed storage becomes patterns in smaller groups of neurons.



The idea of localized concepts was thought to be improbable in the 1970s, and the grandmother cell became something of a joke. The type of network that computer scientists were creating became the assumed architecture of the brain.



Computer simulations have long used a ‘neural network’ called PDP, or parallel distributed processing. In spite of the name, this is not a network made of neurons but a mathematical network. Put extremely simply, there are layers of units; each unit has a value for its level of activity; the units have inputs from other units and outputs to other units; and the connections between units can be weighted in their strength. The bottom layer of units takes input from the experimenter, and this travels through ‘hidden’ layers to an output layer which reveals the output to the experimenter. Such a setup can learn and compute in various ways that depend on the programs that control the weightings and other parameters. This PDP model has favoured the distributed network idea when modeling actual biological networks. Some researchers have made a PDP network do more than one thing at once (but ironically this entails having more localization in the hidden layer). This might seem a small problem for PDP, but PDP also suffers from a limitation that makes rapid one-trial learning difficult. That type of learning is the basis of episodic memory. Because each unit in PDP is involved in many representations, any change in weighting affects most of those representations, and so it takes many iterations to get a new representation worked into the system. Rapid one-trial learning in PDP destroys previous learning; this is termed catastrophic interference or the stability-plasticity dilemma. The answer has been that the hippocampus may have a largely local arrangement for its fast one-trial learning while the rest of the cortex can have a dense distribution. But there is a problem. When a fully distributed network tries to represent more than one thing at a time, the result is ambiguous. This is a real problem because the cortex does not handle one concept at a time – in fact, it handles many concepts at once, and often some are novel. There is no way that thought processes could work with this kind of chaos. This can be overcome in PDP networks, but again the fix is to move towards local representations.
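To make catastrophic interference a little more concrete, here is a toy sketch in Python. It is not a PDP network from the literature: it assumes a single linear unit with three weighted inputs and a simple delta-rule update, and the patterns, targets and function names are invented purely for illustration. An old association is learned in one trial, then a new one is learned in one trial, and we check how much of the old one survives.

```python
def respond(weights, pattern):
    """Activity of the unit: weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, pattern))


def learn_in_one_trial(weights, pattern, target):
    """Delta-rule step sized so that `pattern` gives exactly `target` afterwards."""
    error = target - respond(weights, pattern)
    norm = sum(x * x for x in pattern)
    return [w + error * x / norm for w, x in zip(weights, pattern)]


def old_memory_after_new_learning(old_pattern, new_pattern):
    """Learn old -> 1.0, then new -> -1.0, and return the response to the old pattern."""
    weights = [0.0] * len(old_pattern)
    weights = learn_in_one_trial(weights, old_pattern, 1.0)
    weights = learn_in_one_trial(weights, new_pattern, -1.0)
    return respond(weights, old_pattern)


# Distributed codes: the two patterns share the middle input.
distributed_result = old_memory_after_new_learning([1, 1, 0], [0, 1, 1])

# Localist codes: each pattern has an input of its own.
localist_result = old_memory_after_new_learning([1, 0, 0], [0, 1, 0])

print("old memory with shared inputs:  ", distributed_result)
print("old memory with separate inputs:", localist_result)
```

With the overlapping patterns, the single fast update for the new item drags the response to the old item away from its learned value of 1.0, because both items lean on the same weight. With separate inputs the old response is left untouched. Real networks are far larger, but the sketch shows why shared weights and rapid one-trial learning pull against each other.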



This is the abstract from a paper to be published soon (citation below).

A key insight from 50 years of neurophysiology is that some neurons in cortex respond to information in a highly selective manner. Why is this? We argue that selective representations support the co-activation of multiple “things” (e.g., words, objects, faces) in short-term memory, whereas nonselective codes are often unsuitable for this purpose. That is, the co-activation of nonselective codes often results in a blend pattern that is ambiguous; the so-called superposition catastrophe. We show that a recurrent parallel distributed processing network trained to code for multiple words at the same time over the same set of units learns localist letter and word codes, and the number of localist codes scales with the level of the superposition. Given that many cortical systems are required to co-activate multiple things in short-term memory, we suggest that the superposition constraint plays a role in explaining the existence of selective codes in cortex.
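The superposition catastrophe described in the abstract can be illustrated with a toy example in Python. This is only a sketch under assumed conditions, not the recurrent network used in the paper: four made-up concepts (A to D) are coded over four binary units, and co-activation is modelled as a unit being on if any active concept turns it on. The codes and function names are hypothetical.

```python
from itertools import combinations


def superpose(codes, names):
    """Co-activate patterns: a unit is on if any of the named patterns turns it on."""
    return tuple(max(bits) for bits in zip(*(codes[n] for n in names)))


def ambiguous_blends(codes, k=2):
    """Return the blends that more than one set of k concepts could have produced."""
    blends = {}
    for names in combinations(sorted(codes), k):
        blends.setdefault(superpose(codes, names), []).append(names)
    return {blend: sets for blend, sets in blends.items() if len(sets) > 1}


# Nonselective (distributed) codes: every unit takes part in two concepts.
distributed = {
    "A": (1, 1, 0, 0),
    "B": (0, 0, 1, 1),
    "C": (1, 0, 1, 0),
    "D": (0, 1, 0, 1),
}

# Selective (localist) codes: one dedicated unit per concept.
localist = {
    "A": (1, 0, 0, 0),
    "B": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "D": (0, 0, 0, 1),
}

print("distributed:", ambiguous_blends(distributed))
print("localist:   ", ambiguous_blends(localist))
```

With the distributed codes, activating A and B together lights up every unit, and so does activating C and D together, so the blend cannot say which pair is present. With the localist codes every pair leaves a distinct pattern, which is the advantage of selective codes that the abstract points to.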



The result is that our model of the brain moves a good way along the spectrum toward the grandmother cell end. And lately there has been a new method for studying the brain. Epilepsy patients have electrodes placed in their brains to monitor seizures prior to surgery. These patients can volunteer for experiments while waiting for their operations, so it is now possible to record the activity of small groups of neurons in awake, functioning human beings. And something very similar to grandmother cells has been found. The cells recorded by some electrodes respond to a particular person – Halle Berry and Jennifer Aniston were two of the first concepts found to each have their own local patch of a hundred or so neurons. These cells responded not just to various images, but to written names and voices too. It happened with objects as well as people. These concepts, held in small local groups of neurons, have been observed in the area of the hippocampus.



The idea that the brain is one great non-localized network has also suffered from the results of brain scans. Areas of the brain (far from the hippocampus) appear to be specialized. Very specific functions can be lost completely through the destruction of smallish areas of the brain as a result of stroke. The old reasons for rejecting a localized brain organization are disappearing while the arguments against a globally distributed organization are growing. This does not mean that there are no distributed operations or that there are unique single cells for a concept – it just means that we are well toward the local end of the spectrum.



Rodrigo Quian Quiroga, Itzhak Fried and Christof Koch wrote a recent piece in Scientific American in which they look at this question and explain what it means for memory. The whole article is very interesting and worth looking at.

Concept cells link perception to memory; they give an abstract and sparse representation of semantic knowledge—the people, places, objects, all the meaningful concepts that make up our individual worlds. They constitute the building blocks for the memories of facts and events of our lives. Their elegant coding scheme allows our minds to leave aside countless unimportant details and extract meaning that can be used to make new associations and memories. They encode what is critical to retain from our experiences. Concept cells are not quite like the grandmother cells that Lettvin envisioned, but they may be an important physical basis of human cognitive abilities, the hardware components of thought and memory.


Bowers JS, Vankov II, Damian MF, & Davis CJ (2014). Neural Networks Learn Highly Selective Representations in Order to Overcome the Superposition Catastrophe. Psychological Review. PMID: 24564411