A prediction engine

Judith Copithorne image

I have just discovered a wonderful source of ideas about the mind, Open MIND (here), a collection of essays and papers edited by Metzinger and Windt. I ran across mention of it in Deric Bownds’ blog (here). The particular paper that Bownds points to is “Embodied Prediction” by Andy Clark.

Clark argues that we look at the mind backwards. The everyday view of the brain’s working is: sensory input is used to create a model of the world, which prompts a plan of action, which produces an action. He argues for the opposite: action shapes the sensory input we seek, that input is used to correct an existing model, and it is all done by prediction. The mind is a predicting machine; the process is referred to as PP (predictive processing). “Predictive processing plausibly represents the last and most radical step in this retreat from the passive, input-dominated view of the flow of neural processing. According to this emerging class of models, naturally intelligent systems (humans and other animals) do not passively await sensory stimulation. Instead, they are constantly active, trying to predict the streams of sensory stimulation before they arrive.” Rather than a bottom-up flow of sensory information, the theory has a top-down flow of the current model of the world (in effect, what the incoming sensory data should look like). All that is fed back upward are the error corrections, the places where the incoming sensory data differ from what was expected. This seems a faster, more reliable, more efficient system than the conventional one. The only effort needed is to deal with the surprises in the incoming data. Prediction errors are the only sensory information that is yet to be explained, the only place where the work of perception is required most of the time.
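The core loop is easy to sketch in code. This toy example is my own illustration, not anything from Clark’s paper: the system keeps a running prediction of its sensory input and updates it only from the prediction error, never from the raw signal directly.

```python
# A toy sketch of predictive processing (my own illustration): only the
# prediction error, not the raw sensory signal, drives the model update.

def predictive_step(prediction, sensory_input, learning_rate=0.3):
    """Return (new_prediction, error): the error is all that is passed upward."""
    error = sensory_input - prediction        # the only signal fed back
    new_prediction = prediction + learning_rate * error
    return new_prediction, error

# A constant stimulus quickly stops being surprising: errors shrink toward zero.
prediction = 0.0
errors = []
for _ in range(20):
    prediction, error = predictive_step(prediction, sensory_input=1.0)
    errors.append(abs(error))

assert errors[0] > errors[-1]   # early surprise, later quiet
```

The point of the sketch is the economy: once the prediction matches the input, almost nothing needs to travel upward at all.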

Clark doesn’t make much of it, but he has a neat way of understanding attention. Many of our eye movements and postural adjustments can be seen as ways of selecting the nature of the next sensory input. “Action is not so much a response to an input as a neat and efficient way of selecting the next “input”, and thereby driving a rolling cycle.” As the brain seeks certain information (because of uncertainty, the task at hand, or other reasons), it works harder to resolve the prediction errors pertaining to that information. Action is driven towards examining the source of that information. Small prediction errors may be ignored if they are not relevant to the current task. This looks to me like an excellent description of the focus of attention.

Conceptually, this implies a striking reversal, in that the driving sensory signal is really just providing corrective feedback on the emerging top-down predictions. As ever-active prediction engines, these kinds of minds are not, fundamentally, in the business of solving puzzles given to them as inputs. Rather, they are in the business of keeping us one step ahead of the game, poised to act and actively eliciting the sensory flows that keep us viable and fulfilled. If this is on track, then just about every aspect of the passive forward-flowing model is false. We are not passive cognitive couch potatoes so much as proactive predictavores, forever trying to stay one step ahead of the incoming waves of sensory stimulation.

The prediction process is also postulated for motor control. We predict the sensory input that will occur during an action; that information flows from the top down, and error correction controls the accuracy of the movement. The predicted sensory consequences of our actions cause the actions. “The perceptual and motor systems should not be regarded as separate but instead as a single active inference machine that tries to predict its sensory input in all domains: visual, auditory, somatosensory, interoceptive and, in the case of the motor system, proprioceptive. …This erases any fundamental computational line between perception and the control of action. There remains, to be sure, an obvious (and important) difference in direction of fit. Perception here matches neural hypotheses to sensory inputs, and involves “predicting the present”; while action brings unfolding proprioceptive inputs into line with neural predictions. …Perception and action here follow the same basic logic and are implemented using the same computational strategy. In each case, the systemic imperative remains the same: the reduction of ongoing prediction error.”

This theory feels comfortable when I think of conversational language. Unlike much of perception and motor control, language is conducted more in the light of conscious awareness. It is (almost) possible to feel a prediction of what is going to be said when listening, and to have work to do in understanding only when there is a surprise mismatch between the expected word and the heard one. And when talking, it takes little effort until your tongue slips and the slip has to be corrected.

I am looking forward to browsing through Open MIND now that I know it exists.


Roots of communication

Judith Copithorne image

20 or so years ago I took an interest in non-verbal communication and how it interacts with speech. A number of ideas became very clear in my thoughts: we communicate with our whole bodies whether or not we want to, or even realize we are doing it; the gestures, facial expressions, sounds and postures that we use are evolutionarily very old; and, if we try to consciously plan our non-verbal communication, we are likely to send confusing and ambiguous signals. Communication in language alone, stripped of its non-verbal patterns, has to shift from the rules of verbal language to the rules of written language or it can be unintelligible. We rely on non-verbal clues to know in what frame to interpret the words, and on the cadence of speech to organize the connection of words and thoughts.

A recent post by M. Graziano in Aeon (here) is very interesting and worth a read. Here I am just pointing to the central idea of Graziano’s revelation. There is much more of interest in the original post.

Most vertebrates have a personal space which they monitor and protect. If they suspect an invasion of their space, they automatically react. Graziano gives a description of this reaction in primates, which protects vulnerable areas such as eyes, face, neck, and abdomen: “… he squints. His upper lip pulls up, bunching the cheeks towards the eyes. The head pulls down, the shoulders lift, the torso curves, the arms pull across the abdomen or face. A swipe near the eyes or a bonk on the nose might even produce tears, another component of a classical defensive reaction. His grunts begin to be tinged with distress calls.” This is not really communication on the part of the primate whose space has been invaded but a defense of himself that is innate and automatic. However, an observing primate can interpret the reaction as meaning that the defending primate actually, honestly feels threatened. Slowly, through evolution, this reaction, and parts of it, can become signals and symbols useful in communication.

In Graziano’s theory, a smile is a mild version of the facial defense of the eyes. It communicates friendliness and a lack of aggression by mimicking defense as opposed to offense. An exchange of smiles establishes a mutual non-aggression state. Even though we might think that showing teeth is aggressive, it is part of protecting the eyes. This can be seen more clearly in genuine smiles than in polite or faked ones: genuine smiles start with the squinting around the eyes rather than the lifting of the lip.

Play is the situation giving rise to laughter in Graziano’s thinking. Play is governed in mammals by signals that keep the action from getting dangerous even when it looks dangerous, like safe words in S&M. These signals are universal enough that the young of different species can rough and tumble together without mishap. Laughter mimics the defense of personal space: a facial expression similar to a smile, along with a stereotypical noise somewhat like an alarm cry. When it is intense there is a protection of the abdomen, bending forward and putting the arms across the stomach. A laugh seems to indicate that the defenses of the personal space have been breached. Someone has reached in and tickled protected parts of the body, or something, a joke perhaps, has surprised you. You are allowing the game to invade your space because you are enjoying it, and the laugh communicates that.

Then there is crying. Now the communication is “enough, I am hurt”. If it is intense there is a sobbing cry and lots of tears, the hands protect the eyes, and a defensive posture curls the body into a little ball. (Laughter can even turn into crying if it is strong enough.) Tears ask for relief and comfort, and they usually get it, as all children seem to know.

It is somewhat amazing that so much communication might be made out of one innate reaction through the process of evolution. Being able to communicate effectively is a powerful selective force. “And why should so many of our social signals have emerged from something as seemingly unpromising as defensive movements? This is an easy one. Those movements leak information about your inner state. They are highly visible to others and you can rarely suppress them safely. In short, they tattletale about you. Evolution favours animals that can read and react to those signs, and it favours animals that can manipulate those signs to influence whoever is watching. We have stumbled on the defining ambiguity of human emotional life: we are always caught between authenticity and fakery, always floating in the grey area between involuntary outburst and expedient pretence.”

Another look at consciousness

Judith Copithorne image

There is an interesting new paper with a proposed model of consciousness (Michael H. Herzog, Thomas Kammer, Frank Scharnowski. Time Slices: What Is the Duration of a Percept? PLOS Biology, 2016; 14 (4): e1002433 DOI: 10.1371/journal.pbio.1002433). It reviews various theories and experiments in the literature on the subject. Their model is similar to how I have viewed consciousness for a few years, but with important and interesting differences.

They view consciousness as non-continuous, like the frames of a movie, which has seemed to me the only way to look at consciousness that fits what we know of what appears to happen in the brain. They do not deal with the neurology, though, and they give space to the reasons why people have resisted discrete frames and clung to continuous consciousness.

Another aspect of their theory that I like is that the heavy lifting of perception is done unconsciously. The final product of the unconscious processing is a ‘frame’ of consciousness. This fits with the notion that there is no conscious mind in the sense that we usually think of a mind. There is only the unconscious mind, or simply the mind. Consciousness is a presentation, a moment of experience to remember, a global awareness of a percept.

I have in the past thought of a best-fit-scenario end point of perception, the stable point that would end the iterations of a complex analog computation and be the perception on which consciousness is based. The authors talk of Bayesian statistical computations stopping when they reach an ‘attractor’. This seems the same basic idea but more amenable to experimentation and modeling.

During the unconscious processing period, the brain collects information to solve the ill-posed problems of vision, for example, using Bayesian priors. The percept is the best explanation in accordance with the priors given the input. … One important question is how the brain “knows” when unconscious processing is complete and can be rendered conscious. We speculate that percepts occur when processing has converged to an attractor state. One possibility is that hitting an attractor state leads to a signal that renders the content conscious, similarly to, for example, broadcasting in the global workspace theory. … Related questions are the role of cognition, volition, and attention in these processes. We speculate that these can strongly bias unconscious processing towards specific attractor states. For example, when viewing ambiguous figures, a verbal hint or shifting attention can bias observers to perceive either one of the possible interpretations, each corresponding to a different attractor state.
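The idea of processing running until it reaches an attractor can be made concrete with a toy sketch. This is my own illustration, not the authors’ model: iterate an update rule and declare the percept ready the moment the state stops changing.

```python
# A toy sketch (my own) of "process until an attractor is reached":
# iterate an update rule; convergence itself is the signal that a
# frame is complete and can be rendered conscious.

def settle(update, state, tolerance=1e-6, max_steps=1000):
    """Iterate `update` until the state stops moving: a crude attractor search."""
    for step in range(max_steps):
        new_state = update(state)
        if abs(new_state - state) < tolerance:
            return new_state, step    # converged: the percept is ready
        state = new_state
    return state, max_steps

# Example: repeatedly averaging the current state with the evidence
# pulls the state to a fixed point (the attractor).
prior, evidence = 0.0, 10.0
state, steps = settle(lambda s: 0.5 * (s + evidence), prior)
```

The interesting part is the stopping rule: nothing outside the process needs to decide when perception is finished, because hitting the attractor is itself the completion signal.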

The most interesting idea (to me) is that the conscious percept is not a snapshot in a series of snapshots but a constructed slab or slice of time in a series of slices. The frames are of short duration but represent slices of time rather than moments. The implication is that we do not, in any sense, have a direct experience of the world, but a highly processed and codified one.

All features become conscious simultaneously, and the percept contains all the feature information derived from the various detectors. Hence, (a green line) is not actually consciously perceived as green during its actual presentation (the sensory stimulus) but later, when rendered conscious. The same holds true for temporal features. The stimulus is not perceived during the 50 ms when it is presented. The stimulus is not even perceived for a duration of 50 ms. Its duration is just encoded as a “number,” signifying that the duration was 50 ms, in the same way that the color is of a specific hue and saturation.

I hope this paper stimulates some ingenious experimentation.


The opposite trap

Judith Copithorne image

I vaguely remember as a child that one of the ways to learn new words and get some understanding of their meaning was to learn pairs of words that were opposites. White and black, day and night, left and right, and endless pairs were presented. But in making learning easier for children, this model of how words work makes learning harder for adults.

There are ideas that people insist on seeing as opposites – more of one dictates less of the other. They can be far from opposite but it is difficult for people to abandon this relationship. It seems that a mechanism we have for words is making our understanding of reality more difficult. An example is economy and environment. The notion that what is good for the environment has to be bad for the economy and vice versa is not strictly true because there are actions that are good for both and actions that are bad for both, as well as the actions that favour only one. We do not seem to look for the win-win actions and even distrust people who do try.

Another pair is nurture against nature, or environment against genetics. These are simply not opposites; they are not even a little bit so. Almost every feature of our bodies is under the overlapping control of our genetics and our environment. They are interwoven factors. And it is not just our current environment: our environmental history, and that of our parents and sometimes our grandparents, is mixed in with our genetics.

In thinking about our thoughts and actions, opposites just keep being used. We are given a picture of our heads as venues where various parts of our minds engage in wars and wrestling matches. We can start with an old one: mind versus brain, or non-material mental versus material neural dualism. This opposition is almost dead, but its ghost still walks. Some people divide themselves at the neck and ask whether the brain controls the body or the body controls the brain, and they appear to actually want a clear-cut answer. There is the opposition we inherited from Freud: a thought process that is conscious and one that is unconscious, presented as two opposed minds (or three in the original theory). This separation is still with us, although it has been made more reasonable in the form of System 1 and System 2 thinking. System 2 uses working memory and is therefore registered in consciousness. It is slow, takes effort, is limited in scope and is sequential. System 1 does not use working memory and therefore does not register in consciousness. It is fast, automatic, can handle many inputs and is not sequential. These are not separate minds but interlocking processes. We use them both all the time and not in opposition. But they are often presented as opposites.

Recently, there has been added a notion that the hemispheres of the brain can act separately and in opposition. This is nonsense – the two hemispheres complement each other and cooperate in their actions. But people seem to love the idea of one dominating the other and so it does not disappear.

It would be easier to think about many things without the tyranny of some aspects of language, like opposites, that we learn as very young children and have to live with for the rest of our lives. The real danger comes not when we name the two ends of a spectrum, but when we name two states as mutually exclusive: they had better actually be so, or we will have problems. It is fine to label a spectrum from left-handed to right-handed, but if these were true opposites then all the intermediate levels of ambidextrous handedness would be a problem. The current problem with LGBT rights would be smaller if the difference between women and men were viewed as a complex of a few spectra rather than a single pair of opposites.

Neuroscience and psychology need to avoid repeatedly falling into opposite-traps. The field still has too many confusions, errors, things to be discovered, dots to be connected and old baggage to be discarded.

Thanks Judith for the use of your image


Out of the box

I have not been reading science reports as much of late and have not been writing. My mind has wandered to less conventional ideas. I hope you find them entertaining and maybe a little useful.

Because we got stuck years ago with a computer model for thinking about the brain, we may have misjudged the importance of memory. It is seen as a storage unit. Memory has been shown to be a very active thing, but it is still seen as an active storage thing. We know it is involved with learning and imagining as well as recalling, but these thinking functions are seen as just ways we use what is remembered. No matter how people think of the brain or the mind, memory stays over to the side as a separate store. Even though there are many types of memory (implicit, explicit and working, for a start), they are still just storage. They are seen as the RAM and hard disks of the mind.

Suppose (just for an exercise) that we had started out putting memory in the role of an operating system when we first started using the computer model to get our bearings on thought. Think of it as a form of Windows rather than a hard disk. Actually, this is not as far-fetched as you may think. There is a system called MUMPS which runs on a computer without any other operating system under it and consists of a single large data storage structure and a computer language to use the data. It was invented in the ’60s and is still used in many medical computer systems because it is very fast, and accurate in that it does not impose format restrictions on the data. I am not supposing that the brain is like MUMPS, far from it; but simply pointing out that there is more than one way to view the role of memory.

So – back to the ‘what if’.

The interesting thing about the brain is its plasticity. The changes are not rare or special but are happening all the time. Whatever the brain does leaves it changed a bit. The greatest producers of change are remembering, learning, imagining, recalling – or anything that involves the memory. Every time one neuron causes another neuron to fire, the synapses between those two neurons are strengthened. Remembering makes changes to the connectivity of the brain or in computer terms it changes the architecture of the hardware.
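The “fire together, strengthen the synapse” rule is the classic Hebbian idea, and a minimal sketch (my own illustration, with invented numbers) looks like this:

```python
# A toy sketch of Hebbian strengthening (my own illustration): a connection
# weight grows only when the pre- and post-synaptic neurons fire together.

def hebbian_update(weight, pre_active, post_active, rate=0.1):
    """Strengthen the connection only when both neurons are active at once."""
    if pre_active and post_active:
        weight += rate
    return weight

w = 0.5
for pre, post in [(1, 1), (1, 0), (0, 1), (1, 1)]:
    w = hebbian_update(w, pre, post)
# Two co-firings out of four events, so the connection has strengthened.
```

Every such update is a small change in the hardware itself, which is the sense in which remembering rewires the brain rather than merely writing to it.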

Connecting separate memories (memory integration) is how we make inferences; chains of inferences lead to decisions. If my memory A is connected to B, and B is connected to C, then C and A can be connected. That is the sort of thing that happens when we think. Recognition is also a memory function. If I say it is greenish, you might think of vegetation or Ireland or toys. If it is upside down, it is not Ireland, but other things become more likely. But if I then say that it is furry – well then it is likely to be a sloth or some silly soft toy. Saying that it moves slowly would clinch it. The word green is connected to a great many other words, and so is upside down, but their intersection is small. It gets tiny when it must also overlap with the fur connections.
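The greenish/upside-down/furry example works like set intersection. In this sketch (my own, with association sets invented purely for illustration), each cue activates a set of associated items, and whatever survives every intersection is what comes to mind:

```python
# Recognition as narrowing intersection (my own sketch; the association
# sets are invented for illustration).

associations = {
    "green":       {"vegetation", "Ireland", "toys", "sloth"},
    "upside down": {"bats", "sloth", "toys"},
    "furry":       {"bats", "sloth", "soft toy", "cats"},
    "slow":        {"sloth", "snails"},
}

candidates = associations["green"]
for cue in ["upside down", "furry", "slow"]:
    candidates = candidates & associations[cue]   # each cue narrows the field

print(candidates)   # the cues together single out the sloth
```

Each cue on its own is hopelessly ambiguous; it is the shrinking intersection that does the recognizing.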

Memory is waiting to help. When I am someplace doing something with some aim, everything I sense and everything I know about the place and the activity, all the memories that may be useful to me, are alerted and stand primed, ready to be used. I would not be aware of all these alerted memories until I use them, and even then I might be unaware of them. It is actually extremely difficult (probably impossible) to have memory-free thoughts. Even vision is not just stimuli processed into an image; it is wrapped in memories that connect one moment with the next, predict what will come next, identify objects and give meaning to the image.

The mechanisms that store memories appear to provide our sense of place, the consecutive order of events, the flow of time and the assigning of cause and effect links. It even involves part of our sense of self. We either store memories that way because that is how we understand the world or we understand the world that way because of how we remember it. These sound like two opposed ideas but really are the same idea if memory is in effect our ‘operating system’.

We can see memory as the medium of our thoughts and the mechanisms for using memories as part of our cognition. But it could be seen as even more fundamental than that. We live in a model of the world and of ourselves in that world. We project that model around us. We seem to view the projection through a hole in our heads from a vantage point a couple of inches behind the bridge of the nose. It is not just visual but includes sound and other senses. This model houses our consciousness but also our recollections and our imaginings. It is a sort of universal pattern or framework for consciousness, memory and a fair bit of cognition. It seems possible that this framework and the elements in it may be one of the ways that different parts of the brain share information. (Like Baars’s global workspace and similar theories.)

But what could be the connection between consciousness and explicit memory? Again we can look at something more familiar: a tape recorder. The little head with its gap writes on the tape as the tape passes by. Very close to the writing head is another head that reads the tape. Using this head and earphones, the tape can be monitored almost simultaneously with the sounds being recorded, but what is heard is the sound that has just been recorded, read back from the tape. This may be what consciousness is – an awareness of what has just been put in memory. That is something to think about.
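The tape-recorder analogy is easy to make concrete. In this sketch (my own, with an arbitrary lag), a write head records each moment and a read head sits a fixed distance behind it; what the read head delivers is what “consciousness” hears:

```python
# The tape-recorder analogy as code (my own sketch): the read head always
# trails the write head, so awareness lags the recording by a fixed amount.

from collections import deque

class TapeLoop:
    def __init__(self, lag):
        self.tape = deque()
        self.lag = lag

    def record(self, moment):
        """Write the current moment and return what the read head sees:
        the moment recorded `lag` steps earlier (None until the tape
        has travelled that far)."""
        self.tape.append(moment)
        if len(self.tape) > self.lag:
            return self.tape[-1 - self.lag]
        return None

loop = TapeLoop(lag=2)
heard = [loop.record(m) for m in ["a", "b", "c", "d"]]
# heard == [None, None, "a", "b"]: what is "heard" trails what is recorded
```

Nothing in the loop ever hears the present; it only ever hears what has just been written, which is exactly the analogy’s point.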


Babies show the way

It is January and therefore we see the answers to the Edge Question. This year the question is “What do you consider the most interesting recent (scientific) news? What makes it important?” I have to say that I did not find this year’s crop of short essays as interesting as in previous years – but there were some gems.

For example N.J. Enfield’s ‘Pointing is a Prerequisite for Language’ fits so well with what I think and is expressed so well (here). I have a problem with the idea that language is not primarily about communication but rather is about a way of thinking. I cannot believe that language arose over a short space of time rather than a long evolution (both biological and cultural evolution). And it began as communication not as a proper-ish language. “Infants begin to communicate by pointing at about nine months of age, a year before they can produce even the simplest sentences. Careful experimentation has established that prelinguistic infants can use pointing gestures to ask for things, to help others by pointing things out to them, and to share experiences with others by drawing attention to things that they find interesting and exciting. … With pointing, we do not just look at the same thing, we look at it together. This is a particularly human trick, and it is arguably the thing that ultimately makes social and cultural institutions possible. Being able to point and to comprehend the pointing gestures of others is crucial for the achievement of “shared intentionality,” the ability to build relationships through the sharing of perceptions, beliefs, desires, and goals.”

So this is where to start to understand language – with communication and with gestures, and especially joint attention with another person as in pointing. EB Bolles has a lot of information on this, collected over quite a few years in his blog (here).

Get rid of the magic

I have difficulty with the reaction of many people to the idea that consciousness is a process of the brain. They say it is impossible, that consciousness cannot be a physical process. How can that vivid subjective panorama be the product of a physical process? They tend to believe either some variety of dualism, where consciousness is not physical but spiritual (or magical); or that consciousness is a natural primitive, a sort of state of matter/energy that objects possess more or less of (another sort of magic). Or they fudge the issue by believing it is an emergent aspect of physical processes (kind of physical but arising by magic). I find explanations like these far more difficult than a plain and simple physical process in the brain (with no magic).

My question is really, “what would you expect awareness to be like?” “Have you a better idea of how to do awareness?” It would certainly not be numeric values. It would not be word descriptions. Why not a simulated model of ourselves in the world based on what our sensory organs can provide? Making a model seems a perfectly reasonable brain process with no reason to reject it as impossible. It sounds like what we have. But does it need to be a conscious model? (Chalmers’ idea of philosophical zombies assumes that consciousness is an added extra and not needed for thought.)

But it seems that consciousness is an important aspect of a shared simulation. It is reasonable to suppose that all our senses, our memory, our cognition, our motor plans, our emotional states, all contribute to create a simulation. And it is reasonable to assume that they all are responsive to the simulation, use it to coordinate and integrate all the various things going on in the brain so making our behaviour as appropriate as possible. If a model is going to be created and used by many very different parts and functions of the brain it has to be something like a conscious model – a common format, tokens and language.

There have been a number of good and interesting attempts to explain how consciousness might work as a physical process, and there have been a number of attempts to show that such an explanation is impossible. They pass one another like ships in the night. Agreement is not getting any closer. There is not even the start of a consensus, and the reason is that one group will not accept something as a valid explanation if it includes the magic, and the other group will not accept an explanation that loses the magic. The hard question is all about the magic and not about anything else. The question boils down to: how can consciousness be explained scientifically while including the magic? I hope that more and more science throws out the magic and the hard question and gets on with explaining consciousness.

Language in the left hemisphere

Here is the posting mentioned in the last post. A recent paper (Harvey M. Sussman; Why the Left Hemisphere Is Dominant for Speech Production: Connecting the Dots; Biolinguistics Vol 9 Dec 2015), deals with the nature of language processing in the left hemisphere and why it is that in right-handed people with split brains only the left cortex can talk although both sides can listen. There is a lot of interesting information in this paper (especially for someone like me who is left-handed and dyslexic). He has a number of ‘dots’ and he connects them.

Dot 1 is infant babbling. The first language-like sounds babies make are coos and these have a very vowel-like quality. Soon they babble consonant-vowel combinations in repetitions. By noting the asymmetry of the mouth it can be shown that babbling comes from the left hemisphere, non-babbling noises from both, and smiles from the right hemisphere. A speech sound map is being created by the baby and it is formed at the dorsal pathway’s projection in the frontal left articulatory network.

Dot 2 is the primacy of the syllable. Syllables are the unit of prosodic events. A person’s native language syllable constraints are the origin of the types of errors that happen in second language pronunciation. Also syllables are the units of transfer in language play. Early speech sound networks are organized in syllable units (vowel and associated consonants) in the left hemisphere of right-handers.

Dot 3 is the inability of the right hemisphere to talk in split-brain people. When language tasks are directed at the right hemisphere, the stimulus exposure must be longer (greater than 150 msec) than when directed to the left. The right hemisphere can comprehend language but does not evoke a sound image from seen objects and words, although the meaning of the objects and words is understood by that hemisphere. The right hemisphere cannot recognize whether two words rhyme from seeing illustrations of the words. So the left hemisphere (in right-handers) has the only language neural network with sound images. This network serves as the neural source for generating speech; therefore in a split brain only the left side can speak.

Dot 4 deals with the problems of DAS, Developmental Apraxia of Speech. I am going to skip this.

Dot 5 is the understanding of speech errors. The ‘slot-segment’ hypothesis is based on analysis of speech errors. Two thirds of errors are of the type where phonemes are substituted, omitted, transposed or added. The picture is of a two-tiered neural ‘map’, with syllable slots serially ordered in one tier and an independent network of consonant sounds in the other. The tiers are connected together. The vowel is the heart of the syllable, in the nucleus slot; forms are built around it with consonants (CV, CVC, CCV etc.). Spoonerisms are restricted to consonants exchanging with consonants and vowels exchanging with vowels, and the exchanges occur between the same syllable positions: first with first, last with last, etc.
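The positional constraint can be illustrated with a tiny sketch (my own, handling only simple CV(C) syllables written orthographically, not a model from the paper): only onsets swap with onsets, so the vowel nucleus and the coda stay in their slots.

```python
# A toy illustration (my own) of the slot-segment constraint on spoonerisms:
# segments exchange only with segments in the SAME syllable position.

VOWELS = set("aeiou")

def split_syllable(syl):
    """Split a simple CV(C) syllable into onset consonants, vowel nucleus, coda."""
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1
    return syl[:i], syl[i:j], syl[j:]

def spoonerize(word1, word2):
    """Exchange only the onsets of two syllables; nuclei and codas stay put."""
    o1, n1, c1 = split_syllable(word1)
    o2, n2, c2 = split_syllable(word2)
    return o2 + n1 + c1, o1 + n2 + c2

print(spoonerize("bad", "cat"))   # ('cad', 'bat')
```

The swap can never produce a consonant in a vowel slot, which is exactly the restriction observed in real speech errors.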

Dot 6 is Hawkins’s model: “the neo-cortex uses stored memories to produce behaviors.” Motor memories are used sequentially and operate in an auto-associative way. Each memory elicits the next in order (think how hard it is to do things backwards). Motor commands would be produced in serial order, based on syllables: learned articulatory behaviors linked to sound equivalents.

Dot 7 is the set of experiments that showed representations of the sounds of human language at the neural level. For example, there is a representation of a generic ‘b’ sound, as well as representations of the various actual ‘b’s that differ from one another. This is why we can clearly hear a ‘b’ but have difficulty identifying a ‘b’ when the sound pattern is graphed.

Here is the abstract:

Evidence from seemingly disparate areas of speech/language research is reviewed to form a unified theoretical account for why the left hemisphere is specialized for speech production. Research findings from studies investigating hemispheric lateralization of infant babbling, the primacy of the syllable in phonological structure, rhyming performance in split-brain patients, rhyming ability and phonetic categorization in children diagnosed with developmental apraxia of speech, rules governing exchange errors in spoonerisms, organizational principles of neocortical control of learned motor behaviors, and multi-electrode recordings of human neuronal responses to speech sounds are described and common threads highlighted. It is suggested that the emergence, in developmental neurogenesis, of a hard-wired, syllabically-organized, neural substrate representing the phonemic sound elements of one’s language, particularly the vocalic nucleus, is the crucial factor underlying the left hemisphere’s dominance for speech production.

Language in the right hemisphere

I am going to write two posts: this one on the right hemisphere and prosody in language, and a later one on the left hemisphere and motor control of language. Prosody is the fancy word for things like rhythm, tone of voice, stress patterns, speed and pitch. It is not things like individual phonemes, words or syntax. To understand language properly, we need both the words and the prosody.

A recent paper (Sammler, Grosbras, Anwander, Bestelmeyer and Belin; Dorsal and Ventral Pathways for Prosody; Current Biology, Volume 25, Issue 23, p3079–3085, 7 December 2015) gives evidence that the anatomy of the auditory system in the right hemisphere parallels that in the left. Of course the two hemispheres collaborate in understanding and producing language, but the right side processes the emotional aspects while the left processes the literal meaning.

Here is the abstract:

Our vocal tone—the prosody—contributes a lot to the meaning of speech beyond the actual words. Indeed, the hesitant tone of a “yes” may be more telling than its affirmative lexical meaning. The human brain contains dorsal and ventral processing streams in the left hemisphere that underlie core linguistic abilities such as phonology, syntax, and semantics. Whether or not prosody—a reportedly right-hemispheric faculty—involves analogous processing streams is a matter of debate. Functional connectivity studies on prosody leave no doubt about the existence of such streams, but opinions diverge on whether information travels along dorsal or ventral pathways. Here we show, with a novel paradigm using audio morphing combined with multimodal neuroimaging and brain stimulation, that prosody perception takes dual routes along dorsal and ventral pathways in the right hemisphere. In experiment 1, categorization of speech stimuli that gradually varied in their prosodic pitch contour (between statement and question) involved (1) an auditory ventral pathway along the superior temporal lobe and (2) auditory-motor dorsal pathways connecting posterior temporal and inferior frontal/premotor areas. In experiment 2, inhibitory stimulation of right premotor cortex as a key node of the dorsal stream decreased participants’ performance in prosody categorization, arguing for a motor involvement in prosody perception. These data draw a dual-stream picture of prosodic processing that parallels the established left-hemispheric multi-stream architecture of language, but with relative rightward asymmetry.

The ventral and dorsal pathways are also found in both hemispheres in vision. The ventral is often called the ‘what’ pathway, supporting object identification and conscious perception, while the dorsal is called the ‘where’ pathway, involved in spatial location for motor accuracy. The auditory pathways appear to be analogous, with the dorsal path going to motor centers and the ventral to perceptual centers. And although they deal with different processing functions, the pair of auditory pathways appear in both hemispheres, like the visual ones.


Complexity of conversation

Language is about communication. It can be studied as written sentences, as production of spoken language, or as comprehension of spoken language, but these do not get to the heart of communicating. Language evolved as conversation, each baby learns it in conversation, and most of our use of it each day is in conversations. Exchange – taking turns – is the essence of language. A recent paper by S. Levinson in Trends in Cognitive Sciences, “Turn-taking in Human Communication – Origins and Implications for Language Processing”, looks at the complications of turn-taking.

The world’s languages vary at almost every level of organization, but there is a striking similarity in exchanges – rapid turns of short phrases or clauses within single sound envelopes, with few long gaps and little overlapping speech during changes of speaker. Not only is standard turn-taking universal in human cultures, it is found across the primate order and is learned by babies before any language is acquired. It may be the oldest aspect of our language.

But it is paradoxical – the gap between speakers is too short to produce a response to what the last speaker has said. In fact, the gap tends to be close to the minimum reflex time. A conversational speaking turn averages 2 seconds (2000ms) and the gap between speakers is about 200ms, but it takes 600ms to prepare the first word (1500ms for a short phrase). So production and comprehension must go on at the same time in the same areas of the brain, and comprehension must include a good deal of prediction of how a phrase is going to end. Because comprehension and production have been studied separately, it is not clear how this multitasking, if that is what it is, is accomplished. First, the listener has to figure out what sort of utterance the speaker is making – statement, question, command or whatever. Without this the listener does not know what sort of reply is appropriate. The listener then must predict (guess) the rest of the utterance, decide what the response should be and formulate it. Finally, the listener must recognize the signals of when the utterance will end, so that they can begin to talk as soon as it does. There is more to learn about how the brain does this and what effect turn-taking has on the nature of language.
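The arithmetic behind the paradox can be laid out explicitly (a back-of-the-envelope sketch using the round figures quoted above):

```python
# Back-of-the-envelope timing for turn-taking, using the round figures above.
turn_ms = 2000            # average speaking turn
gap_ms = 200              # typical gap between speakers
prepare_word_ms = 600     # time needed to prepare the first word
prepare_phrase_ms = 1500  # time needed to prepare a short phrase

# To answer within the gap, preparation must start before the turn ends:
lead_word = prepare_word_ms - gap_ms      # 400 ms before the speaker stops
lead_phrase = prepare_phrase_ms - gap_ms  # 1300 ms before the speaker stops

# So planning a short-phrase reply must already be under way this far
# into an average 2-second turn:
start_phrase = turn_ms - lead_phrase      # only 700 ms in
print(lead_word, lead_phrase, start_phrase)  # -> 400 1300 700
```

In other words, a listener planning even a short phrase must begin formulating it roughly a third of the way through the speaker’s turn, while still comprehending it – hence the need for prediction.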

There are cultural conventions that override turn-taking so that speakers can talk for some time without interruption, and even if they pause from time to time, no one jumps in. Of course, if someone speaks for too long without implicit permission, they will be forcibly interrupted fairly soon, people will drift away or some will start new conversations in sub-groups. That’s communication.

Here is the abstract of Stephen C. Levinson, “Turn-taking in Human Communication – Origins and Implications for Language Processing”, Trends in Cognitive Sciences, 2015:

Most language usage is interactive, involving rapid turn-taking. The turn-taking system has a number of striking properties: turns are short and responses are remarkably rapid, but turns are of varying length and often of very complex construction such that the underlying cognitive processing is highly compressed. Although neglected in cognitive science, the system has deep implications for language processing and acquisition that are only now becoming clear. Appearing earlier in ontogeny than linguistic competence, it is also found across all the major primate clades. This suggests a possible phylogenetic continuity, which may provide key insights into language evolution.


The bulk of language usage is conversational, involving rapid exchange of turns. New information about the turn-taking system shows that this transition between speakers is generally more than threefold faster than language encoding. To maintain this pace of switching, participants must predict the content and timing of the incoming turn and begin language encoding as soon as possible, even while still processing the incoming turn. This intensive cognitive processing has been largely ignored by the language sciences because psycholinguistics has studied language production and comprehension separately from dialog.

This fast pace holds across languages, and across modalities as in sign language. It is also evident in early infancy in ‘proto-conversation’ before infants control language. Turn-taking or ‘duetting’ has been observed in many other species and is found across all the major clades of the primate order.