Most people think of speaking as a top-down process and listening as a bottom-up one. So if I say something, the assumption is: I have an idea, it is put into words and then into commands to muscles, and the sounds of the words come out of my mouth. It is all top-down, driven by a high-level intent. And if I listen: the sound waves enter my ears, and the input is processed into words and then into meaning. This describes a bottom-up process in which a low-level input becomes a high-level perception. The implication is that these are serial operations, like one person descending a staircase and another climbing a similar one. But there are experimental results that complicate the picture.
My grandmother was a great one for finishing other people’s sentences. When all went smoothly, Grandma and her friends would end their sentences in unison. But if the friend hesitated, Grandma didn’t stop and finished the sentence for them. In the stairs metaphor, she didn’t wait for the speech to come up the stairs but started down to meet it. It seems we all do this to varying degrees but usually not aloud.
Dikker et al. (citation below) investigated the prediction of language. They made a series of somewhat silly or surreal drawings and had them rated for how predictable a description of each would be. They then had one speaker and 9 listeners in fMRI scanners view each picture, followed by the descriptive sentence being uttered. They measured activity in the posterior superior temporal gyrus, which is associated with language and with the prediction of language (the authors call the effect “attentional gain”).
“We here adopt the term “attentional gain” to describe how generating internal forward model /prediction may increase the excitability of neuronal populations associated with predicted representations in language production as well as comprehension. During speech planning, it has been argued that speakers internally simulate articulatory commands, and that highly predictable speech acts increase the attentional gain for their expected perceptual consequences, the neural effects of which persist into the perceptual stage, but also during perception listeners show prediction error responses to unpredicted words, whereas lexical-semantic prediction error appears to play no role in the speaker. The speaker likely produced each sentence exactly as planned/anticipated. Predictability more strongly affects attentional gain in comprehension, not only during anticipation. Thus, as summarized in the figure, our results suggest that both speakers and listeners take predictability into account when generating estimates of upcoming linguistic stimuli. These changes in activation resulting from predictive processing, in turn, impact the extent to which brain activity is correlated between speakers and listeners.” Listeners predict the speaker’s words (if they feel they are predictable) and react if the prediction is wrong.
I have a different peculiarity from my grandmother - I often first know I have said something when I hear it. Many people have this happen to them when they are very upset or angry; they just say things and are surprised that they had no warning, intention, or preparation. I have it happen often in normal conversation and the words I hear are usually ones I like, the sort of thing I wanted to say. This implies that a certain amount of top-down preparation is absent at least from conscious awareness. Can it be that there is some bottom-up processing in speaking?
Lind et al. (citation below) studied the extent to which people are actually aware of what they are saying before they say it. They manipulated what speakers heard of their own voices so that the speakers appeared to give different answers from the ones they had actually given. In many cases the speakers accepted the substitution, along with the implications of its meaning.
Abstract: Speech is usually assumed to start with a clearly defined preverbal message, which provides a benchmark for self-monitoring and a robust sense of agency for one’s utterances. However, an alternative hypothesis states that speakers often have no detailed preview of what they are about to say, and that they instead use auditory feedback to infer the meaning of their words. In the experiment reported here, participants performed a Stroop color-naming task while we covertly manipulated their auditory feedback in real time so that they said one thing but heard themselves saying something else. Under ideal timing conditions, two thirds of these semantic exchanges went undetected by the participants, and in 85% of all nondetected exchanges, the inserted words were experienced as self-produced. These findings indicate that the sense of agency for speech has a strong inferential component, and that auditory feedback of one’s own voice acts as a pathway for semantic monitoring, potentially overriding other feedback loops.
Things are not always as they appear or as simple as we think.
Dikker, S., Silbert, L., Hasson, U., & Zevin, J. (2014). On the Same Wavelength: Predictable Language Enhances Speaker-Listener Brain-to-Brain Synchrony in Posterior Superior Temporal Gyrus. Journal of Neuroscience, 34(18), 6267-6272. DOI: 10.1523/JNEUROSCI.3796-13.2014
Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers’ Acceptance of Real-Time Speech Exchange Indicates That We Use Auditory Feedback to Specify the Meaning of What We Say. Psychological Science. DOI: 10.1177/0956797614529797