Zhou Liufang: How can humans achieve natural speech listening in complex situations?

Date: 2023-09-05

   Despite the rapid progress of artificial intelligence in natural language processing, it remains difficult for machines to pick out and follow the speech of a specific target speaker against a noisy, multi-speaker background (such as a restaurant or bar); for humans, however, this is easy. This is the classic 'cocktail party problem'. Undoubtedly, one of the most prominent features of human cognition is the ability to autonomously select, control, and interpret the information we perceive, and to form a coherent, unified subjective conscious experience. So why can humans selectively listen to, track, and understand a specific target's speech in complex situations with multiple speakers?

   This study adopted an innovative approach, combining a highly ecologically valid storytelling task with the classic psychological paradigm of dichotic listening. Participants continuously attended to one of two competing long segments of natural speech, so as to approximate, as closely as possible, the sustained attention, rich content, and cognitive effort involved in natural speech listening in complex real-life scenarios. Using inter-/intra-subject correlation analyses of functional magnetic resonance imaging (fMRI) data, the study found that response consistency in temporal, parietal, and frontal regions dissociates when the listening target changes. Specifically, regardless of the listening target, the bilateral temporal cortex (including SomMotB_Aud and TempPar) and prefrontal cortex (including DefaultB_PFCl and ContA_PFCl) maintained high synchronization of activation between and within participants, indicating that these regions are insensitive to the listening target. In contrast, the posterior parietal cortex (including DefaultA_pCunPCC and ContC_pCun) was highly target-sensitive, showing high inter-/intra-subject synchronization only when participants attended to the same target. A subsequent data-driven clustering analysis of voxel-wise functional connectivity patterns showed that these regions with different target sensitivity belong to two independent functional brain networks. Finally, a meta-analysis revealed that the two sets of brain maps have distinct psychological and cognitive profiles: the target-insensitive maps are closely associated with language-processing terms such as listening, language, and sentences, whereas the target-sensitive maps are associated with self-referential functions such as the default mode, theory of mind, and autobiographical memory.
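
   For readers unfamiliar with the analysis logic, the sketch below illustrates the core of an inter-subject correlation (ISC) contrast in Python. It is not the authors' pipeline: the array shapes, group sizes, and region count are hypothetical, and random placeholder data stand in for parcel-wise fMRI time courses.

```python
# A minimal ISC sketch (not the authors' code): compare synchronization between
# listeners who attended to the same story versus different stories.
import numpy as np

def zscore(data):
    """Z-score each subject's regional time course along the time axis."""
    return (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

def pairwise_isc(group_a, group_b):
    """Mean per-region Pearson correlation over all subject pairs across the two groups."""
    za, zb = zscore(group_a), zscore(group_b)
    n_time = group_a.shape[1]
    rs = [(za[i] * zb[j]).sum(axis=0) / n_time
          for i in range(za.shape[0]) for j in range(zb.shape[0])
          if not (group_a is group_b and i == j)]   # skip self-correlations within a group
    return np.mean(rs, axis=0)

# Hypothetical example: two groups of 10 listeners hear the same two competing
# stories but attend to different ones (story A vs. story B).
rng = np.random.default_rng(0)
attend_A = rng.standard_normal((10, 300, 400))   # 10 subjects, 300 TRs, 400 parcels
attend_B = rng.standard_normal((10, 300, 400))

isc_same_target = pairwise_isc(attend_A, attend_A)   # listeners sharing a target
isc_diff_target = pairwise_isc(attend_A, attend_B)   # listeners with different targets

# Target-insensitive regions (e.g., temporal cortex) should show high ISC in both
# contrasts; target-sensitive regions (e.g., posterior parietal cortex) should show
# high ISC only when listeners share the same target.
```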

  In summary, these three lines of evidence converge to show that natural speech listening in complex situations depends on two brain-network subsystems that are structurally and functionally separable. By adopting a novel, highly ecologically valid method, this study further reveals, from a brain-network perspective, the neural mechanisms underlying human speech-processing intelligence in complex contexts, and may provide theoretical inspiration for future applications such as artificial intelligence speech recognition and human-machine collaborative intelligence systems.

Figure 1. Schematic diagram of the experimental procedure and the intra-/inter-subject correlation analysis method

Figure 2. Meta-analysis results



Zhou, L. F.#, Zhao, D.#, Cui, X., Guo, B., Zhu, F., Feng, C., Wang, J., & Meng, M.* (2022). Separate neural subsystems support goal-directed speech listening. NeuroImage, 263, 119613.



