During speech production, language embeddings (blue) in the IFG peaked before speech embeddings (purple) peaked in the sensorimotor area, followed by the peak of speech encoding in the STG. In contrast, during speech comprehension, the peak encoding shifted to after word onset, with speech embeddings (purple) in the STG peaking significantly earlier than language encoding (blue) in the IFG.
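To make the peak-lag comparison concrete, here is a minimal sketch, not the study's actual pipeline, of a lag-resolved encoding analysis: for each temporal lag relative to word onset, a ridge regression maps an embedding (speech or language) to an electrode's activity, and the lag with the highest cross-validated correlation is taken as the encoding peak. The array shapes, the lag grid, and the helper name `peak_encoding_lag` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold


def peak_encoding_lag(embeddings, neural, lags_ms):
    """Return the lag (ms) at which the embedding best predicts neural activity,
    plus the cross-validated correlation at every lag."""
    corrs = []
    for lag_idx in range(neural.shape[1]):  # one column of neural activity per lag
        y = neural[:, lag_idx]
        fold_corrs = []
        for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(embeddings):
            model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(embeddings[train], y[train])
            fold_corrs.append(np.corrcoef(model.predict(embeddings[test]), y[test])[0, 1])
        corrs.append(float(np.mean(fold_corrs)))
    return lags_ms[int(np.argmax(corrs))], corrs


# Toy placeholder data (hypothetical): 500 words, 384-dim embeddings, and
# electrode activity sampled at 21 lags from -1000 ms to +1000 ms around word onset.
rng = np.random.default_rng(0)
lags_ms = np.linspace(-1000, 1000, 21)
embeddings = rng.normal(size=(500, 384))
neural = rng.normal(size=(500, 21))
peak_lag, lag_corrs = peak_encoding_lag(embeddings, neural, lags_ms)
print(f"peak encoding lag: {peak_lag:.0f} ms relative to word onset")
```

Running this separately for speech and language embeddings in each region would yield the kind of ordering of peaks described above.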
All in all, our findings suggest that speech-to-text model embeddings provide a cohesive framework for understanding the neural basis of language processing during natural conversations. Surprisingly, although Whisper was developed solely for speech recognition, without any consideration of how the brain processes language, we found that its internal representations align with neural activity during natural conversations. This alignment was not guaranteed: a negative result would have shown little to no correspondence between the embeddings and neural signals, indicating that the model's representations did not capture the brain's language processing mechanisms.
A particularly intriguing concept revealed by the alignment between LLMs and the human brain is the notion of a “soft hierarchy” in neural processing. Although language areas of the brain, such as the IFG, tend to prioritize word-level semantic and syntactic information, as indicated by their stronger alignment with language embeddings (blue), they also capture lower-level auditory features, as is evident from their weaker yet significant alignment with speech embeddings (purple). Conversely, lower-order speech areas such as the STG tend to prioritize acoustic and phonemic processing, as indicated by their stronger alignment with speech embeddings (purple), yet they also capture word-level information, as is evident from their weaker but significant alignment with language embeddings (blue).
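One rough way to picture how such a soft hierarchy could be quantified, under assumed data structures rather than the study's own code, is to compute a cross-validated encoding score per region for each embedding type and compare them; the region names, array shapes, and the helper `encode_score` below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict


def encode_score(embeddings, activity):
    """Cross-validated correlation between ridge-predicted and actual activity."""
    pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)), embeddings, activity, cv=5)
    return np.corrcoef(pred, activity)[0, 1]


# Hypothetical per-word data: speech (acoustic) and language (lexical) embeddings,
# plus each region's electrode activity averaged at its peak encoding lag.
rng = np.random.default_rng(1)
speech_emb = rng.normal(size=(500, 384))
language_emb = rng.normal(size=(500, 768))
regions = {"IFG": rng.normal(size=500), "STG": rng.normal(size=500)}

for name, activity in regions.items():
    s = encode_score(speech_emb, activity)
    l = encode_score(language_emb, activity)
    # A soft hierarchy would show both scores above chance in both regions,
    # with the preferred embedding type differing by region.
    print(f"{name}: speech r={s:.2f}, language r={l:.2f}")
```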