Researchers at the University of California, Berkeley and the University of California, San Francisco, have developed a brain-computer interface (BCI) system that can restore naturalistic speech to severely paralyzed people. This innovation, which addresses a long-standing problem in the field of speech neuroprostheses and is detailed in a study published in the journal Nature Neuroscience, represents a major step toward giving people who have lost the ability to speak near-real-time communication.
The research team used advances in artificial intelligence to solve the latency problem (the delay between when a person attempts to speak and when sound is actually produced). Their streaming system can decode neural signals into audible speech in near real time.
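To make the streaming idea concrete, the sketch below contrasts it with the older wait-for-the-whole-sentence approach: audio chunks are emitted as each short window of neural features arrives. The function names, feature rates, and hop size here are hypothetical placeholders, not the team's actual code.

```python
import numpy as np

FEATURE_RATE_HZ = 200     # assumed neural feature rate
HOP_MS = 80               # assumed decoding hop size
AUDIO_RATE_HZ = 16_000

def decode_window(neural_window: np.ndarray) -> np.ndarray:
    """Placeholder for the neural-to-audio model: returns silence of the right
    duration so the loop runs end to end."""
    n_audio_samples = int(AUDIO_RATE_HZ * HOP_MS / 1000)
    return np.zeros(n_audio_samples, dtype=np.float32)

def stream_decode(neural_stream: np.ndarray):
    """Yield an audio chunk as each new window of neural features arrives,
    instead of waiting for the full sentence."""
    hop = int(FEATURE_RATE_HZ * HOP_MS / 1000)
    for start in range(0, len(neural_stream) - hop + 1, hop):
        yield decode_window(neural_stream[start:start + hop])

# Two seconds of simulated neural features (128 channels): audio begins about
# one hop after the speech attempt starts, not after the whole utterance ends.
features = np.random.randn(2 * FEATURE_RATE_HZ, 128)
chunks = list(stream_decode(features))
print(f"emitted {len(chunks)} audio chunks of {HOP_MS} ms each")
```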
"Our streaming approach brings the same fast speech decoding capabilities to neuroprostheses as devices like Alexa and Siri," explains Gopala Anumanchipalli, co-principal investigator and assistant professor at UC Berkeley. "Using a similar algorithm, we found that we could decode neural data and, for the first time, achieve near-simultaneous voice streaming. The result is more natural and fluent speech synthesis. ”
This technology holds great promise for improving the lives of people with conditions such as ALS or paralysis caused by stroke. "It's exciting that the latest advances in AI are significantly accelerating the real-world application of BCI in the near future," said Edward Chang, a neurosurgeon at the University of California, San Francisco, and senior co-principal investigator of the study.
The system works by collecting neural data from the motor cortex, the part of the brain that controls speech production, and then using artificial intelligence to decode this activity into spoken language. The researchers tested their approach with Ann, a woman who has been unable to speak since suffering a stroke 18 years ago. Ann participated in a clinical trial in which electrodes implanted on the surface of her brain recorded neural activity as she silently attempted to say sentences displayed on a screen. These signals were then decoded into audible speech using an AI model trained on recordings of her voice from before her injury.
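The overall flow described above can be pictured as three stages. The Python sketch below uses placeholder functions and simulated data; the stage names, channel counts, and dimensions are assumptions for illustration, not the study's models.

```python
import numpy as np

def extract_neural_features(ecog: np.ndarray) -> np.ndarray:
    """Stand-in for feature extraction from motor-cortex recordings
    (e.g., per-electrode high-frequency band power)."""
    return np.abs(ecog)

def decode_speech_features(neural_feats: np.ndarray) -> np.ndarray:
    """Stand-in for the AI decoder mapping neural activity to intermediate
    speech features (e.g., a spectrogram-like representation)."""
    weights = np.random.randn(neural_feats.shape[1], 80)
    return np.tanh(neural_feats @ weights)

def synthesize_personal_voice(speech_feats: np.ndarray) -> np.ndarray:
    """Stand-in for a synthesizer personalized with recordings of the
    participant's pre-injury voice; returns a silent waveform of matching length."""
    return np.zeros(speech_feats.shape[0] * 80, dtype=np.float32)  # 80 samples/frame at 16 kHz

ecog = np.random.randn(400, 253)   # 2 s of simulated cortical activity, 253 channels (assumed)
audio = synthesize_personal_voice(decode_speech_features(extract_neural_features(ecog)))
print(f"synthesized {audio.size / 16_000:.2f} s of audio")
```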
"We're essentially intercepting signals that turn ideas into expression," explains Cheol Jun Cho, a Ph.D. student at the University of California, Berkeley, and co-lead author of the study. "So what we decode is after the thought happens — after we decide what to say and how to move our vocal tract muscles." This approach allowed the researchers to map Ann's neural activity onto the target sentence without her vocalizing.
One of the key breakthroughs is the ability to achieve near-real-time speech synthesis. Previous BCI systems had significant latency, taking up to eight seconds to decode a single sentence; the new approach cuts that delay dramatically. "We can see that, relative to that intent signal, we get the first sound out within a second," Anumanchipalli noted.
The system also demonstrated continuous decoding, allowing Ann to "speak" without interruption.
Despite its speed, the system maintains a high accuracy rate in decoding speech. To test its adaptability, the researchers evaluated whether it could synthesize words outside of the training dataset.
They used rare words from the NATO phonetic alphabet, such as "Alpha" and "Bravo," that were not part of the training dataset, confirming that the model could generalize beyond familiar vocabulary. "We found that our model did a good job of this, which suggests that it is really learning the constituent elements of sound or speech," Anumanchipalli said.
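As a toy illustration of that generalization check (the decoder below is a placeholder, and the scoring scheme is an assumption rather than the study's protocol), one can hold the NATO-alphabet words out of training and score the system only on those held-out words:

```python
NATO_WORDS = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf"]

def decode_attempt(neural_recording) -> str:
    """Placeholder for the full decode-and-transcribe pipeline; returns a fixed
    guess so the evaluation loop runs end to end."""
    return "Alpha"

def out_of_vocab_accuracy(held_out_words, recordings) -> float:
    """Score the decoder only on words it never saw during training."""
    correct = sum(decode_attempt(rec) == word
                  for word, rec in zip(held_out_words, recordings))
    return correct / len(held_out_words)

recordings = [None] * len(NATO_WORDS)   # stand-ins for the neural recordings
print(f"toy out-of-vocabulary accuracy: {out_of_vocab_accuracy(NATO_WORDS, recordings):.2f}")
```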
Ann herself has noticed a huge difference between this new streaming method and the earlier text-to-speech methods used in previous studies. According to Anumanchipalli, she believes that hearing her own voice almost instantaneously enhances her sense of immersion, which is a key step in making BCI feel more natural.
The researchers also explored how their system works with different brain sensing technologies, including microelectrode arrays (MEAs) that penetrate brain tissue and non-invasive surface electromyography (sEMG) sensors that detect facial muscle activity. This versatility suggests that the system has a wider range of potential applications across a variety of BCI platforms.
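One way to picture that versatility, as a sketch with invented function names rather than the study's code, is a single decoder that accepts the same time-by-channels feature matrix no matter which sensor produced it:

```python
import numpy as np

def features_from_ecog(raw: np.ndarray) -> np.ndarray:
    return np.abs(raw)                      # e.g., cortical-surface band power

def features_from_microelectrodes(raw: np.ndarray) -> np.ndarray:
    return (raw > 1.0).astype(np.float32)   # e.g., thresholded, binned spiking activity

def features_from_semg(raw: np.ndarray) -> np.ndarray:
    return np.square(raw)                   # e.g., rectified facial-muscle activity

def decode(features: np.ndarray) -> np.ndarray:
    """One decoder interface, regardless of which sensor produced the
    (time x channels) feature matrix."""
    weights = np.random.randn(features.shape[1], 80)
    return np.tanh(features @ weights)

for name, extract, n_channels in [
    ("ECoG", features_from_ecog, 253),
    ("microelectrode array", features_from_microelectrodes, 96),
    ("surface EMG", features_from_semg, 16),
]:
    speech_feats = decode(extract(np.random.randn(200, n_channels)))
    print(f"{name}: decoded speech features with shape {speech_feats.shape}")
```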
The team is currently working to further refine and optimize the technology. One area of ongoing research is enhancing expressiveness by incorporating paralinguistic features such as tone, pitch, and loudness into the synthesized speech. "This is a long-standing problem even in traditional audio synthesis," said Kaylo Littlejohn, another co-lead author and a Ph.D. student at the University of California, Berkeley. "It would bridge the gap to complete naturalism."
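One plausible formulation of that goal, offered only as an assumption about how it could be set up rather than the team's announced design, is to have the decoder emit per-frame prosody tracks such as pitch and loudness alongside its spectral output:

```python
import numpy as np

def decode_with_prosody(neural_feats: np.ndarray) -> dict:
    """Alongside spectral speech features, also predict per-frame prosody tracks."""
    n_frames = neural_feats.shape[0]
    weights = np.random.randn(neural_feats.shape[1], 80)
    return {
        "spectral": np.tanh(neural_feats @ weights),
        "pitch_hz": 120 + 20 * np.sin(np.linspace(0, 3, n_frames)),   # placeholder pitch contour
        "loudness_db": -20 + 5 * np.random.rand(n_frames),            # placeholder loudness track
    }

outputs = decode_with_prosody(np.random.randn(200, 253))
print({name: track.shape for name, track in outputs.items()})
```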
Although still experimental, this breakthrough raises hope that, with continued investment and development, BCIs capable of restoring fluent speech could be widely adopted within the next decade.
The project has received funding from organizations including the National Institute on Deafness and Other Communication Disorders (NIDCD), the Japan Science and Technology Agency's Moonshot Program, and several private foundations.
"This proof-of-concept framework is a major breakthrough," Cho said. We are optimistic that now we can make progress at all levels. ”