
What was the source of the noise?

By Science Gazette, 10 February 2022

Neuroscientists have created a computer model that can answer the question just like the human brain.

Neuroscientists have created a computer model that can localize sounds. The model, which consists of several convolutional neural networks, not only performs the task as well as humans do, but it also struggles in the same ways humans do when echoes or competing sounds make the task harder.

The human brain is finely tuned not only to recognize particular sounds, but also to determine which direction they came from. By comparing differences in the sounds that reach the right and left ears, the brain can estimate the location of a barking dog, a wailing fire engine, or an approaching car.

MIT neuroscientists have now created a computer model that can also perform that complex task. The model, which consists of several convolutional neural networks, not only performs the task as well as humans do, but it also struggles in the same ways that humans do.

“We now have a model that can really locate sounds in the real environment,” says Josh McDermott, an associate professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research. “And when we treated the model like a human experimental participant and recreated a broad collection of trials that researchers have tested individuals on in the past, we discovered that the model recapitulates the outcomes that you see in humans over and over again.”

According to McDermott, who is also a member of MIT’s Center for Brains, Minds, and Machines, the new study’s findings also suggest that humans’ ability to perceive location is adapted to the specific challenges of our environment.

McDermott is the senior author of the paper, which was published today in Nature Human Behaviour. Andrew Francl, an MIT graduate student, is the paper’s lead author.

Localization modeling

When we hear a train whistle, the sound waves reach our right and left ears at slightly different times and intensities, depending on the direction the sound is coming from. Parts of the midbrain are specialized to compare these slight differences in order to estimate where the sound came from, a task known as localization.
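To make the two cues concrete, here is a minimal sketch of how the timing difference and the intensity difference between the ears could be measured from a two-channel recording. It is only an illustration, not the researchers’ model: the function name `estimate_itd_ild`, the cross-correlation approach, and the toy signal are all assumptions made for this example.

```python
import numpy as np

def estimate_itd_ild(left, right, sample_rate):
    """Return (itd_seconds, ild_db) for a two-channel recording.

    Positive ITD means the sound reached the left ear first.
    Positive ILD means the left ear received more energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Cross-correlate the channels; the lag of the peak is the delay
    # (in samples) of the right channel relative to the left.
    corr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)
    itd = lag / sample_rate
    # Level difference: ratio of channel energies, in decibels.
    ild = 10.0 * np.log10(np.sum(left**2) / np.sum(right**2))
    return itd, ild

# Toy signal (hypothetical): noise that reaches the right ear
# 10 samples later and at half the amplitude.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
delay = 10
left = np.concatenate([source, np.zeros(delay)])
right = 0.5 * np.concatenate([np.zeros(delay), source])

itd, ild = estimate_itd_ild(left, right, sample_rate=48_000)
print(f"ITD = {itd * 1e6:.0f} microseconds, ILD = {ild:.1f} dB")
```

With this toy input, the right channel lags by 10 samples at 48 kHz, so the sketch reports a timing difference of roughly 208 microseconds and a level difference of roughly 6 dB in favor of the left ear.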

In real-world conditions, where the environment produces echoes and several sounds are heard at once, this task becomes far more difficult.

Scientists have long attempted to build computer models that can perform the same computations the brain uses to localize sounds. These models can work in idealized settings with no background noise, but they never work in real-world settings with noise and echoes.

To build a more sophisticated model of localization, the MIT team turned to convolutional neural networks. This kind of computer modeling has long been used to model the human visual system, and McDermott and other scientists have recently begun applying it to audition as well.

Because convolutional neural networks can be designed with many different architectures, the MIT researchers used a supercomputer to train and test more than 1,500 different models and find the ones that would perform best at localization. The search identified 10 that appeared to be the best suited to the task, which the researchers then trained further and used in all subsequent experiments.

To train the models, the researchers built a virtual environment in which they could control the size of the room and the reflective properties of its walls. All of the sounds fed to the models originated somewhere in one of these virtual rooms. The set of more than 400 training sounds included human voices, animal sounds, machine sounds such as car engines, and natural sounds such as thunder.

The researchers also made sure that the model started with the same information the human ear provides. The pinna, or outer ear, has many folds that reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound comes from. To simulate this effect, the researchers ran each sound through a specialized mathematical algorithm before it entered the computer model.
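The article does not say which algorithm was used. One common way to approximate direction-dependent filtering by the head and outer ear is to convolve a sound with a separate impulse response for each ear, and the sketch below illustrates that idea under that assumption. The helper `apply_ear_filters` and the impulse responses are invented for illustration and are not measured data or the researchers’ code.

```python
import numpy as np

def apply_ear_filters(mono, left_ir, right_ir):
    """Turn one mono sound into a (2, n) left/right pair by convolution.

    left_ir and right_ir must have the same length so that the two
    output channels line up.
    """
    left = np.convolve(mono, left_ir)
    right = np.convolve(mono, right_ir)
    return np.stack([left, right])

# Made-up impulse responses for a source on the listener's left:
# the right ear hears the sound later and more quietly.
toy_left_ir = np.array([1.0, 0.0, 0.3, 0.0, 0.0])
toy_right_ir = np.array([0.0, 0.0, 0.6, 0.0, 0.2])

click = np.array([1.0, 0.0, 0.0])
binaural = apply_ear_filters(click, toy_left_ir, toy_right_ir)
print(binaural.shape)
```

In a realistic setup, a different pair of impulse responses would be used for each source direction, which is what lets the filtered sound carry information about where it came from.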

Francl explains, “This enables us to offer the model the same type of knowledge that a human would have.”

After training the models, the researchers tested them in a real-world setting. They placed a mannequin with microphones in its ears in an actual room, played sounds from different directions, and then fed the recordings into the models. When asked to localize these sounds, the models performed very similarly to humans.

“Even though the model was taught in a simulated environment, it was able to locate sounds in the actual world when we tested it,” Francl explains.

Similar patterns

The researchers then put the models through a series of tests that scientists have used in the past to study people’s ability to localize sounds.

In addition to analyzing the difference in arrival time at the right and left ears, the human brain also bases its location judgments on differences in the intensity of the sound that reaches each ear. Previous research has shown that the usefulness of each of these cues depends on the frequency of the incoming sound. In the new study, the MIT researchers found that the models showed the same pattern of sensitivity to frequency.

“The model seems to employ frequency-dependent timing and level changes between the two ears in the same manner as individuals do,” McDermott adds.

The researchers also showed that when they made localization tasks harder by playing multiple sound sources at the same time, the computer models’ performance declined in a way that closely mimicked human failure patterns under the same conditions.

“You get a precise pattern of loss in humans’ capacity to properly evaluate the number of sources present, and their ability to localize those sources, as you add more and more sources,” Francl adds. “Humans seem to be restricted to locating roughly three sources at a time, and we noticed a very similar pattern of activity when we did the same test on the model.”

Because the researchers trained their models in a virtual world, they were also able to explore what happens when a model learns to localize under different kinds of unnatural conditions. One set of models was trained in a virtual world with no echoes, and another in a world where only one sound was ever heard at a time. In a third, the models were exposed only to sounds with specific frequency ranges instead of naturally occurring sounds.

When the models trained in these unnatural worlds were put through the same battery of behavioral tests, they deviated from human behavior, and the ways in which they failed varied depending on the type of environment they had been trained in. According to the researchers, these results support the idea that the human brain’s localization abilities are adapted to the environments in which humans evolved.

The researchers are now applying this type of modeling to other aspects of audition, such as pitch perception and speech recognition. McDermott says they believe it could also be used to understand other cognitive phenomena, such as the limits on what a person can pay attention to or remember.

The National Science Foundation and the National Institute on Deafness and Other Communication Disorders supported the study.
