- The newly developed smart speaker uses self-deploying microphones to identify different sounds in a room and mute them on command.
- Researchers use neural networks to isolate each person’s voice and track their location.
The smart speaker directs its microphone system to deploy itself around the room, dividing the space into “speech zones.” This lets it track and identify individual sounds even as the people producing them move around.
According to the researchers behind the invention, this pinpoint localization makes it possible not only to separate simultaneous conversations, but also to mute noisy areas of the room, or even particularly disruptive speakers, for applications such as video conferencing in meetings.
The work describing this unusual speaker was published in the journal Nature Communications. The self-deploying microphones are thimble-sized robots that communicate with one another, drive themselves to different points in the room on tiny wheels, and return to their charging station when necessary.
“For the first time, we can track the location of multiple people talking in a room and separate their speech using what we call robotic ‘acoustic swarming,’” study co-lead author Malek Itani of the Paul G. Allen School of Computer Science and Engineering said in a statement.
The researchers say the prototype robots navigate their environment using high-frequency sound, a technique similar to echolocation. Spreading the microphones as far apart as possible lets the neural network that processes their signals make more precise calculations. For now, however, the robots can only position themselves in two dimensions, so their movement is limited to tabletops.
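The published system relies on a neural network, but the benefit of a wide microphone spread can be illustrated with a conventional time-difference-of-arrival (TDOA) baseline. The sketch below is a toy simulation, not the authors’ method, and its numbers (six microphones, 20 µs of timing noise, 5 cm vs. 50 cm array radii) are assumptions chosen only to show the effect of array aperture on localization error.

```python
# Toy time-difference-of-arrival (TDOA) localization sketch.
# This is NOT the authors' neural-network pipeline; it is a conventional
# geometric baseline, and all numbers (timing noise, radii) are assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second, roughly room temperature

def simulate_arrivals(mics, source, timing_jitter_s, rng):
    """Arrival time of the source's sound at each microphone, plus noise."""
    dists = np.linalg.norm(mics - source, axis=1)
    return dists / SPEED_OF_SOUND + rng.normal(0.0, timing_jitter_s, len(mics))

def locate_by_grid_search(mics, arrivals, grid_step=0.02, extent=1.0):
    """Pick the grid point whose predicted arrival-time differences fit best."""
    xs = np.arange(-extent, extent, grid_step)
    gx, gy = np.meshgrid(xs, xs)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)  # (P, 2) candidate points
    pred = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2) / SPEED_OF_SOUND
    # Compare *differences* of arrival times so the unknown emission time cancels.
    resid = (pred - pred[:, :1]) - (arrivals - arrivals[0])
    return grid[np.argmin(np.sum(resid ** 2, axis=1))]

rng = np.random.default_rng(0)
source = np.array([0.40, 0.25])   # true talker position on the table (metres)
jitter = 20e-6                    # assumed 20 microseconds of timing noise

for radius in (0.05, 0.50):       # tight cluster vs. spread-out swarm
    angles = np.linspace(0.0, 2.0 * np.pi, 7)[:-1]
    mics = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    errors = []
    for _ in range(20):
        arrivals = simulate_arrivals(mics, source, jitter, rng)
        estimate = locate_by_grid_search(mics, arrivals)
        errors.append(np.linalg.norm(estimate - source))
    print(f"array radius {radius:.2f} m -> mean error {np.mean(errors) * 100:.1f} cm")
```

In this toy setup, the tight 5 cm cluster typically yields errors of tens of centimetres, while the 50 cm spread lands within a few centimetres, mirroring the intuition that a larger aperture constrains a sound source’s position much more tightly.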
“You can have four people having two conversations, isolate any of the four voices, and locate each of the voices in the room,” co-lead author Tuochao Chen of the Allen School said in a statement.
Chen’s claims are confirmed by the results of real-world experiments.
The researchers tested robot swarms in settings such as offices and kitchens while three to five people talked, with the system given no prior knowledge of speaker locations or voices. Even so, the device was able to locate sounds within 5 meters of each other 90% of the time. On average, the system needs 1.82 seconds to process three seconds of audio; this delay could make live video conferencing somewhat cumbersome (a rough estimate follows below).
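As a back-of-the-envelope check on that latency figure, assuming audio is handled in non-overlapping three-second chunks (an assumption; the published pipeline may stream or overlap chunks), the delay a listener would experience is at least the chunk length plus the processing time:

```python
# Back-of-the-envelope latency estimate, assuming non-overlapping 3-second
# chunks; the real pipeline may stream or overlap, so treat this as a sketch.
chunk_s = 3.0      # seconds of audio per chunk (figure from the article)
process_s = 1.82   # average processing time per chunk (figure from the article)

real_time_factor = process_s / chunk_s   # below 1.0 means faster than real time
worst_case_delay = chunk_s + process_s   # capture the chunk, then process it

print(f"real-time factor: {real_time_factor:.2f}")                   # ~0.61
print(f"delay before a chunk is usable: {worst_case_delay:.2f} s")   # ~4.82 s
```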
In the future, the researchers want to apply these muting and separation techniques in real time across the physical space of an entire room, using noise-cancelling headphones and microphones.
Compiled by: Damla Şayan