When will voice recognition for multiple voices be available?
November 9, 2010

Voice recognition for multiple voices is a very popular feature that many users are looking for. It is currently being worked on by Nuance, the creator of Dragon NaturallySpeaking speech recognition software. It is proving to be a complicated idea to realize in software, though. Voice recognition currently works by training the software to recognize a specific user’s voice and speech patterns. When the software is used for the first time, you actually have to spend about 20-30 minutes training it before it can accurately recognize your voice. The software is not yet sophisticated enough to differentiate and separate voices when multiple people are talking at once. It is likely to be quite some time before multiple-voice speech recognition is realized, due to factors such as the difficulty of programming the software and the lack of affordable technology capable of handling such functionality.

You can think of it in terms of how an actual person would fare in a similar situation. Even we have trouble picking out what everyone says when several people are all talking at once, and people are considered experts at speech recognition by age 2 or 3. Imagine how difficult it would be to create a machine that could sift through all that information. From personal experience, it is actually very tedious and difficult to write a program that can do even basic mathematical computations (such as FOILing, that is, multiplying two binomials) that humans can do in their heads in seconds.

With the technology currently available, even if voice recognition software could recognize multiple voices, waiting for a computer to work through all the information would be very tedious and time-consuming. It would be a very memory-intensive process and would require enormous amounts of RAM and processing power. At least for now, the specs required to run these kinds of processes won’t be cheap. You would need top-of-the-line computer hardware that many people either will not have the budget for or will not be able to justify purchasing.