Wednesday, September 25, 2024

OpenAI Releases Advanced Voice Features



OpenAI has finally rolled out its highly anticipated Advanced Voice feature to all ChatGPT Plus and Team users. This feature, which was first showcased during the launch of GPT-4o months ago, has been met with much excitement and some controversy.

Advanced Voice utilizes the multimodal capabilities of the GPT-4o model, allowing for a natural, free-flowing conversation that supports interruptions. It’s a significant upgrade from the standard voice chat available to free ChatGPT users, offering a more immersive and interactive experience.

OpenAI has been working diligently on Advanced Voice, ensuring its safety and improving its functionality. In addition to the ability to support interruptions, the feature now includes custom instructions, memory, five new voices, and improved accents. It can even say "sorry I'm late" in over 50 languages.

Although the rollout was delayed by safety concerns and by the controversy surrounding the "Sky" voice, which bore a striking resemblance to Scarlett Johansson's voice, OpenAI has addressed these issues and is confident in its final product.

Advanced Voice might sound similar to Google's Gemini Live, but there's a key distinction. Gemini Live relies on separate STT (speech-to-text) and TTS (text-to-speech) engines: the user's speech is transcribed into text for an LLM, and the LLM's text response is then converted back into speech. ChatGPT Advanced Voice, by contrast, handles audio input and output directly. While Gemini Live also supports interruptions, it lacks the truly multimodal experience that ChatGPT Advanced Voice offers.
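
To make the distinction concrete, here is a toy Python sketch of the two designs. It is only an illustration under stated assumptions, not how either product is actually built: every name in it (`AudioClip`, `speech_to_text`, `text_llm`, `text_to_speech`, `cascaded_assistant`, `native_audio_assistant`) is hypothetical, and the "models" are stand-in stubs. The point it tries to show is that a transcript carries the words but not the tone, so a pipeline that passes through text cannot react to how something was said.

```python
from dataclasses import dataclass


@dataclass
class AudioClip:
    """Toy stand-in for audio: what was said, and how it was said."""
    words: str
    tone: str  # e.g. "cheerful", "upset", "neutral"


def speech_to_text(clip: AudioClip) -> str:
    # Hypothetical STT stub: the transcript keeps the words, drops the tone.
    return clip.words


def text_llm(prompt: str) -> str:
    # Hypothetical text-only model stub: it only ever sees the transcript.
    return f"reply to '{prompt}'"


def text_to_speech(text: str) -> AudioClip:
    # Hypothetical TTS stub: with no tone information, it speaks neutrally.
    return AudioClip(words=text, tone="neutral")


def cascaded_assistant(clip: AudioClip) -> AudioClip:
    """STT -> text LLM -> TTS: tone is lost at the first step."""
    transcript = speech_to_text(clip)
    reply_text = text_llm(transcript)
    return text_to_speech(reply_text)


def native_audio_assistant(clip: AudioClip) -> AudioClip:
    """Audio in, audio out: the stub can see the tone and adapt to it."""
    reply_tone = "soothing" if clip.tone == "upset" else "cheerful"
    return AudioClip(words=f"reply to '{clip.words}'", tone=reply_tone)


user = AudioClip(words="I'm fine", tone="upset")
print(cascaded_assistant(user))
print(native_audio_assistant(user))
```

Both stubs produce the same words for the same input, but only the audio-native one can change its delivery based on the speaker's tone, which is the kind of difference the comparison above is pointing at.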

Despite the initial excitement surrounding Advanced Voice, early testing has revealed that some of the showcased multimodal features are currently missing. During the initial demo, OpenAI demonstrated Advanced Voice's ability to sing, identify moods and emotions, detect various sounds, and perform accents. However, the current version of Advanced Voice cannot identify speech, and camera input is not yet supported.

It is possible that OpenAI has removed some of these features to prevent potentially embarrassing conversations with ChatGPT. Nevertheless, the introduction of Advanced Voice marks a significant step forward in conversational AI, offering a more engaging and natural experience for users.

The availability of Advanced Voice to all ChatGPT Plus and Team users signifies OpenAI's commitment to enhancing the platform and providing users with the most cutting-edge AI tools. With its ability to understand and respond to audio input, Advanced Voice has the potential to revolutionize how we interact with AI.

While it's still early days, the future of Advanced Voice is bright, and the possibilities it presents are truly exciting. As OpenAI continues to refine and expand its capabilities, we can expect to see even more innovative features and improvements in the coming months and years. It will be interesting to see how Advanced Voice evolves and how it shapes the future of conversational AI.
