OpenAI

Introducing Voice and Image Capabilities in ChatGPT: A New Level of Interaction


Sven

September 26th, 2023


OpenAI is taking ChatGPT to a new level by adding voice and image capabilities. These features provide a more intuitive interface: users can now hold spoken conversations with ChatGPT and share images with it for a more interactive experience.

Voice and image capabilities open up many new possibilities. Imagine having a live conversation with ChatGPT about an interesting landmark while traveling, or snapping photos of your fridge and pantry to plan dinner and then asking follow-up questions for step-by-step recipe instructions. You can also help your child with homework by photographing a math problem and getting hints and guidance.

OpenAI is rolling out voice and image capabilities to Plus and Enterprise users over the next two weeks. Voice capabilities will be available on iOS and Android, and images will be supported on all platforms.

Engage in Conversations with ChatGPT using Voice

With the new voice capability, you can now have back-and-forth conversations with ChatGPT. Whether you're on the go and need to speak with your assistant, want a bedtime story for your family, or need to settle a dinner table debate, ChatGPT is ready to talk back.

To get started with voice, head to Settings > New Features in the mobile app and opt into voice conversations. Then tap the headphone button in the top-right corner of the home screen and choose one of five voices for your assistant.

The new voice capability is powered by a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. OpenAI worked with professional voice actors to create each voice, and it uses Whisper, its open-source speech recognition system, to transcribe your spoken words into text.

Share Images and Get Insights with ChatGPT

Another exciting feature is the ability to show ChatGPT one or more images. Whether you need help troubleshooting your grill, planning a meal based on the contents of your fridge, or analyzing complex graphs for work-related data, ChatGPT is ready to assist.

To get started with image capabilities, tap the photo button to capture or choose an image (in the iOS and Android apps, tap the plus button first). You can discuss multiple images and use the drawing tool to direct your assistant's attention to specific parts of an image.

Image understanding is powered by multimodal GPT-3.5 and GPT-4 models, which apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents containing both text and images.

Gradual Deployment for Safety and Refinement

OpenAI's goal is to build AGI (artificial general intelligence) that is safe and beneficial. As part of that approach, it is rolling out voice and image capabilities gradually. This allows OpenAI to keep improving the features and refining risk mitigations over time, while also preparing users for more powerful systems in the future.

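Voice and image capabilities also present new risks, such as impersonation and fraud. On the voice side, OpenAI addresses these risks by limiting the technology to a specific use case, voice chat, with voices created together with voice actors it worked with directly. It is taking a similarly targeted approach with partners such as Spotify, which is piloting a Voice Translation feature that translates podcasts into other languages in the podcaster's own voice. For image capabilities, OpenAI tested the model extensively with red teamers and alpha testers to identify risks and encourage responsible use.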

OpenAI is committed to being transparent about the model's limitations and encourages users to verify information on specialized topics. For example, the model is proficient at transcribing English text but performs poorly with some other languages, especially those written in non-Roman scripts.

Exciting Times Ahead for ChatGPT Users

Plus and Enterprise users will be the first to experience the voice and image capabilities in ChatGPT over the next two weeks. OpenAI plans to expand access to these features for other user groups, including developers, in the near future.

OpenAI's introduction of voice and image capabilities takes ChatGPT to new heights of interactivity and usefulness. Whether you're engaging in conversations on the go or sharing images for assistance, ChatGPT is ready to enhance your daily life.

Links: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak