Information about RealtimeAvatarVoice
Hey there, newbie to Normcore here, really excited to get going with this multiplayer package I've heard so much about.
I'm making a multiplayer VR game, and I'd like to use the player's voice data to interact with OpenAI's realtime API for certain interactions. Looking at the RealtimeAvatarVoice page (Realtime API -> XR -> Voice Chat), no information is given about how to use the `voiceData` class member I can see in Visual Studio when searching through the components of `RealtimeAvatarVoice`.
I see that `voiceData` is a `float[]`, but any other info about it is a guess: the bit depth, sample length, etc. I'm working towards converting it into something that OpenAI can parse, but I can't do that without more information about it.
Hi @Jordan. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
----------------------
Thank you for your question about Normcore's RealtimeAvatarVoice component. I understand you're looking for more detailed information about the `voiceData` member to integrate it with OpenAI's API. However, I need to clarify that the information you're referring to about `voiceData` is not present in the knowledge sources provided to me.
The Realtime > XR > Voice Chat documentation does mention a `voiceVolume` property, which is a read-only value between 0 and 1 that reflects the audio level the last time it was sampled for playback. This property can be useful for animating the mouth of an avatar to match a player's speech.
However, the documentation doesn't mention a `voiceData` member or provide details about raw audio data access. It's possible that the `voiceData` member you're seeing in Visual Studio is part of an internal implementation that isn't documented in the public API.
For accessing raw audio data, the documentation does mention an AudioPreprocessor that's included with RealtimeAvatarVoice. This preprocessor automatically sets the microphone gain and performs background-noise suppression, reverb reduction, and echo cancellation. However, it doesn't provide details on how to access the raw audio data.
If you need access to raw audio data for integration with OpenAI's API, you might want to consider using Normcore's Media Streams API. According to the Room > Media Streams documentation, Normcore provides high-quality, low-latency audio streams via `CreateAudioInputStream()`. This API returns an `AudioInputStream` that can be used to send audio data to all clients.
Given the limited information available in the provided knowledge sources about directly accessing raw voice data from RealtimeAvatarVoice, I would recommend reaching out to Normcore's support or community forums for more detailed information about accessing raw audio data or using the `voiceData` member you've discovered. They might be able to provide more specific guidance or confirm whether there's a recommended way to access the raw audio data for your use case.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Kapa.ai is still learning and improving, please let me know how I did by reacting below.
Great question. It's mono and I believe always 48kHz, but I can't remember if we call the `voiceData` function before or after resampling, if it's needed. @Scott will you take a look at this?
Yeah, just double-checked, and it will be 48kHz mono PCM samples, each sample being a 32-bit float in the range [-1.0, 1.0], when using the native Oculus microphone.
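For anyone wiring this up to OpenAI: the realtime API's default audio input format is, as far as I know, 16-bit little-endian mono PCM ("pcm16") at 24kHz, so the 48kHz float samples need to be downsampled and requantized. Here's a minimal sketch of that conversion. The class and method names are mine (not Normcore's API), and the 2:1 pair-averaging downsample is a crude decimator; a proper resampler would apply a real low-pass filter first.

```csharp
using System;

// Hedged sketch, not Normcore's API: convert 48 kHz mono float samples
// (each in [-1.0, 1.0], as described above) into 16-bit little-endian
// PCM at 24 kHz for OpenAI's realtime API ("pcm16").
public static class VoiceDataConverter
{
    public static byte[] FloatToPcm16At24kHz(float[] samples48k)
    {
        int outCount = samples48k.Length / 2;  // 48 kHz -> 24 kHz (2:1 decimation)
        byte[] bytes = new byte[outCount * 2]; // 2 bytes per 16-bit sample
        for (int i = 0; i < outCount; i++)
        {
            // Average each pair of input samples (crude low-pass), then clamp.
            float avg = (samples48k[2 * i] + samples48k[2 * i + 1]) * 0.5f;
            float clamped = Math.Max(-1f, Math.Min(1f, avg));
            short s = (short)(clamped * short.MaxValue);
            bytes[2 * i]     = (byte)(s & 0xFF);        // little-endian low byte
            bytes[2 * i + 1] = (byte)((s >> 8) & 0xFF); // high byte
        }
        return bytes;
    }
}
```

You'd then base64-encode the resulting bytes before sending them to OpenAI; check OpenAI's realtime API docs for the exact message format, since that side is outside Normcore.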
The `voiceData` event is triggered only when the buffer array has been filled with the next batch of 48kHz samples. The event may be invoked multiple times within a single frame, though this is generally unlikely unless the game is hitching (freezing, loading, etc.).
Thanks for the info!