Lowering Latency with Flyflow

Latency is a critical factor in delivering a smooth, responsive conversational experience. Flyflow provides several features for minimizing latency in voice interactions: the flyflow-voice custom language model, filler words, configurable endpointing, and voice optimization settings. Used together, they can significantly reduce response time and make the conversation flow feel more natural.

Flyflow-Voice: Custom Language Model

Flyflow offers a custom language model, flyflow-voice, designed specifically for low-latency voice interactions. It is optimized to generate fast, accurate responses so your agents can reply quickly to user queries.

To use flyflow-voice, set the llm_model property to "flyflow-voice" when creating or updating an agent through the Flyflow API.

Example

{
  "name": "My Agent",
  "system_prompt": "You are a helpful assistant.",
  "llm_model": "flyflow-voice"
}

With flyflow-voice, you can expect significantly lower latency than with generic language models. The model is trained on a large body of conversational data and fine-tuned to produce relevant, concise responses; shorter responses take less time to generate and synthesize, which reduces overall processing time.

Filler Words: Bridging the Latency Gap

In addition to the custom language model, Flyflow provides the filler words feature to further minimize the perceived latency. Filler words are short phrases or sounds that are played at the beginning of an agent's response, giving the impression of an immediate acknowledgment.

Filler words bridge the gap between the user's query and the agent's full response. The filler word plays immediately while the language model generates the complete response in the background, so the user hears an acknowledgment right away instead of silence.
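The overlap between filler playback and response generation can be simulated with asyncio to show why the user hears something before the full reply is ready. This is a minimal sketch, not Flyflow's implementation: `play_filler` and `generate_response` are hypothetical stand-ins that use `asyncio.sleep` in place of real audio playback and model inference, and the durations are made-up values.

```python
import asyncio
import time


async def play_filler(log: dict, start: float) -> None:
    # Hypothetical: filler audio is pre-generated, so playback starts at once.
    log["filler_start"] = time.perf_counter() - start
    await asyncio.sleep(0.3)  # simulated filler playback duration (made up)
    log["filler_end"] = time.perf_counter() - start


async def generate_response(log: dict, start: float) -> str:
    # Hypothetical stand-in for the language model producing the full reply.
    await asyncio.sleep(0.15)  # simulated generation time (made up)
    log["response_ready"] = time.perf_counter() - start
    return "Sure, I can help with that."


async def handle_turn() -> tuple[str, dict]:
    log: dict = {}
    start = time.perf_counter()
    # Run filler playback and response generation concurrently.
    _, reply = await asyncio.gather(play_filler(log, start),
                                    generate_response(log, start))
    return reply, log


reply, timings = asyncio.run(handle_turn())
for event, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{event:>15}: {seconds * 1000:.0f} ms")
```

The user first hears audio at `filler_start`, not at `response_ready`, so the perceived wait is roughly the time needed to start the filler rather than the full generation time.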

To enable filler words for an agent, set the filler_words property to true when creating or updating an agent through the Flyflow API.

Example

{
  "name": "My Agent",
  "system_prompt": "You are a helpful assistant.",
  "llm_model": "flyflow-voice",
  "filler_words": true
}
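If you assemble request bodies in Python, the two settings above can be combined in a small helper. This is a minimal sketch: `build_agent_payload` is a hypothetical helper, not part of any Flyflow SDK, and it only produces the JSON body shown above; sending it to the Flyflow API is not covered here.

```python
import json


def build_agent_payload(name: str, system_prompt: str,
                        llm_model: str = "flyflow-voice",
                        filler_words: bool = True) -> str:
    """Hypothetical helper: serialize an agent definition to a JSON request body.

    The field names mirror the JSON examples in this guide; the helper itself
    is illustrative only.
    """
    payload = {
        "name": name,
        "system_prompt": system_prompt,
        "llm_model": llm_model,        # low-latency custom model
        "filler_words": filler_words,  # play a filler while the reply is generated
    }
    return json.dumps(payload, indent=2)


body = build_agent_payload("My Agent", "You are a helpful assistant.")
print(body)
```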

Flyflow's filler word classification model selects the filler word that best fits the context of the conversation, so the filler sounds natural and blends into the exchange.
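The idea of context-dependent filler selection can be illustrated with a toy stand-in. This sketch swaps Flyflow's classification model for plain keyword matching on the user's last utterance; `choose_filler`, the keyword rules, and the filler phrases are all hypothetical and say nothing about how Flyflow's classifier actually works.

```python
import re

# Hypothetical (keywords, filler) rules, checked in order. A real system would
# use a trained classifier rather than keyword matching.
FILLER_RULES = [
    ({"sorry", "problem", "issue", "broken", "wrong"}, "Oh, I'm sorry to hear that."),
    ({"how", "what", "why", "when", "where", "which"}, "Good question."),
    ({"thanks", "thank"}, "You're welcome!"),
]
DEFAULT_FILLER = "Mm-hmm."


def choose_filler(user_utterance: str) -> str:
    """Return a filler phrase based on keywords in the user's last utterance."""
    words = set(re.findall(r"[a-z']+", user_utterance.lower()))
    for keywords, filler in FILLER_RULES:
        if words & keywords:
            return filler
    return DEFAULT_FILLER


for text in ("My order arrived broken.", "What time do you open?", "I'd like to book a table."):
    print(f"{text!r} -> {choose_filler(text)!r}")
```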

Endpointing: Optimizing Speech Recognition

Endpointing is the process of automatically detecting when a user has finished speaking during a conversation with a Flyflow agent. It determines when to stop recording the user's speech and let the agent respond. Flyflow exposes a configurable endpointing setting so you can tune this behavior for a smooth conversational flow.

The endpointing parameter takes an integer: the number of milliseconds of silence the agent waits for before finalizing the user's speech. The default is 100 milliseconds (0.1 seconds).

It is recommended to set the endpointing value between 100 and 500 milliseconds (0.1 to 0.5 seconds). Lower values (around 100-200ms) are suitable for fast-paced conversations where quick responses are desired, such as customer support scenarios. Higher values (around 300-500ms) are appropriate for conversations that require more thoughtful responses or involve longer pauses, such as therapy or counseling sessions.

Keep in mind that the endpointing value directly affects response latency. Because the agent waits for the specified silence duration before processing the user's speech, every millisecond of endpointing is added to the delay before the agent can respond.

Example

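# Assumes `flyflow` refers to an initialized Flyflow client (setup not shown).
endpointing_ms = 200  # wait for 200 ms of silence before finalizing the user's turn

flyflow.create_agent(
    name='My Agent',
    system_prompt='You are a helpful assistant.',
    endpointing=endpointing_ms
)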
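To see how the endpointing value turns into added latency, the following sketch applies a silence threshold to a sequence of per-frame voice-activity flags. It is a simplified model, not Flyflow's speech pipeline: the `detect_endpoint` helper and the 20 ms frame size are hypothetical, and in practice the speech/silence flags would come from a voice activity detector.

```python
from typing import Optional, Sequence


def detect_endpoint(is_speech: Sequence[bool], frame_ms: int,
                    endpointing_ms: int) -> Optional[int]:
    """Return the time (ms) at which the user's turn is finalized, or None.

    is_speech[i] says whether frame i (frame_ms long) contains speech. The turn
    ends once endpointing_ms of continuous silence follows detected speech.
    """
    heard_speech = False
    silence_ms = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= endpointing_ms:
                return (i + 1) * frame_ms  # end of the frame that hit the threshold
    return None  # not enough trailing silence yet


FRAME_MS = 20  # hypothetical frame size
# 200 ms of speech followed by 400 ms of silence.
frames = [True] * 10 + [False] * 20
for endpointing_ms in (100, 200, 400):
    finalized_at = detect_endpoint(frames, FRAME_MS, endpointing_ms)
    print(endpointing_ms, finalized_at, finalized_at - 200)
```

In this model the delay between the end of speech and finalization equals the endpointing value (rounded up to whole frames), while pauses shorter than the threshold do not end the turn.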

Voice Optimization: Balancing Speed and Quality

Flyflow provides a voice_optimization setting that controls the trade-off between response speed and audio quality. It accepts an integer from 0 to 4: 0 gives the highest audio quality with the slowest response, and 4 gives the fastest response with the lowest audio quality.

Adjust voice_optimization to prioritize speed or quality for your use case. Lower values (0-2) prioritize audio quality, producing more natural-sounding speech at the cost of slightly higher latency. Higher values (3-4) prioritize speed, delivering faster responses with a slight reduction in audio quality.

Example

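# Assumes `flyflow` refers to an initialized Flyflow client (setup not shown).
voice_optimization = 3  # 0 = highest quality/slowest, 4 = fastest/lowest quality

flyflow.create_agent(
    name='My Agent',
    system_prompt='You are a helpful assistant.',
    voice_optimization=voice_optimization
)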

In the example above, voice_optimization is set to 3, which favors response speed over audio quality without going all the way to the fastest setting (4).
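If you switch between conversation styles, you might keep the two latency settings together and validate them before creating an agent. The sketch below is hypothetical: `latency_settings` and its preset values are illustrative choices based on the ranges described in this guide (endpointing of 100-200 ms for fast-paced support and 300-500 ms for slower conversations; voice_optimization from 0 to 4), not Flyflow defaults.

```python
import warnings

# Hypothetical presets; the values are illustrative, not Flyflow defaults.
PRESETS = {
    "support": {"endpointing": 200, "voice_optimization": 3},     # fast-paced
    "counseling": {"endpointing": 400, "voice_optimization": 1},  # quality first
}
ALLOWED_KEYS = {"endpointing", "voice_optimization"}


def latency_settings(style: str, **overrides: int) -> dict:
    """Return endpointing and voice_optimization keyword arguments for a style."""
    if style not in PRESETS:
        raise ValueError(f"unknown style {style!r}; expected one of {sorted(PRESETS)}")
    unknown = set(overrides) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported settings: {sorted(unknown)}")
    settings = {**PRESETS[style], **overrides}

    # voice_optimization: 0 = highest quality/slowest, 4 = fastest/lowest quality.
    vo = settings["voice_optimization"]
    if not isinstance(vo, int) or not 0 <= vo <= 4:
        raise ValueError("voice_optimization must be an integer from 0 to 4")

    # endpointing: milliseconds of silence before the user's turn is finalized.
    ep = settings["endpointing"]
    if not isinstance(ep, int) or ep <= 0:
        raise ValueError("endpointing must be a positive number of milliseconds")
    if not 100 <= ep <= 500:
        warnings.warn(f"endpointing={ep} ms is outside the recommended 100-500 ms range")
    return settings
```

Assuming create_agent accepts both keyword arguments in one call, the result could be unpacked into it, for example `flyflow.create_agent(name='My Agent', system_prompt='You are a helpful assistant.', **latency_settings('support'))`.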

Lowest Possible Latency: 300 Milliseconds

By combining the flyflow-voice language model, filler words, tuned endpointing, and an appropriate voice optimization setting, Flyflow aims to bring latency down to around 300 milliseconds, so users hear a near-instant response from your agents.

The filler word starts playing almost immediately, typically within 100-200 milliseconds, while flyflow-voice generates the complete response in the background. By the time the filler word finishes, the full response is usually ready, so playback transitions smoothly without a noticeable pause.
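The timeline above can be checked with simple arithmetic. The sketch below measures time from the moment the user stops speaking; `turn_timeline` is a hypothetical helper, and the input durations are illustrative assumptions rather than measured Flyflow figures.

```python
from dataclasses import dataclass


@dataclass
class TurnTimeline:
    first_audio_ms: int       # when the user first hears the agent
    gap_after_filler_ms: int  # silence between the filler ending and the full reply


def turn_timeline(endpointing_ms: int, filler_delay_ms: int,
                  filler_duration_ms: int, response_delay_ms: int) -> TurnTimeline:
    """Times are measured from the moment the user stops speaking.

    filler_delay_ms and response_delay_ms count from when endpointing
    finalizes the user's turn.
    """
    turn_end = endpointing_ms
    filler_end = turn_end + filler_delay_ms + filler_duration_ms
    response_ready = turn_end + response_delay_ms
    return TurnTimeline(
        first_audio_ms=turn_end + filler_delay_ms,
        gap_after_filler_ms=max(0, response_ready - filler_end),
    )


# Illustrative numbers: 100 ms endpointing (the default), filler starting 200 ms
# after the turn ends and lasting 400 ms, full reply ready 500 ms after the turn ends.
with_filler = turn_timeline(100, 200, 400, 500)
print(with_filler)  # TurnTimeline(first_audio_ms=300, gap_after_filler_ms=0)

# Without a filler, the first audio is the full reply itself.
no_filler_first_audio_ms = 100 + 500
print(no_filler_first_audio_ms)  # 600
```

With these assumed numbers the filler halves the time to first audio, and the reply is ready before the filler ends, so there is no gap. If the reply took longer than the filler (for example, 900 ms instead of 500 ms), the same function would report a 300 ms gap after the filler.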

Conclusion

Low latency is essential for a natural, responsive conversational experience, and Flyflow provides several tools to achieve it in voice-based interactions.

Use the flyflow-voice custom language model, enable filler words, choose an endpointing value that suits your conversation style, and set voice_optimization to balance speed and audio quality for your use case.

With these optimizations, response times can be as low as about 300 milliseconds, giving users a near-instant, seamless experience. Prioritizing low latency while balancing speed and quality improves user satisfaction and engagement and makes interactions with your voice-based agents feel more natural and human-like.