Smart Endpointing

Endpointing refers to the process of automatically detecting when a user has finished speaking during a conversation with a Flyflow agent. It helps determine the appropriate time to stop recording the user's speech and allow the agent to respond. Flyflow provides a configurable endpointing setting to optimize the user experience and ensure smooth conversational flow.

Smart Endpointing

When the user stops speaking, the agent needs to decide whether they are pausing mid-thought or have finished speaking and it is the agent's turn to talk. Most transcription models only transcribe; they don't assign a probability that the user has actually stopped speaking. This section documents how to configure the threshold that controls this decision.

To address this problem, Flyflow introduces a new model, the Flyflow smart endpointing model. In real time, it assigns a probability that the user has stopped speaking and that it is the agent's turn to speak.

Example 1

Assistant: List your favorite foods in order
User: I like apples
Probability 21%

Here the model assigns a low probability that the user has finished speaking: the assistant asked for a list, but the user has named only one item. Now suppose another transcription comes in (Example 2):

Assistant: List your favorite foods in order
User: I like apples
User: oranges, and bananas
Probability 86%

Here the probability that the user has finished speaking increases because the list now sounds complete.

This example shows how the model assesses, on the fly, whether the user has stopped speaking and whether it is the agent's turn to talk.

The API parameter for this is smart_endpointing_threshold. By default the value is 70, so if the probability that the user has stopped speaking is 70% or higher, the agent responds immediately. You can set the value anywhere between 0 and 100, where 1 would force the agent to always respond immediately, and 100 would always introduce some level of backoff. When the probability returned by the smart endpointing model is below the threshold, the agent backs off and lets the user finish their thought. The length of the backoff depends on how far below the threshold the probability falls.
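
To make the threshold rule concrete, here is a minimal sketch of the decision. The linear scaling and the max_backoff_ms cap are assumptions made for illustration; this page does not specify how Flyflow actually computes the backoff, and backoff_ms is a hypothetical helper, not part of the Flyflow API.

```python
def backoff_ms(probability: float, threshold: float = 70, max_backoff_ms: int = 1000) -> int:
    """Return how long (in ms) the agent waits before responding.

    Assumption: the wait grows linearly with the shortfall below the
    threshold, up to max_backoff_ms. Flyflow's real curve may differ.
    """
    if not 0 <= probability <= 100:
        raise ValueError("probability must be between 0 and 100")
    if probability >= threshold:
        return 0  # at or above the threshold: respond immediately
    # The further the probability falls below the threshold, the longer the wait.
    shortfall = (threshold - probability) / threshold
    return round(shortfall * max_backoff_ms)


print(backoff_ms(86))  # 0   (Example 2: respond immediately)
print(backoff_ms(21))  # 700 (Example 1: back off and keep listening)
```

With the default threshold of 70, Example 1 (21%) triggers a backoff, while Example 2 (86%) lets the agent respond right away.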

Example Usage

smart_endpointing_threshold = 20

client.upsert_agent(
    name='My Agent',
    system_prompt='You are a helpful assistant.',
    smart_endpointing_threshold=smart_endpointing_threshold
)

Endpointing Threshold

When creating or updating an agent using the Flyflow API, you can specify the endpointing latency through the endpointing parameter. The endpointing parameter accepts an integer value representing the number of milliseconds the agent should wait before finalizing the user's speech. This setting applies in addition to smart endpointing.

The endpointing value determines the duration of silence (in milliseconds) required to consider the user's speech as completed. If the user remains silent for the specified duration, the agent will assume the user has finished speaking and will start processing the response.
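
The silence rule can be sketched as a simple comparison. This is an illustrative model only: speech_finished and its timestamp arguments are hypothetical names, not part of the Flyflow API, and the real detection runs inside Flyflow.

```python
def speech_finished(last_speech_ms: int, now_ms: int, endpointing_ms: int = 100) -> bool:
    """Return True once the silence since the last detected speech
    has lasted at least endpointing_ms milliseconds."""
    silence_ms = now_ms - last_speech_ms
    return silence_ms >= endpointing_ms


# The user last spoke at t=1000 ms and endpointing is set to 200 ms.
print(speech_finished(1000, 1150, endpointing_ms=200))  # False: only 150 ms of silence
print(speech_finished(1000, 1200, endpointing_ms=200))  # True: 200 ms of silence reached
```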

Example Usage

endpointing_ms = 200

flyflow.upsert_agent(
    name='My Agent',
    system_prompt='You are a helpful assistant.',
    endpointing=endpointing_ms
)

In the above example, the endpointing parameter is set to 200, indicating that the agent should wait for 200 milliseconds (0.2 seconds) of silence before finalizing the user's speech.

Default Endpointing Value

The default value for endpointing in Flyflow is 100 milliseconds (0.1 seconds). This default value provides a responsive endpointing behavior suitable for most conversational scenarios.

Recommended Endpointing Range

It is recommended to set the endpointing value between 100 and 500 milliseconds (0.1 to 0.5 seconds). This range strikes a balance between responsiveness and allowing natural pauses in user speech.

  • Lower values (around 100-200ms) are suitable for fast-paced conversations where quick responses are desired, such as customer support scenarios.
  • Higher values (around 300-500ms) are appropriate for conversations that require more thoughtful responses or involve longer pauses, such as therapy or counseling sessions.
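
If you set the value from configuration, a small check against the recommended range can catch accidental outliers. The helper below is a hypothetical sketch, not part of the Flyflow API; the bounds are the 100 and 500 millisecond figures recommended above.

```python
RECOMMENDED_MIN_MS = 100
RECOMMENDED_MAX_MS = 500


def in_recommended_range(endpointing_ms: int) -> bool:
    """Return True if the value lies inside the recommended 100-500 ms range."""
    return RECOMMENDED_MIN_MS <= endpointing_ms <= RECOMMENDED_MAX_MS


for value in (50, 200, 800):
    status = "ok" if in_recommended_range(value) else "outside the recommended range"
    print(f"endpointing={value} ms: {status}")
```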

Endpointing and Latency

It's important to note that the endpointing value directly impacts the latency of the agent's response. A longer endpointing duration will increase the overall latency, as the agent will wait for the specified silence duration before processing the user's speech.
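
As a rough illustration of this trade-off, the sketch below adds the endpointing wait to a fixed processing time. The 600 ms processing figure is an assumed placeholder, not a measured Flyflow number, and the function is hypothetical.

```python
def estimated_response_latency_ms(endpointing_ms: int, processing_ms: int = 600) -> int:
    """Estimate the time from end of speech to the agent's reply.

    Assumption: latency is the silence wait plus a fixed processing
    time; processing_ms=600 is a placeholder, not a Flyflow figure.
    """
    return endpointing_ms + processing_ms


fast = estimated_response_latency_ms(100)
slow = estimated_response_latency_ms(500)
print(slow - fast)  # 400: raising endpointing from 100 to 500 ms adds 400 ms
```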

Consider the following guidelines when choosing the endpointing value based on your use case:

  • For customer support or similar scenarios where quick responses are crucial, a shorter endpointing value (100-200ms) is recommended to minimize latency and provide a snappy user experience.
  • For therapy, counseling, or other conversations that involve more reflective and longer user responses, a longer endpointing value (300-500ms) can be used to accommodate natural pauses and allow users to express themselves fully.

Striking the right balance between endpointing and latency is key to creating a smooth and efficient conversational flow tailored to your specific requirements.

Conclusion

The endpointing parameter in Flyflow allows you to configure the duration of silence required for the agent to finalize the user's speech. By adjusting the endpointing value within the recommended range of 100 to 500 milliseconds, you can optimize the conversational flow and latency based on your use case.

Whether you prioritize quick responses for customer support or allow longer pauses for therapy sessions, Flyflow's endpointing configuration provides the flexibility to adapt the agent's behavior to your specific needs.