How Replit identifies patterns across millions of AI sessions

Replit is on a mission to make software creation accessible to everyone. They enable users to create beautiful, modern web applications at the speed of thought with the power of Replit's AI Agent.

As their AI agent grew in popularity, Replit needed a way to understand AI behavior, not just at the individual level, but across its entire user base. With Braintrust, CTO Luis Héctor Chávez and his team moved from manual, multi-tool debugging to a unified observability layer that surfaces patterns at scale.

The challenge: Debugging AI across fragmented tools

Before adopting Braintrust, Replit's approach to understanding AI issues was largely manual. When someone reported a problem, engineers needed to collect session identifiers, then hunt through conversation logs across multiple tools.

Before we adopted Braintrust, we had a semi in-house built solution that was fairly manual. A user reported a problem, we needed to go and ask them for their session identification, their user identification, so that we could go find the exact conversation they were having and then inspect the information across multiple panes of glass.

Engineers spent significant amounts of time piecing together what happened across observability tools, tracing tools, logs, conversation views, and structured tool call data. Because of these disparate data sources, knowing what to search for was often the hardest part of debugging an issue.

Building a single pane of glass

Before Braintrust: user reports issue → collect session IDs manually → search across multiple tools → piece together context → fix one issue at a time. Result: slow, fragmented debugging.

After Braintrust: traces stream in real time → patterns surface automatically → investigate in a single view → fix families of problems at once. Result: unified observability at scale.

Replit worked with Braintrust to build a unified observability layer that consolidates all the signals engineers need into one place. Rather than switching between tools, the team can now see the full picture of a session, including all surrounding signals and metadata, in a single view.

Now, with Braintrust, we've been able to build a single unified solution that allows us to not just look at the session that the user is seeing on Replit, but also all sorts of other signals and pieces of information that are useful for developers to identify solutions.

The system also reduces noise. Instead of exposing engineers to everything, it highlights the information most relevant to the problem at hand and even offers automated debugging assistance.

From individual traces to horizontal patterns

The biggest shift in Replit's approach was moving from investigating one session at a time to looking across multiple sessions horizontally. This allowed the team to identify patterns that would have been invisible at the individual level.

In one instance, the team discovered that a significant number of users were struggling to build authentication for their applications. The issue emerged when the team used Braintrust to analyze exactly where users were dropping off in their onboarding journey. No single bug report would have surfaced it.

Another pattern involved degenerate responses from a specific LLM. Looking across sessions horizontally, the team spotted the anomaly, reported it, and resolved it. Working from patterns rather than individual traces also helps Replit prioritize which issues to focus on.

What happens if there are multiple people experiencing the same pain but not expressing it? Having this ability to go beyond a single session and see patterns across the board really helps us fix a whole family of problems.
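The horizontal view can be illustrated with a minimal sketch in plain Python. It assumes each trace has already been tagged with a hypothetical `issue` label (for example by a classifier or a heuristic) alongside a `session_id`. These field names are illustrative and are not Braintrust's schema. The sketch counts distinct sessions per issue instead of reading traces one at a time, so a problem that many users hit but nobody reported still rises to the top.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def rank_issue_patterns(traces: Iterable[Mapping[str, str]]) -> list[tuple[str, int]]:
    """Rank issue labels by how many distinct sessions they appear in.

    Assumes each trace is a dict with hypothetical "session_id" and "issue" keys;
    traces with an empty or missing issue label are skipped.
    """
    sessions_by_issue: dict[str, set[str]] = defaultdict(set)
    for trace in traces:
        issue = trace.get("issue")
        if issue:
            # A set de-duplicates repeated traces from the same session.
            sessions_by_issue[issue].add(trace["session_id"])
    # Sort by session count (descending), then by label for a stable order.
    return sorted(
        ((issue, len(sessions)) for issue, sessions in sessions_by_issue.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


traces = [
    {"session_id": "s1", "issue": "auth_setup_failed"},
    {"session_id": "s1", "issue": "auth_setup_failed"},
    {"session_id": "s2", "issue": "auth_setup_failed"},
    {"session_id": "s3", "issue": "degenerate_output"},
    {"session_id": "s4", "issue": ""},
]
print(rank_issue_patterns(traces))  # [('auth_setup_failed', 2), ('degenerate_output', 1)]
```

The sketch counts sessions rather than raw traces, so one very chatty session cannot outrank an issue that many different users are hitting.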

Real-time evaluation and faster response

Traces sent in real-time

Patterns aggregated

Evals flag regressions

Team remediates

Replit also uses Braintrust to augment traces with evaluation signals while a user is still building their application. Before the user has even finished working, traces are being sent to Braintrust and aggregated into patterns.
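Replit's actual evals are not described here. The following is only a minimal sketch of the idea in plain Python. Each trace is scored the moment it arrives, the score is attached to the trace, and a per-model failure rate over a sliding window flags a possible regression.

Everything in the sketch is hypothetical, and none of it is Braintrust's API:

- `repetition_score` is a crude detector for degenerate, looping output.
- `RealtimeMonitor` is an illustrative class name.
- The thresholds are placeholders.

```python
from collections import defaultdict, deque


def repetition_score(text: str, n: int = 3) -> float:
    """Hypothetical heuristic eval: fraction of word n-grams that are repeats.

    Degenerate LLM output often loops, so a high score suggests a problem.
    """
    words = text.split()
    ngrams = [tuple(words[i : i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


class RealtimeMonitor:
    """Scores traces as they arrive and flags models whose recent failure rate spikes."""

    def __init__(self, window: int = 50, fail_above: float = 0.5, alert_rate: float = 0.2):
        self.window = window
        self.fail_above = fail_above  # per-trace score that counts as a failure
        self.alert_rate = alert_rate  # failure rate over the window that raises a flag
        self.recent: dict[str, deque] = defaultdict(lambda: deque(maxlen=self.window))

    def ingest(self, trace: dict) -> bool:
        """Attach an eval score to the trace; return True if its model should be flagged."""
        score = repetition_score(trace["output"])
        trace["scores"] = {"repetition": score}  # augment the trace in place
        history = self.recent[trace["model"]]
        history.append(score > self.fail_above)
        # Wait for a full window so a single bad trace cannot trigger an alert.
        return len(history) == self.window and sum(history) / len(history) > self.alert_rate


monitor = RealtimeMonitor(window=4, fail_above=0.5, alert_rate=0.4)
healthy = "Create a login form with email and password fields"
looping = "the the the the the the the the"
flags = [monitor.ingest({"model": "model-a", "output": o}) for o in [healthy, looping, looping, healthy]]
print(flags)  # [False, False, False, True]
```

Tracking history per model keeps one misbehaving LLM from being drowned out by healthy traffic from others. The fixed-size `deque` makes the failure rate reflect only recent traces, which is what lets a sudden regression show up quickly.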

Previously, Luis and his team had to wait several minutes before these traces became visible. Near-real-time visibility has reduced both mean time to detection and mean time to remediation, because the loop from spotting a problem to understanding its pattern is much shorter.

One of the things that really worked for us is that the Braintrust team helped us optimize the feedback loop so that instead of having to wait five or 10 minutes to see the traces, the moment the user is experiencing pain, we can see the trace immediately.

This feedback loop is critical for Luis's team. It's how they go from "I think there's something wrong" to actually fixing the problem. Braintrust also helps them understand if the solution might impact other users. Replit runs a series of custom evals that can identify when a given change might have unexpected impacts beyond a single user, so they can avoid pushing changes that have broader effects.
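The article does not describe how Replit's custom evals work. A common way to check whether a change affects more than the one user it was meant to help is to compare per-case eval scores for a baseline and a candidate on the same eval set. The change is then blocked if too many cases get worse.

This minimal Python sketch assumes scores in the range 0 to 1, where higher is better. The function name, `GateResult`, and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    regressed_cases: list[str]
    mean_delta: float
    passed: bool


def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
    max_regressed_fraction: float = 0.02,
) -> GateResult:
    """Compare hypothetical per-case eval scores (0..1, higher is better).

    A case regresses if the candidate scores more than `tolerance` below baseline.
    The change passes only if few enough cases regress, so a fix for one user's
    problem cannot silently break many others.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("baseline and candidate share no eval cases")
    regressed = [c for c in shared if candidate[c] < baseline[c] - tolerance]
    mean_delta = sum(candidate[c] - baseline[c] for c in shared) / len(shared)
    passed = len(regressed) / len(shared) <= max_regressed_fraction
    return GateResult(regressed, mean_delta, passed)


baseline = {"auth-1": 0.9, "auth-2": 0.8, "deploy-1": 0.7, "db-1": 0.95}
candidate = {"auth-1": 1.0, "auth-2": 1.0, "deploy-1": 0.55, "db-1": 0.95}
result = regression_gate(baseline, candidate, max_regressed_fraction=0.0)
print(result.regressed_cases, result.passed)  # ['deploy-1'] False
```

In this example the average score improves, yet one case drops sharply. Gating on per-case regressions instead of the mean is what catches that kind of broader side effect.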

Empowering teams beyond engineering

The improved tooling doesn't just benefit engineers. Previously, whenever a user reported a problem to the support team, they would have to wait for the engineering team to identify what was happening during the user's journey before they could offer a solution.

The support team now has the ability to investigate issues in greater depth, moving beyond the surface-level problem a single person reported to gathering dozens or hundreds of related examples.

Now we have much better tools that allow people in the support team to go in depth and not just surface a single instance of a problem, but allow them to gather tens or hundreds of examples so that the engineers can prioritize effectively and figure out a more general solution.

Key takeaways

Look for horizontal patterns, not just vertical traces. Individual bug reports reveal individual issues. Looking across sessions at scale reveals systemic problems and families of related issues.
Reduce time to insight with a single pane of glass. Consolidating observability, tracing, and conversation data in one place eliminates context-switching between tools.
Augment traces with evaluation signals in real time. Attaching evaluation signals to traces while a session is still in progress enables near-instant detection and faster remediation.
Empower non-engineering teams with observability tools. When support teams can investigate deeply on their own, engineers receive better prioritized, more complete information to act on.
Combine automation with human insight. Automations find known patterns at scale, but humans are still essential for identifying novel or unexpected relationships that automation cannot yet detect.

Thank you to Luis Héctor Chávez for sharing Replit's story.
