Customer Agent Tester
Designed a testing and refinement system to help customers build trust and confidence to deploy HubSpot's flagship AI agent, the Breeze Customer Agent.
*Full case study available upon request
HubSpot
April 2025 - August 2025

What's HubSpot and the Breeze Customer Agent?
HubSpot is a B2B SaaS platform that provides CRM, marketing, sales, and customer service software within a shared data infrastructure. Its products are organized into “Hubs” (Marketing, Sales, Service, etc.) that operate on a common CRM and customer data model.
My work took place within Service Hub (and later, the AI group) on the Breeze Customer Agent, HubSpot's flagship AI-powered customer support agent. The Breeze Customer Agent uses CRM data and connected knowledge sources to generate responses to customer questions, qualify inquiries, and route conversations to human agents when necessary. It is available to Professional and Enterprise customers, operates on a usage-based credit system, and can be deployed across multiple channels, such as live chat, email, and calling.

Who uses the Breeze Customer Agent?
The Breeze Customer Agent targets three groups of users. While it started as a customer support agent, its use cases have grown to cover marketing and sales.
Customer service
The Breeze Customer Agent is designed for customer service teams managing high volumes of support inquiries who want to scale their support without increasing headcount. It's especially valuable for teams looking to provide reliable 24/7 coverage and reduce ticket backlogs. By automatically handling routine questions and escalating more complex issues to human agents when needed, it helps teams operate more efficiently while maintaining a high-quality support experience.
Marketing
The Breeze Customer Agent is also designed for marketing teams looking to engage website visitors in real time and convert interest into action. It helps answer common questions about events, pricing, webinars, and subscriptions, ensuring visitors get the information they need instantly. By qualifying prospects and guiding them to relevant content, the agent supports turning anonymous traffic into high-quality leads while creating a more responsive and personalized marketing experience.
Sales
The Breeze Customer Agent is also built for sales teams that need to respond quickly to prospect questions about products, pricing, and trials. It engages visitors in real time, helping qualify leads, book meetings, and accelerate the buying process. By leveraging CRM data to tailor conversations, the agent ensures interactions are relevant and personalized, allowing sales teams to move prospects forward more efficiently.
The problem
A fast evolving product without a structured feedback system
When the Breeze Customer Agent first launched, the product was evolving rapidly and many foundational capabilities were still being defined. One key missing piece was a structured system for users to evaluate, refine, or build confidence in the agent’s behavior before deploying it to real customers.
The only way to test the customer agent was a simple preview experience (screenshot below), where users could ask individual questions and view responses, but the interaction ended there. To make any improvement, users had to leave the preview and figure out where in the Customer Agent app to make it. Should they update a source? Add or remove a handoff trigger? Who knows!
This created a fragmented and incomplete feedback loop:
• Responses could be observed, but not meaningfully improved
• Changes could not be re-tested or validated within the same workflow
• New capabilities could not be explored in a consistent way
At a systems level, the customer agent lacked a unified experience for iterating on agent behavior. Evaluation, refinement, and feature exploration were disconnected, making it difficult for users to continuously improve the system or adapt to new capabilities.
Previous preview experience

The opportunity
Unlocking deployment through a scalable feedback system
In early research, we found that most teams took a cautious, manual approach to rollout by testing common questions themselves, deploying the agent to a small set of (sometimes hidden) pages, testing after hours, and then gradually expanding access over time. While this reduced risk, it was slow, inconsistent, and often led to stalled or incomplete deployments.
At its core, this revealed a system gap: users had no reliable way to evaluate, refine, and validate agent behavior before going live. Without a structured feedback loop:
• Testing was fragmented
• Improvements couldn't be easily verified
As the Customer Agent rapidly evolved with new features, this problem compounded. Users weren’t just unsure of response quality, but whether the entire system (including behaviors like handoffs and actions) was configured correctly.
For HubSpot, this created a clear bottleneck: users were getting stuck between setup and deployment, limiting overall adoption and downstream AI credit usage.
How might we…
How might we create a scalable system that enables users to evaluate, refine, and validate their Customer Agent so they can confidently deploy and adopt new capabilities as the product evolves?
User goals
Validate that the agent is behaving correctly before going live
Refine responses and system behavior in a fast, iterative workflow
Understand how and why the agent produces certain outputs
Ensure the full system (responses, handoffs, actions) is properly configured
Move from cautious testing to confident, broader deployment
Business goals
Remove friction between setup and deployment to increase adoption
Accelerate time to value by enabling faster, more reliable iteration
Support ongoing feature expansion without slowing user understanding
Drive AI credit consumption through increased and sustained usage across Professional and Enterprise customers
The solution: Customer Agent Tester
A unified system for testing and refinement
To address this gap, we designed and built the Customer Agent Tester, a scalable system that enables users to evaluate, refine, and validate agent behavior within a single, continuous workflow. At a high level, the tester is structured around two core areas:
Conversation panel: Styled to look like a chat widget, it lets users interact with their customer agent in real time
Insights panel: Surfaces insights on each agent response so users can further understand, evaluate, and refine it

Modular insight layer
To support evaluation and refinement, we introduced a system of message insights cards within the insight panel. Instead of designing separate interfaces for each insight, I created a shared pattern. This ensures consistency for users while allowing the system to scale as new features are introduced. These cards act as a standardized interface for:
Explaining why a response was generated
Providing controls to refine outputs
Surfacing system-level signals and capabilities
Validating any improvements made

Establishing a unified feedback loop
The tester was designed around a continuous feedback loop that enables users to evaluate, refine, and validate agent behavior in real time. An important part of this loop already existed in the platform: Knowledge Gaps, which occur when the agent cannot answer a customer's question or says, "I don't know." Unfortunately, knowledge gaps lived outside the previous preview experience and could only surface after an agent was deployed. To create a continuous system, I integrated knowledge gaps directly into the tester, transforming them into a key part of the testing loop.
When the agent fails to answer, the gap is surfaced immediately in context
Users can take action without leaving the workflow
Improvements can be validated instantly through re-testing
Knowledge gap experience outside the tester

Creating a short answer to resolve a knowledge gap

Knowledge gap message insight card flow

In context

Designing for scalability and evolution
Because the Customer Agent was rapidly evolving, the system needed to support capabilities that didn’t yet exist. By structuring the experience around:
A stable interaction model (conversation + insight panels)
A modular interface layer (insight cards)
…the tester can incorporate new features without requiring redesign.
This allows the system to grow in capability while maintaining a consistent and predictable user experience.
Future iteration: Adding agent reasoning

Future iteration: Manual flagging

Impact & Outcomes
The Customer Agent Tester was designed to address a key barrier: users’ lack of confidence in their Customer Agent prior to rollout. While overall adoption of the Customer Agent is influenced by many factors, the engagement data from the Tester provides clear signals that users were iteratively evaluating and improving their agents.
Within the first 3 months of launch (July - September 2025):
The “Improve response” CTA was clicked 6,717 times, showing frequent refinement activity
2,300 short answers were created, transforming knowledge gaps into reusable responses
240 handoff triggers were executed, enabling users to test escalation flows in context
These metrics demonstrate that users actively engaged with the system, iteratively testing and refining responses, which aligns with the user goal of building confidence pre-deployment. Qualitative feedback further reinforced this impact, with users noting that the insights and inline refinement tools helped them understand how their agent generated responses and how to improve them before going live.
By providing transparency, control, and immediate refinement capabilities, the testing experience helped teams make informed deployment decisions and reduced the friction around initial agent rollout, effectively setting the foundation for broader adoption in future releases.
My role
As Senior Product Designer I on the newly formed Customer Agent Coaching team, I led the design of the Customer Agent Tester. I collaborated with customers, engineering, product, and stakeholders to understand gaps, define the problem, and design a solution that let teams evaluate, refine, and safely deploy their AI agent. My work spanned research synthesis, interaction design, and adapting components across two design systems, all under tight deadlines and organizational transition.
Team Composition:
Senior Product Designer I (me)
Senior Product Manager
Front-End Engineering Lead
Front-End Senior Software Engineer II
Front-End Senior Software Engineer I ×2
Back-End Tech Lead
Back-End Senior Software Engineer II ×2
© 2026 Kevin Tanouye