
Creating Tests

A Test is a single scenario the AI tester runs against a channel. You describe, in plain language, what the tester should do and what counts as a pass — for example “book an appointment for Saturday” and “the agent must never give medical advice” — and Testzilla drives a real conversation against your agent and grades the result.

Tests sit at the third layer of the hierarchy: Project -> Channel -> Test -> Run -> Result. Every test belongs to a channel, and the channel’s connection type decides how the test reaches your agent (a phone call, a chat widget, a WebRTC voice session, or the LLM directly).
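
As a rough mental model, the hierarchy can be pictured as nested records. This is a minimal sketch: the class and field names are illustrative assumptions, not Testzilla's actual data model or API. Only the five layer names and the four connection types come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ConnectionType(Enum):
    # The four ways a test can reach the agent, as listed above.
    PHONE = "phone call"
    WEB_CHAT = "chat widget"
    WEBRTC = "WebRTC voice session"
    LLM_DIRECT = "LLM directly"


@dataclass
class Result:
    verdict: str  # "PASSED", "FAILED", or "UNCERTAIN"


@dataclass
class Run:
    result: Optional[Result] = None  # filled in once the run finishes


@dataclass
class AgentTest:  # the "Test" layer; hypothetical name
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Channel:
    name: str
    connection_type: ConnectionType
    tests: List[AgentTest] = field(default_factory=list)


@dataclass
class Project:
    name: str
    channels: List[Channel] = field(default_factory=list)


def connection_for(project: Project, test_name: str) -> ConnectionType:
    """Find a test by name and return its parent channel's connection type."""
    for channel in project.channels:
        for test in channel.tests:
            if test.name == test_name:
                return channel.connection_type
    raise KeyError(test_name)
```

The lookup helper illustrates the point of the paragraph above: a test carries no connection type of its own, so the way it reaches the agent is always read from the channel that owns it.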

  1. Open a channel

    In a project, open the channel you want to test against and click + Test (or New Test). The test will run against that channel’s connection type. (Or describe the scenario to Tessie and let it write the test for you.)

  2. What should the tester do?

    In plain language, describe what the tester should do — this is the scenario that drives the tester’s side of the conversation, turn by turn. Be specific about the goal and any constraints, for example:

    “Caller asks to book a haircut for Saturday afternoon. The agent must confirm the day and time, and must not ask for a card number.”

    In a hurry? Click Use a template to drop in a starter script you can edit. (The field’s technical name is Caller Script.)

  3. What counts as passing?

    State what makes the run a pass. The grader checks the finished transcript against these criteria and returns a verdict, a score, and an analysis. Keep criteria concrete and checkable (“confirms a date and time”, “never asks for payment details”) rather than vague. (Technical name: Pass Criteria.)

  4. Advanced options (optional)

    The common path stops at the two questions above. Expand Advanced options to tune the rest:

    • Who speaks first? — whether the tester or the agent opens the conversation (Conversation Starter).
    • Max call length — a cap so a test cannot run forever (Max Duration (seconds)).
    • Prompt tokens — use {{date}}, {{phone_number}}, {{channel_prompt}}, {{project_prompt}} and the others to inject live values or pull in shared instructions from the channel and project instead of duplicating them.

  5. Save

    The test appears under its channel, ready to run.
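
To make the prompt tokens from step 4 concrete, the sketch below shows one way double-brace substitution could work. It is an illustration only: the function name, the sample values, and the choice to leave unknown tokens untouched are all assumptions, not a description of how Testzilla resolves tokens internally.

```python
import re

# Matches tokens written as {{name}}, e.g. {{date}} or {{phone_number}}.
TOKEN_PATTERN = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace each {{token}} with its value; leave unknown tokens as written."""

    def substitute(match):
        # group(1) is the token name, group(0) is the full "{{name}}" text.
        return values.get(match.group(1), match.group(0))

    return TOKEN_PATTERN.sub(substitute, template)


# Hypothetical sample values; real ones would come from the run, channel, and project.
values = {
    "date": "2025-03-08",
    "phone_number": "+15550100",
    "channel_prompt": "You are calling the salon's front desk line.",
    "project_prompt": "Never share real personal data.",
}

script = (
    "{{project_prompt}} {{channel_prompt}} "
    "Today is {{date}}. Ask to book a haircut and give {{phone_number}} as the callback number."
)
print(render_prompt(script, values))
```

Keeping shared instructions behind {{channel_prompt}} and {{project_prompt}} means a change made once at the channel or project level reaches every test that references it.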

Pick the test and click Run. Choose how many iterations to run, and optionally schedule it for later; track in-progress runs on the Queue page.

When a run finishes, open the result for:

  • a verdict banner (PASSED, FAILED, or UNCERTAIN) at the top of the page,
  • a transcript of the conversation,
  • a score,
  • a two-column analysis of why it passed or failed, and
  • the data collected during the run.

When a run fails, the result also shows a Suggested fix callout directly under the verdict banner — the single most actionable next step, pulled out of the analysis so you don’t have to dig for it.

Every section of the result has a Copy button. Web Chat runs show a richer result: an enriched per-turn transcript and a screenshot gallery.

The grader returns one of three verdicts, shown as a badge:

  • PASSED — the transcript met your pass criteria.
  • FAILED — it did not.
  • UNCERTAIN — the grader could not decide confidently either way.

For scoring, PASSED counts as 100 and FAILED as 0; UNCERTAIN is treated as not passing.
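
The scoring rule above can be sketched as a small aggregation over the verdicts from several iterations. Two things here are assumptions rather than documented behaviour: that iterations are combined by a simple pass rate, and the function name `summarize`. The only rules taken from the text are that PASSED counts as 100, FAILED as 0, and UNCERTAIN does not count as a pass.

```python
from collections import Counter


def summarize(verdicts):
    """Summarize a list of verdict strings from repeated iterations of one test."""
    if not verdicts:
        raise ValueError("at least one verdict is required")

    counts = Counter(verdicts)
    total = len(verdicts)
    passed = counts["PASSED"]  # only PASSED counts toward the pass rate

    return {
        "passed": passed,
        "not_passed": total - passed,  # FAILED and UNCERTAIN together
        "pass_rate": 100 * passed / total,
    }


print(summarize(["PASSED", "FAILED", "UNCERTAIN", "PASSED"]))
```

Counting UNCERTAIN with the failures keeps the pass rate conservative: a run only helps the number when the grader was confident that the criteria were met.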

To run the same set of tests against several channels at once — and benchmark how each connection type holds up — put the tests in a Folder channel and use the suite runner. Each run is attributed to the channel it actually ran against, so the comparison stays meaningful.

  • Generate a Report to analyse a channel’s or project’s results and get proposed fixes.
  • Automate runs from your own tooling: Integrate via API for CI, or Integrate via MCP to drive Testzilla from an AI agent.