Creating Tests

A Test is a single scenario the AI tester runs against a channel. You describe, in plain language, what the tester should do and what counts as a pass — for example “book an appointment for Saturday” and “the agent must never give medical advice” — and Testzilla drives a real conversation against your agent and grades the result.

Tests sit at the third layer of the hierarchy: Project -> Channel -> Test -> Run -> Result. Every test belongs to a channel, and the channel’s connection type decides how the test reaches your agent (a phone call, a chat widget, a WebRTC voice session, or the LLM directly).
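The hierarchy above can be pictured as nested records. The sketch below is only an illustration: the class names, field names, and the `connection_for` helper are hypothetical and do not reflect Testzilla's actual data model or API. The connection types are the four named in this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ConnectionType(Enum):
    # The four ways a test can reach the agent, as listed above.
    PHONE = "phone call"
    WEB_CHAT = "chat widget"
    WEBRTC = "WebRTC voice session"
    LLM = "LLM directly"


@dataclass
class Result:
    verdict: str  # PASSED, FAILED, or UNCERTAIN


@dataclass
class Run:
    result: Optional[Result] = None  # set once the run finishes


@dataclass
class Scenario:  # stands in for a Test; hypothetical name
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Channel:
    name: str
    connection_type: ConnectionType
    tests: List[Scenario] = field(default_factory=list)


@dataclass
class Project:
    name: str
    channels: List[Channel] = field(default_factory=list)


def connection_for(project: Project, test_name: str) -> Optional[ConnectionType]:
    """Return the connection type of the channel that owns the named test."""
    for channel in project.channels:
        for test in channel.tests:
            if test.name == test_name:
                return channel.connection_type
    return None


# Made-up example data.
project = Project(
    name="Salon",
    channels=[
        Channel("Front desk line", ConnectionType.PHONE, [Scenario("Book haircut")]),
        Channel("Site widget", ConnectionType.WEB_CHAT, [Scenario("Cancel booking")]),
    ],
)
print(connection_for(project, "Book haircut"))
```

The lookup walks from the project down to the test, which mirrors how a test inherits its connection type from its channel rather than declaring one itself.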

Create a test

Open a channel

In a project, open the channel you want to test against and click + Test (or New Test). The test will run against that channel’s connection type. (Or describe the scenario to TZ Console and let it write the test for you.)

What should the tester do?

In plain language, describe what the tester should do — this is the scenario that drives the tester’s side of the conversation, turn by turn. Be specific about the goal and any constraints, for example:

“Caller asks to book a haircut for Saturday afternoon. The agent must confirm the day and time, and must not ask for a card number.”

In a hurry? Click Use a template to drop in a starter script you can edit. (The field’s technical name is Caller Script.)

What counts as passing?

State what makes the run a pass. The grader checks the finished transcript against these criteria and returns a verdict, a score, and an analysis. Keep criteria concrete and checkable (“confirms a date and time”, “never asks for payment details”) rather than vague. (Technical name: Pass Criteria.)

Advanced options (optional)

The common path stops at the two questions above. Expand Advanced options to tune the rest:
- Who speaks first? Whether the tester or the agent opens the conversation (Conversation Starter).
- Max call length: a cap so a test cannot run forever (Max Duration (seconds)).
- Prompt tokens: use {{date}}, {{phone_number}}, {{channel_prompt}}, {{project_prompt}}, and the other supported tokens to inject live values or to pull in shared instructions from the channel and project instead of duplicating them.

Save

The test appears under its channel, ready to run.
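To show how the fields from the steps above fit together, here is a minimal sketch of a test held as a plain record, with prompt tokens filled in before the script is used. Everything here is an assumption for illustration: `DraftTest`, `render`, the default values, and the rule that unknown tokens are left as written are not Testzilla's implementation. Only the field names (Caller Script, Pass Criteria, Conversation Starter, Max Duration) and the `{{token}}` syntax come from this page.

```python
import re
from dataclasses import dataclass

# Matches tokens written as {{name}}.
TOKEN = re.compile(r"\{\{(\w+)\}\}")


@dataclass
class DraftTest:
    caller_script: str                     # "What should the tester do?"
    pass_criteria: str                     # "What counts as passing?"
    conversation_starter: str = "tester"   # assumed values: "tester" or "agent"
    max_duration_seconds: int = 300        # assumed default, not Testzilla's


def render(text: str, values: dict) -> str:
    """Swap each {{name}} for values[name]; unknown tokens are left as written."""
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), text)


draft = DraftTest(
    caller_script="{{channel_prompt}} Ask to book a haircut for {{date}}.",
    pass_criteria="Confirms a date and time. Never asks for payment details.",
)

# Made-up values; in practice these would come from the run, channel, and project.
values = {"date": "2025-03-01", "channel_prompt": "You are a first-time caller."}
rendered_script = render(draft.caller_script, values)
print(rendered_script)
```

Keeping shared wording in `{{channel_prompt}}` or `{{project_prompt}}` means a change to that wording reaches every test that uses the token, which is the reason the page recommends tokens over copying text into each script.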

Run a test and read the result

Pick the test and click Run. Choose how many iterations to run, and optionally schedule it for later; track in-progress runs on the Queue page.

When a run finishes, open the result for:

- a verdict banner (PASSED, FAILED, or UNCERTAIN) at the top of the page,
- a transcript of the conversation,
- a score,
- a two-column analysis of why it passed or failed, and
- the data collected during the run.

When a run fails, the result also shows a Suggested fix callout directly under the verdict banner — the single most actionable next step, pulled out of the analysis so you don’t have to dig for it.

Every section has a Copy button. Web Chat runs add an enriched per-turn transcript and a screenshot gallery to the result.

Verdicts: PASSED, FAILED, UNCERTAIN

The grader returns one of three verdicts, shown as a badge:

- PASSED: the transcript met your pass criteria.
- FAILED: it did not.
- UNCERTAIN: the grader could not decide confidently either way.

For scoring, PASSED counts as 100 and FAILED as 0; UNCERTAIN is treated as not passing.
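When a test runs for several iterations, the rule above implies that only PASSED verdicts count toward a pass rate. The sketch below tallies a list of verdicts on that basis. The `summarize` function and its output keys are hypothetical, and because this page does not say what numeric score an UNCERTAIN run receives, the sketch reports a pass rate rather than an average score.

```python
from collections import Counter

VERDICTS = ("PASSED", "FAILED", "UNCERTAIN")


def summarize(verdicts):
    """Tally verdicts from a list; FAILED and UNCERTAIN both count as not passing."""
    unknown = set(verdicts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    counts = Counter(verdicts)
    total = len(verdicts)
    return {
        "total": total,
        "passed": counts["PASSED"],
        "not_passed": counts["FAILED"] + counts["UNCERTAIN"],
        "pass_rate": counts["PASSED"] / total if total else 0.0,
    }


# Made-up verdicts for four iterations of one test.
summary = summarize(["PASSED", "FAILED", "UNCERTAIN", "PASSED"])
print(summary)
```

Reporting UNCERTAIN inside `not_passed` keeps the pass rate conservative: a run the grader could not decide on never inflates the figure.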

Reuse tests across channels

To run the same set of tests against several channels at once — and benchmark how each connection type holds up — put the tests in a Folder channel and use the suite runner. Each run is attributed to the channel it actually ran against, so the comparison stays meaningful.
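Conceptually, a suite run pairs every test in the folder with every target channel and records each verdict under the channel it ran against. The sketch below illustrates that pairing and attribution only. `plan_runs` and `verdicts_by_channel` are hypothetical helpers, the channel and test names are made up, and none of this is the suite runner's actual interface.

```python
from collections import defaultdict
from itertools import product


def plan_runs(tests, channels):
    """One planned run per (channel, test) pair."""
    return list(product(channels, tests))


def verdicts_by_channel(results):
    """Group (channel, test, verdict) records under the channel that ran them."""
    grouped = defaultdict(list)
    for channel, test, verdict in results:
        grouped[channel].append((test, verdict))
    return dict(grouped)


planned = plan_runs(["Book haircut", "Cancel booking"], ["phone", "web_chat"])

# Made-up verdicts, one per planned run, purely for illustration.
results = [
    ("phone", "Book haircut", "PASSED"),
    ("phone", "Cancel booking", "FAILED"),
    ("web_chat", "Book haircut", "PASSED"),
    ("web_chat", "Cancel booking", "PASSED"),
]
print(verdicts_by_channel(results))
```

Because every record carries its channel, the same test can be compared side by side across connection types without the results blurring together.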

Next steps

- Generate a Report to analyse a channel’s or project’s results and get proposed fixes.
- Automate runs from your own tooling: Integrate via API for CI, or Integrate via MCP to drive Testzilla from an AI agent.