Problem
docs/evaluate/criteria.md is missing documentation for the rubric-based multi-turn evaluator, and the implicit requirements around Rubric.type and criterion-level rubrics aren't explained for any of the rubric-based criteria. Each gap causes a real, easy-to-hit failure in practice.
Gap 1: No section for rubric_based_multi_turn_trajectory_quality_v1
The metric exists upstream — registered in metric_evaluator_registry.py (PrebuiltMetrics.RUBRIC_BASED_MULTI_TURN_TRAJECTORY_QUALITY_V1) and implemented in rubric_based_multi_turn_trajectory_evaluator.py — but the corresponding section in criteria.md is missing. The other two rubric-based metrics (rubric_based_final_response_quality_v1, rubric_based_tool_use_quality_v1) have full sections with EvalConfig examples; this one doesn't.
User impact: someone trying to evaluate multi-turn trajectory quality with custom rubrics has no guidance, and falls back to copying the single-turn section, which uses subtly different semantics.
Gap 2: Rubric.type field semantics aren't documented
Every RubricBasedEvaluator subclass declares a RUBRIC_TYPE ClassVar that acts as a filter — only rubrics whose Rubric.type matches that value are picked up:
| Metric | Required Rubric.type (from source) |
| --- | --- |
| rubric_based_final_response_quality_v1 | FINAL_RESPONSE_QUALITY |
| rubric_based_tool_use_quality_v1 | TOOL_USE_QUALITY |
| rubric_based_multi_turn_trajectory_quality_v1 | TRAJECTORY_QUALITY |
None of these values appear anywhere in criteria.md (or anywhere else in adk-docs based on a code search). The existing JSON examples in criteria.md don't set the type field either, so users following the examples and then attaching per-case rubrics via EvalCase.rubrics end up with type=None rubrics that the evaluator silently drops.
User impact: rubric scores come back empty / None with no clear error. The only signal is a "Rubric ... not found in the rubrics provided to the metric." warning, which the user sees only if they happen to be reading log output closely.
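To make the failure mode concrete, here is a minimal sketch of the type filter described above. It is not the ADK implementation: the Rubric dataclass and the filter_by_type helper are hypothetical stand-ins, and only the type values are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rubric:
    """Hypothetical stand-in for the ADK Rubric model (only the fields used here)."""
    rubric_id: str
    type: Optional[str] = None


def filter_by_type(rubrics: list[Rubric], rubric_type: str) -> list[Rubric]:
    """Keep only rubrics whose type equals the evaluator's RUBRIC_TYPE (assumed behavior)."""
    return [r for r in rubrics if r.type == rubric_type]


case_rubrics = [
    Rubric("cites_sources"),  # type left unset, as in the current doc examples
    Rubric("polite_tone", type="FINAL_RESPONSE_QUALITY"),
    Rubric("right_tool", type="TOOL_USE_QUALITY"),
]

# An evaluator with RUBRIC_TYPE = "FINAL_RESPONSE_QUALITY" would see only one rubric.
kept = filter_by_type(case_rubrics, "FINAL_RESPONSE_QUALITY")
print([r.rubric_id for r in kept])
```

The rubric with no type and the rubric with a different type are both dropped without raising, which matches the silent behavior reported above.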
Gap 3: criterion.rubrics is required, EvalCase.rubrics is additive — neither is stated
RubricBasedEvaluator.__init__ asserts that criterion.rubrics is non-empty (rubric_based_evaluator.py:332). create_effective_rubrics_list then unions criterion-level rubrics with per-case rubrics (filtered by type). The docs don't mention either fact:
- The current criteria.md examples put all rubrics on the criterion, so the assert never fires for users who copy them verbatim.
- But users who try to attach rubrics on individual EvalCase entries (a natural pattern when each case has case-specific rubrics) and leave criterion.rubrics = [] hit AssertionError: Rubrics are required. at init with no docs-level explanation of why.
- Similarly, users who put rubrics on both expect criterion rubrics to be overridden per case; they're actually unioned.
User impact: confusing assert at init, or unexpected "extra" rubrics being scored per case.
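The two behaviors can be sketched together as follows. This is an illustration under stated assumptions, not the code in rubric_based_evaluator.py: SketchEvaluator and its effective_rubrics method are hypothetical, the type filter is assumed to apply to the per-case rubrics only, and handling of duplicate rubric IDs is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rubric:
    """Hypothetical stand-in for the ADK Rubric model."""
    rubric_id: str
    type: Optional[str] = None


class SketchEvaluator:
    """Hypothetical evaluator mirroring the init assert and the union step."""

    RUBRIC_TYPE = "FINAL_RESPONSE_QUALITY"

    def __init__(self, criterion_rubrics: list[Rubric]):
        # Mirrors the reported init-time check: an empty list fails immediately.
        assert criterion_rubrics, "Rubrics are required."
        self.criterion_rubrics = list(criterion_rubrics)

    def effective_rubrics(self, case_rubrics: list[Rubric]) -> list[Rubric]:
        # Per-case rubrics are added to the criterion rubrics, not substituted for them.
        matching = [r for r in case_rubrics if r.type == self.RUBRIC_TYPE]
        return self.criterion_rubrics + matching


evaluator = SketchEvaluator([Rubric("concise", "FINAL_RESPONSE_QUALITY")])
per_case = [
    Rubric("mentions_refund_policy", "FINAL_RESPONSE_QUALITY"),
    Rubric("untyped_rubric"),  # dropped by the type filter
]
effective_ids = [r.rubric_id for r in evaluator.effective_rubrics(per_case)]
print(effective_ids)

try:
    SketchEvaluator([])  # rubrics supplied only per case
    init_error = None
except AssertionError as exc:
    init_error = str(exc)
print(init_error)
```

In this sketch the criterion rubric is always scored alongside the matching per-case rubric, and constructing the evaluator with an empty criterion list fails before any case is read.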
Proposed direction
A single docs PR that:
- Adds a ## rubric_based_multi_turn_trajectory_quality_v1 section to criteria.md, modeled on the existing rubric_based_final_response_quality_v1 section (with a TRAJECTORY_QUALITY example), and the corresponding row to the criteria table at the top of the page.
- Adds a small "Notes On Rubrics" subsection under each of the three rubric-based metric sections that states the required Rubric.type and the criterion-vs-EvalCase rubric relationship.
- Adds the explicit type field to the existing JSON examples in the two existing rubric-based metric sections so users have a complete template.
No code changes — docs/evaluate/criteria.md only.
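As a rough idea of what the new section's example could contain, the sketch below builds a criterion entry with an explicit type and prints it as JSON. Only the metric name and the TRAJECTORY_QUALITY value come from the findings above. The nesting, the threshold value, the field names rubric_id, rubric_content and text_property, and the rubric text itself are assumptions and would need to be checked against the existing sections in criteria.md before going into the PR.

```python
import json

# Assumed layout of an eval config entry; verify field names against criteria.md.
eval_config = {
    "criteria": {
        "rubric_based_multi_turn_trajectory_quality_v1": {
            "threshold": 0.8,  # illustrative value
            "rubrics": [
                {
                    "rubric_id": "confirms_before_booking",  # hypothetical rubric
                    "rubric_content": {
                        "text_property": (
                            "The agent confirms the details with the user "
                            "before making a booking."
                        )
                    },
                    # Explicit type so the rubric passes the evaluator's filter.
                    "type": "TRAJECTORY_QUALITY",
                }
            ],
        }
    }
}

print(json.dumps(eval_config, indent=2))
```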
Filing this as an issue first per the CONTRIBUTING guideline for "New Documentation". Happy to follow up with a draft PR for review if maintainers think the direction is right.