docs(evaluate): missing section for rubric_based_multi_turn_trajectory_quality_v1 + Rubric.type / criterion.rubrics requirements #1852

Description

@zettaittenani

Problem

docs/evaluate/criteria.md is missing documentation for the rubric-based multi-turn evaluator, and the implicit requirements around Rubric.type and criterion-level rubrics aren't explained for any of the rubric-based criteria. Each gap causes a real, easy-to-hit failure in practice.

Gap 1: No section for rubric_based_multi_turn_trajectory_quality_v1

The metric exists upstream — registered in metric_evaluator_registry.py (PrebuiltMetrics.RUBRIC_BASED_MULTI_TURN_TRAJECTORY_QUALITY_V1) and implemented in rubric_based_multi_turn_trajectory_evaluator.py — but the corresponding section in criteria.md is missing. The other two rubric-based metrics (rubric_based_final_response_quality_v1, rubric_based_tool_use_quality_v1) have full sections with EvalConfig examples; this one doesn't.

User impact: someone trying to evaluate multi-turn trajectory quality with custom rubrics has no guidance, and falls back to copying the single-turn section, which uses subtly different semantics.

Gap 2: Rubric.type field semantics aren't documented

Every RubricBasedEvaluator subclass declares a RUBRIC_TYPE ClassVar that acts as a filter — only rubrics whose Rubric.type matches that value are picked up:

| Metric | Required Rubric.type (from source) |
| --- | --- |
| rubric_based_final_response_quality_v1 | FINAL_RESPONSE_QUALITY |
| rubric_based_tool_use_quality_v1 | TOOL_USE_QUALITY |
| rubric_based_multi_turn_trajectory_quality_v1 | TRAJECTORY_QUALITY |

None of these values appear anywhere in criteria.md (or anywhere else in adk-docs based on a code search). The existing JSON examples in criteria.md don't set the type field either, so users following the examples and then attaching per-case rubrics via EvalCase.rubrics end up with type=None rubrics that the evaluator silently drops.

User impact: rubric scores come back empty / None with no clear error. The only signal is a "Rubric ... not found in the rubrics provided to the metric." warning, which is easy to miss unless the user is reading log output closely.
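To make the silent drop concrete, here is a minimal sketch of the filtering behavior described in this gap. It is not the ADK implementation: the `Rubric` dataclass and `filter_by_type` helper are hypothetical stand-ins, and only the `type` field and the TRAJECTORY_QUALITY value come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rubric:
    """Hypothetical stand-in for the ADK Rubric model (only the fields needed here)."""
    rubric_id: str
    text: str
    type: Optional[str] = None  # defaults to None when the user omits it


def filter_by_type(rubrics, required_type):
    """Keep only rubrics whose type equals the evaluator's required type."""
    return [r for r in rubrics if r.type == required_type]


# Per-case rubrics: one written like the current docs examples (no type),
# one with the type the multi-turn evaluator is said to require.
case_rubrics = [
    Rubric("untyped", "The agent asks for confirmation before booking."),
    Rubric("typed", "The agent keeps context across turns.", type="TRAJECTORY_QUALITY"),
]

kept = filter_by_type(case_rubrics, "TRAJECTORY_QUALITY")
kept_ids = [r.rubric_id for r in kept]
print(kept_ids)
```

In this sketch the untyped rubric disappears without raising, which is the same shape of failure users report: nothing errors, the rubric just never gets scored.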

Gap 3: criterion.rubrics is required, EvalCase.rubrics is additive — neither is stated

RubricBasedEvaluator.__init__ asserts that criterion.rubrics is non-empty (rubric_based_evaluator.py:332). create_effective_rubrics_list then unions criterion-level rubrics with per-case rubrics (filtered by type). The docs don't mention either fact:

  • The current criteria.md examples put all rubrics on the criterion, so the assert never fires for users who copy them verbatim.
  • But users who attach rubrics to individual EvalCase entries (a natural pattern when each case has case-specific rubrics) and leave criterion.rubrics = [] hit "AssertionError: Rubrics are required." at init, with no docs-level explanation of why.
  • Similarly, users who put rubrics on both expect criterion rubrics to be overridden per case; they're actually unioned.

User impact: confusing assert at init, or unexpected "extra" rubrics being scored per case.
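The two behaviors in this gap can be sketched as follows. This is an illustration under assumptions, not the upstream code: `effective_rubrics` is a hypothetical helper, the type filter is applied to per-case rubrics only, and de-duplicating by `rubric_id` is a guess at how a union might be formed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rubric:
    """Hypothetical stand-in for the ADK Rubric model."""
    rubric_id: str
    type: Optional[str] = None


def effective_rubrics(criterion_rubrics, case_rubrics, required_type):
    """Union criterion-level rubrics with type-matching per-case rubrics."""
    # Mirrors the init-time requirement: criterion-level rubrics must be non-empty.
    if not criterion_rubrics:
        raise AssertionError("Rubrics are required.")
    merged = {r.rubric_id: r for r in criterion_rubrics}
    for rubric in case_rubrics:
        # Assumption: only per-case rubrics are filtered by type.
        if rubric.type == required_type:
            # Assumption: on an id clash the criterion-level rubric wins.
            merged.setdefault(rubric.rubric_id, rubric)
    return list(merged.values())


criterion = [Rubric("concise", "FINAL_RESPONSE_QUALITY")]
case = [Rubric("cites_source", "FINAL_RESPONSE_QUALITY"), Rubric("no_type")]

# Criterion and case rubrics are combined, not overridden.
ids = [r.rubric_id for r in effective_rubrics(criterion, case, "FINAL_RESPONSE_QUALITY")]
print(ids)

# Leaving the criterion-level list empty fails even though the case has rubrics.
try:
    effective_rubrics([], case, "FINAL_RESPONSE_QUALITY")
    error = None
except AssertionError as exc:
    error = str(exc)
print(error)
```

A user expecting per-case override would anticipate only `cites_source` being scored for this case; under union semantics the criterion-level `concise` rubric is scored as well.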

Proposed direction

A single docs PR that:

  1. Adds a ## rubric_based_multi_turn_trajectory_quality_v1 section to criteria.md, modeled on the existing rubric_based_final_response_quality_v1 section (with a TRAJECTORY_QUALITY example), and the corresponding row to the criteria table at the top of the page.
  2. Adds a small "Notes On Rubrics" subsection under each of the three rubric-based metric sections that states the required Rubric.type and the criterion-vs-EvalCase rubric relationship.
  3. Adds the explicit type field to the existing JSON examples in the two existing rubric-based metric sections so users have a complete template.
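As a rough idea of what the template in item 3 could look like for the new section, the sketch below builds a config as a Python dict and serializes it to JSON. Only the metric name, the rubrics list, and the `type` value come from this issue; `threshold`, `rubric_id`, `rubric_content`, and `text_property` are assumed to match the existing rubric-based sections and should be checked against criteria.md before use.

```python
import json

# Assumed EvalConfig shape, modeled on the existing rubric-based sections.
eval_config = {
    "criteria": {
        "rubric_based_multi_turn_trajectory_quality_v1": {
            "threshold": 0.8,  # assumed field and value
            "rubrics": [
                {
                    "rubric_id": "keeps_context",  # hypothetical rubric
                    "rubric_content": {
                        "text_property": "The agent keeps context across turns."
                    },
                    # The explicit type this issue proposes documenting.
                    "type": "TRAJECTORY_QUALITY",
                }
            ],
        }
    }
}

config_json = json.dumps(eval_config, indent=2)
print(config_json)
```

The same pattern would apply to the two existing sections, with FINAL_RESPONSE_QUALITY and TOOL_USE_QUALITY as the respective type values.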

No code changes — docs/evaluate/criteria.md only.

Filing this as an issue first per the CONTRIBUTING guideline for "New Documentation". Happy to follow up with a draft PR for review if maintainers think the direction is right.
