Add MLflow evaluation scorer documentation for ADK evaluators #5123

@debu-sinha

Description

🔴 Required Information

Is your feature request related to a specific problem?

ADK's evaluation criteria (ToolTrajectory, ResponseMatch, Hallucinations, etc.) are powerful, but users who also track experiments with MLflow have no way to run ADK evaluators through mlflow.genai.evaluate(). MLflow already has an ADK tracing integration, but it has no evaluation scorer integration.

Describe the Solution You'd Like

Add a documentation page to the ADK docs showing how to use ADK evaluators as MLflow scorers. PR mlflow/mlflow#22299 has been submitted to add an MLflow integration that wraps ADK's TrajectoryEvaluator and RougeEvaluator as MLflow third-party scorers.

Example usage:

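import mlflow
# Import path as proposed in PR mlflow/mlflow#22299
from mlflow.genai.scorers.google_adk import ToolTrajectory, ResponseMatch

# eval_dataset: evaluation records prepared beforehand (not shown here)
results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToolTrajectory(match_type="EXACT", threshold=0.5),
        ResponseMatch(threshold=0.6),
    ],
)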
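To make the match_type="EXACT" and threshold parameters concrete, here is a minimal standalone sketch of how an exact-match trajectory check with a pass threshold could behave. It is an illustration only, not ADK's or MLflow's implementation: ToolCall, exact_trajectory_score, and evaluate_trajectories are hypothetical names, and averaging per-turn scores before comparing against the threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical stand-in for one tool call made by an agent."""
    name: str
    args: dict = field(default_factory=dict)


def exact_trajectory_score(actual, expected):
    """Return 1.0 if the tool calls match in order, name, and args; else 0.0."""
    return 1.0 if actual == expected else 0.0


def evaluate_trajectories(turns, threshold):
    """Average per-turn scores (assumed aggregation) and apply the threshold."""
    scores = [exact_trajectory_score(actual, expected) for actual, expected in turns]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return mean_score, mean_score >= threshold


# Two turns: the first matches exactly, the second calls a different tool.
turns = [
    ([ToolCall("get_weather", {"city": "Paris"})],
     [ToolCall("get_weather", {"city": "Paris"})]),
    ([ToolCall("search", {"q": "hotels"})],
     [ToolCall("book_hotel", {"city": "Paris"})]),
]
score, passed = evaluate_trajectories(turns, threshold=0.5)
print(score, passed)
```

With one of two turns matching, the mean score is 0.5, which meets a threshold of 0.5 under the "greater than or equal" comparison assumed here.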

A docs page under docs/evaluate/ or docs/integrations/ showing this integration would help ADK users who track experiments with MLflow.

Impact on your work

This enables ADK users to evaluate agents through MLflow's unified evaluation pipeline, combining ADK's deterministic evaluators with MLflow's experiment tracking, tracing, and comparison tools.
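"Deterministic" here means the score is a pure function of the texts being compared, with no model call involved. As a rough illustration, assuming ResponseMatch corresponds to the RougeEvaluator mentioned above and uses a ROUGE-1 style unigram overlap, a simplified score could be sketched as follows. The function name rouge1_f1 is hypothetical, and real ROUGE implementations apply their own tokenization and optional stemming, so actual scores may differ.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, whitespace-split tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


response_score = rouge1_f1(
    "The weather in Paris is sunny",
    "Paris is sunny today",
)
print(round(response_score, 3))
```

In this example three of the six candidate tokens appear in the four-token reference, giving precision 0.5, recall 0.75, and an F1 of 0.6. Running the function twice on the same inputs always yields the same value, which is what makes such a score easy to compare across MLflow runs.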

Willingness to contribute

Yes. Happy to submit a docs PR if the team approves the direction.


🟡 Recommended Information

Describe Alternatives You've Considered

Users can manually create ADK Invocation objects and run evaluators outside MLflow, but this breaks the unified mlflow.genai.evaluate() workflow and loses integration with MLflow's experiment tracking.

Additional Context

Metadata

Assignees

No one assigned

Labels

documentation: [Component] This issue is related to documentation, it will be transferred to adk-docs
