Add MLflow evaluation scorer documentation for ADK evaluators #5123
Description
🔴 Required Information
Is your feature request related to a specific problem?
ADK's evaluation criteria (ToolTrajectory, ResponseMatch, Hallucinations, etc.) are powerful, but users who also track experiments with MLflow have no way to run ADK evaluators through mlflow.genai.evaluate(). MLflow already has an ADK tracing integration but no evaluation scorer integration.
Describe the Solution You'd Like
Add a documentation page to the ADK docs showing how to use ADK evaluators as MLflow scorers. A corresponding MLflow integration, submitted as PR mlflow/mlflow#22299, wraps ADK's TrajectoryEvaluator and RougeEvaluator as MLflow third-party scorers.
Example usage:

```python
import mlflow
from mlflow.genai.scorers.google_adk import ToolTrajectory, ResponseMatch

# eval_dataset is assumed to be an evaluation dataset prepared beforehand.
results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        ToolTrajectory(match_type="EXACT", threshold=0.5),
        ResponseMatch(threshold=0.6),
    ],
)
```

A docs page under docs/evaluate/ or docs/integrations/ showing this integration would help ADK users who track experiments with MLflow.
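
To make the match_type="EXACT" and threshold arguments in the example usage more concrete, here is a minimal pure-Python sketch of how an exact-match trajectory score could be averaged over cases and compared against a threshold. It is not ADK's or MLflow's implementation: the `ToolCall` shape, the function names, the sample tool names, and the `>=` comparison are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical shape for a tool call: (tool_name, arguments).
ToolCall = tuple[str, dict[str, Any]]


def exact_trajectory_score(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Return 1.0 when both trajectories hold the same calls in the same order, else 0.0."""
    return 1.0 if expected == actual else 0.0


def average_score(cases: list[tuple[list[ToolCall], list[ToolCall]]]) -> float:
    """Average the per-case scores; an empty list of cases scores 0.0."""
    if not cases:
        return 0.0
    return sum(exact_trajectory_score(e, a) for e, a in cases) / len(cases)


def passes(score: float, threshold: float) -> bool:
    """Assumed pass rule: the score must reach the threshold."""
    return score >= threshold


# Two illustrative cases: the first matches exactly, the second calls a different tool.
cases = [
    ([("get_weather", {"city": "Paris"})], [("get_weather", {"city": "Paris"})]),
    ([("get_weather", {"city": "Rome"})], [("search", {"q": "Rome weather"})]),
]
score = average_score(cases)
print(score, passes(score, threshold=0.5))  # prints: 0.5 True
```

Under these assumptions, one matching case out of two yields 0.5, which just meets a threshold of 0.5 and would fail a threshold of 0.6.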
Impact on your work
This enables ADK users to evaluate agents through MLflow's unified evaluation pipeline, combining ADK's deterministic evaluators with MLflow's experiment tracking, tracing, and comparison tools.
Willingness to contribute
Yes. Happy to submit a docs PR if the team approves the direction.
🟡 Recommended Information
Describe Alternatives You've Considered
Users can manually create ADK Invocation objects and run evaluators outside MLflow, but this breaks the unified mlflow.genai.evaluate() workflow and loses integration with MLflow's experiment tracking.
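
As a rough illustration of that manual route, the sketch below scores responses in a hand-written loop. It does not import ADK or MLflow: `unigram_f1` is a simplified, hypothetical stand-in for a ROUGE-1 style response match, and the record fields and the 0.6 threshold are assumptions chosen to mirror the example usage.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """F1 over overlapping lowercase whitespace tokens (a ROUGE-1 style stand-in)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical evaluation records; the field names are illustrative.
records = [
    {"expected": "the flight departs at noon", "actual": "the flight departs at noon"},
    {"expected": "the flight departs at noon", "actual": "your flight leaves soon"},
]

# Manual bookkeeping: scores end up in a local list, not in an MLflow run.
results = []
for record in records:
    score = unigram_f1(record["expected"], record["actual"])
    results.append({"score": score, "passed": score >= 0.6})

for result in results:
    print(result)
```

The results here exist only in a local list, so logging, comparison across runs, and linkage to traces would each need additional hand-written code.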
Additional Context
- ADK already has MLflow tracing integration: https://google.github.io/adk-docs/integrations/mlflow/
- MLflow PR adding ADK scorers: mlflow/mlflow#22299 (Add Google ADK and third-party scorers)
- MLflow issue: mlflow/mlflow#22297 ([FR] Add Google ADK third-party scorer integration)
- This follows the pattern of other MLflow third-party scorer integrations (Phoenix/Arize, TruLens/Snowflake, Guardrails AI)