BoF: "Testing AI Agents: Live Eval Session"
A prompt change that improves one answer can silently break ten others. Come run live evaluations against real Drupal AI agents, see what grader output actually tells you, and help define the eval datasets the community needs.
