Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.
io.github.hidai25/evalview-mcp
https://github.com/hidai25/eval-view
STDIO
No auth required
How models use it and what it is built for.
Configuration this server reads at startup.
OpenAI API key, used for LLM-as-judge scoring of output quality. Optional: deterministic tool and sequence evaluation works without it.
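As a rough illustration of that split, the sketch below runs a deterministic tool-sequence comparison against a golden baseline and only turns on the judge step when a key is present. It is not evalview's actual code: the function names are hypothetical, and the variable name `OPENAI_API_KEY` is an assumption borrowed from the OpenAI SDK convention, since the listing does not name the variable.

```python
import os
from typing import Mapping, Optional, Sequence


def sequence_matches(golden: Sequence[str], actual: Sequence[str]) -> bool:
    """Deterministic check: same tools called, in the same order."""
    return list(golden) == list(actual)


def judge_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when a non-empty key is configured.

    OPENAI_API_KEY is an assumed variable name, not confirmed by the listing.
    """
    return bool(env.get("OPENAI_API_KEY", "").strip())


def evaluate(
    golden: Sequence[str],
    actual: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical evaluation step combining both modes."""
    result: dict = {"tools_match": sequence_matches(golden, actual)}
    # Without a key, quality scoring is skipped (None) rather than failed.
    judge: Optional[str] = "enabled" if judge_enabled(env) else None
    result["judge"] = judge
    return result
```

With an empty environment, `evaluate(["search", "summarize"], ["search", "summarize"], env={})` still reports whether the tool sequence matched the baseline; only the judge field is left unset.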
Where to find authoritative docs and source for evalview-mcp.