Datadog's official Bits AI MCP server (GA March 2026) exposes APM traces, logs, metrics, monitors, dashboards and security signals as MCP tools. This template wires it up with a system prompt tuned for the on-call workflow — alert triage, dashboard-driven investigations, log searches and metric queries — so you can answer "what is broken right now?" without leaving chat.
Default model
Claude Sonnet 4.5
MCP servers
mcp.datadoghq.com
Auth
Datadog API key + Application key (Organization Settings → API Keys / Application Keys)
A few things this template does well out of the box.
Three steps to go from template to a live chat.
Click "Use this template"
Agent Studio opens with the MCP server, model and system prompt pre-filled.
Add your API and Application keys
Datadog API key + Application key (Organization Settings → API Keys / Application Keys)
Start chatting
Ask a question, watch live tool calls and switch models at any time to compare answers.
The endpoints this template connects to by default. You can swap any of them in Agent Studio.
https://mcp.datadoghq.com/api/unstable/mcp-server/mcp
mcp.datadoghq.com
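If you want to point another MCP client at the same endpoint, the shape is roughly the config below. This is a hedged sketch: the exact config schema varies by client, and the placeholder key values are our own; only the URL comes from this template.

```python
import json

# Generic MCP client configuration pointing at Datadog's hosted server.
# Replace the placeholders with your real keys.
config = {
    "mcpServers": {
        "datadog": {
            "url": "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp",
            "headers": {
                "DD-API-KEY": "<your-api-key>",
                "DD-APPLICATION-KEY": "<your-application-key>",
            },
        }
    }
}

print(json.dumps(config, indent=2))
```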
A quick walkthrough for the credential this template needs.
Copy one into the studio to see the agent in action.
Which monitors are alerting right now? Group them by service and show me the alert message for each.
p95 latency on the `checkout` service for the last 6 hours — is it trending up?
Search logs for "stripe webhook failed" in the last 2 hours and summarise the error patterns.
Pull the slowest 10 traces for `POST /api/orders` in the last hour and tell me what they have in common.
Build me a stand-up summary: every alert that fired between 10pm and 8am, the service it hit, and whether it auto-recovered.
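Under the hood, prompts like the p95-latency one resolve into Datadog metric queries over a time window. A rough sketch of what such a tool call might look like, assuming Datadog's v1 `/api/v1/query` endpoint with its `from`/`to`/`query` parameters; the metric name `trace.http.request.duration` and the `service` tag are illustrative assumptions, not what the MCP server necessarily emits:

```python
import time

def build_latency_query(service: str, hours: int = 6) -> dict:
    """Build query params for a p95 latency lookup over the last N hours."""
    now = int(time.time())
    return {
        "from": now - hours * 3600,   # window start, epoch seconds
        "to": now,                    # window end
        "query": f"p95:trace.http.request.duration{{service:{service}}}",
    }

params = build_latency_query("checkout")
# The agent would issue a GET to https://api.datadoghq.com/api/v1/query
# with these params and summarise the returned timeseries.
```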
The default instructions the model starts with. Edit them any time inside Agent Studio.
You are a senior site-reliability engineer connected to Datadog via the official Bits AI MCP server. You help on-call engineers triage alerts and investigate production issues.

Use the available tools to:
- List active monitors and alerts; for each, surface the service, the metric/condition, and the most recent state change
- Query metrics (timeseries, gauges, distributions) over a user-specified window — summarise trends, not raw numbers
- Search logs by query string, service, trace ID or time range; surface unique error patterns instead of dumping every line
- Pull APM trace details and explain bottlenecks: slowest spans, downstream services, database calls
- Read dashboards and incident details when the user asks for a higher-level view

Operating principles:
- Lead with the answer (e.g. "checkout p95 has spiked from 220ms to 480ms in the last 30 minutes"), then back it up with the evidence (the metric, the time range, the affected hosts)
- When triaging multiple alerts, group them by likely root cause — don't just enumerate
- For log searches, return summaries and counts before raw log lines
- Never recommend a destructive mitigation (mute monitor, restart service) without explicit confirmation from the user
- If a query needs a tag or service name you don't have, ask — don't guess
Open Agent Studio with this template pre-loaded. Add your keys, pick any model, and start chatting.
Use this template

Stripe Billing Assistant
Query customers, subscriptions, invoices and payment events without leaving your chat.
View template →

Revenue Ops · Stripe, Linear & Slack
Correlate Stripe billing events with Linear bug reports and alert the team on Slack when revenue metrics change.
View template →

Neon Data Ops
Manage your Neon Postgres database and stream query results and alerts directly to your Slack workspace.
View template →