Datadog MCP: AI-Powered Alert Triage and Dashboard Queries (Bits AI Setup)
Nikhil Tiwari
MCP Playground
MCP Recipe
- What you'll build: An on-call AI agent that triages alerts, queries metrics, searches logs and pulls APM traces from your Datadog account
- MCP server: Official Datadog Bits AI MCP (GA March 2026), hosted at mcp.datadoghq.com
- Time to complete: 5 minutes
- Difficulty: Beginner-friendly (no install, no self-hosting)
Datadog shipped the Bits AI MCP server to GA in March 2026: a hosted, remote MCP endpoint that exposes APM, logs, metrics, monitors, dashboards, security signals and LLM Observability as MCP tools. Unlike most MCP servers you've seen, this one needs no install. It's a Streamable HTTP endpoint at mcp.datadoghq.com that any MCP client can connect to with two authentication headers (your API key and your Application key).
This recipe shows how to wire Claude, GPT-5, Gemini or any other model up to Datadog in 5 minutes, then walks through the four queries that actually save on-call time: alert triage, metric trend analysis, log pattern search, and APM trace investigation.
What the Datadog Bits AI MCP Provides
The server exposes Datadog's product surface as toolsets: grouped collections of tools you can enable per request. Key ones:
| Toolset | What it covers |
|---|---|
| monitors | List, search and inspect monitors. Surface active alerts, mute/unmute, read alert messages and notification settings. |
| metrics | Query metric timeseries with Datadog's metric query syntax. Aggregations, rollups, group-bys. |
| logs | Full-text and structured log search. Filter by service, host, trace ID, status, time range. |
| apm | Pull trace details, list slow spans, walk service maps. Bottleneck analysis on a per-trace basis. |
| dashboards | Read dashboard definitions, query the underlying widgets, summarise current state. |
| incidents | List and read Datadog incidents, including timeline and affected services. |
| security | Read Cloud SIEM signals, posture findings, runtime security alerts. |
| llm_observability | Query LLM traces: token usage, latency by model, prompt/completion samples. |
Default behaviour exposes a focused subset. Append ?toolsets=all to the URL to enable everything, or specific ones like ?toolsets=monitors,logs,apm for a tight on-call config.
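A minimal Python sketch of how the `toolsets` parameter composes with the endpoint URL. The function name `mcp_url` is hypothetical; the base URL and toolset names are the ones this recipe uses.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# US1 endpoint from this recipe; other sites use a different host.
BASE_URL = "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp"


def mcp_url(toolsets: Optional[Sequence[str]] = None, base: str = BASE_URL) -> str:
    """Return the MCP endpoint, optionally scoped with ?toolsets=...

    None (or an empty list) keeps the server's default subset; ["all"] enables everything.
    """
    if not toolsets:
        return base
    # safe="," keeps the list readable instead of percent-encoding commas as %2C.
    query = urlencode({"toolsets": ",".join(toolsets)}, safe=",")
    return f"{base}?{query}"


print(mcp_url(["monitors", "logs", "apm"]))
# prints https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=monitors,logs,apm
```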
Prerequisites
| Requirement | Details |
|---|---|
| Datadog account | Any plan with API access (most do) |
| API key | Identifies your org. Org Settings → API Keys |
| Application key | Identifies the user. Org Settings → Application Keys |
Step 1: Generate Your Two Keys
Datadog uses a two-key model: the API key identifies your organization, the Application key identifies the user (and carries that user's permissions). You need both.
- API Key: app.datadoghq.com (or your site's equivalent) → Organization Settings → API Keys → create a new key (or reuse an existing one). Copy it.
- Application Key: app.datadoghq.com (or your site's equivalent) → Organization Settings → Application Keys → create a new key. Pick the scopes you need: read-only is enough for triage; add write scopes only if you want the agent to mute monitors or comment on incidents.
Scopes matter
An Application Key inherits the permissions of the user who created it. For a triage-only agent, create a dedicated user with read-only roles and generate the App Key under that user. Don't use a key tied to an admin account unless the agent really needs write access.
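If you wire the endpoint into your own code rather than a UI, a small helper that reads both keys from the environment keeps them out of source files. This sketch assumes the `DD_API_KEY` and `DD_APPLICATION_KEY` variable names used in the Claude Code command in Step 2; the function name `datadog_headers` is hypothetical.

```python
import os
from collections.abc import Mapping


def datadog_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the two auth headers from DD_API_KEY and DD_APPLICATION_KEY.

    Fails fast if either is missing, rather than sending a request that
    would most likely be rejected.
    """
    missing = [name for name in ("DD_API_KEY", "DD_APPLICATION_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variable(s): {', '.join(missing)}")
    return {
        # API key: identifies the organization.
        "DD-API-KEY": env["DD_API_KEY"],
        # Application key: identifies the user and carries their permissions.
        "DD-APPLICATION-KEY": env["DD_APPLICATION_KEY"],
    }
```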
Step 2: Connect from MCP Agent Studio
Open the pre-built Datadog Agent template: it's wired up with the right URL, both header fields, and a system prompt tuned for the on-call workflow.
- Visit /templates/datadog-agent and click Open in Studio.
- Paste your API key into the DD-API-KEY field, your Application key into DD-APPLICATION-KEY.
- If you're on EU, US3, US5, AP1 or AP2, change the URL host to match your site (e.g. mcp.datadoghq.eu/api/unstable/mcp-server/mcp).
- Send: "Which monitors are alerting right now?"
Or, in Claude Code:
```bash
claude mcp add --transport http datadog \
  https://mcp.datadoghq.com/api/unstable/mcp-server/mcp \
  --header "DD-API-KEY: ${DD_API_KEY}" \
  --header "DD-APPLICATION-KEY: ${DD_APPLICATION_KEY}"
```
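Whichever client you use, roughly the first thing it sends over Streamable HTTP is a JSON-RPC `initialize` POST carrying the two Datadog headers. This Python sketch only builds that request, it does not send it. It assumes MCP protocol revision `2025-03-26` (the first revision with Streamable HTTP), and the `clientInfo` values and function name are placeholders. After the handshake, clients send `notifications/initialized` and then `tools/list` / `tools/call` messages in the same JSON-RPC shape.

```python
import json

MCP_URL = "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp"


def build_initialize(api_key: str, app_key: str) -> tuple[str, dict[str, str], bytes]:
    """Return (url, headers, body) for an MCP `initialize` POST (not sent)."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
        "DD-API-KEY": api_key,
        "DD-APPLICATION-KEY": app_key,
    }
    message = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed protocol revision
            "capabilities": {},
            "clientInfo": {"name": "oncall-sketch", "version": "0.1"},  # placeholder
        },
    }
    return MCP_URL, headers, json.dumps(message).encode("utf-8")


# To actually send it (needs real keys and network access):
#   import urllib.request
#   url, headers, body = build_initialize(api_key, app_key)
#   urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers, method="POST"))
```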
Step 3: Four On-Call Workflows That Actually Save Time
1. Morning alert triage
Prompt
"Stand-up summary: every monitor that fired between 10pm and 8am, grouped by service. For each one, tell me the alert message and whether it auto-recovered or is still firing."
The agent calls list_monitors with state filters, groups the results by service tag and outputs a digest. This replaces the 15 minutes you spend clicking through the alerts feed before stand-up.
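The grouping step is simple enough to sketch. The monitor records below use a hypothetical, simplified shape (the real list_monitors output will differ), and the function name is hypothetical too. Passing explicit start and end datetimes sidesteps the fact that a 10pm-to-8am window crosses midnight.

```python
from collections import defaultdict
from datetime import datetime


def overnight_digest(events: list[dict], start: datetime, end: datetime) -> dict[str, list[str]]:
    """Group monitor events that fired in [start, end) by service.

    `events` uses a hypothetical shape: dicts with "name", "service",
    "message", "fired_at" (datetime) and "current_state" ("Alert" or "OK").
    """
    digest = defaultdict(list)
    for ev in sorted(events, key=lambda item: item["fired_at"]):
        if not start <= ev["fired_at"] < end:
            continue  # fired outside the overnight window
        status = "still firing" if ev["current_state"] == "Alert" else "auto-recovered"
        digest[ev["service"]].append(f"{ev['name']}: {ev['message']} ({status})")
    return dict(digest)


# 10pm yesterday to 8am today, as explicit datetimes.
window = (datetime(2026, 3, 10, 22, 0), datetime(2026, 3, 11, 8, 0))
```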
2. "Is this latency spike real?"
Prompt
"p95 latency on the checkout service for the last 6 hours, in 5-minute buckets. Compare to the same window yesterday. Is it actually spiking or is this normal noise?"
The agent calls query_metrics twice (the last 6 hours, then the same window yesterday), computes the delta and returns a one-paragraph verdict. The model is good at "is this signal or noise" because it can read the variance, not just the point value.
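If you want a deterministic cross-check on the model's verdict, one simple heuristic is to compare today's buckets against yesterday's spread. This is an illustrative rule, not how the model or Datadog decides. The threshold (`k` standard deviations), the run length and the function name are arbitrary assumptions.

```python
from statistics import mean, stdev


def is_real_spike(today: list[float], yesterday: list[float],
                  k: float = 3.0, min_run: int = 3) -> bool:
    """True if `min_run` consecutive buckets today exceed yesterday's
    mean + k * stdev for the same window.

    Both lists hold p95 latency per 5-minute bucket (6 hours -> 72 buckets).
    `yesterday` needs at least two points for stdev.
    """
    threshold = mean(yesterday) + k * stdev(yesterday)
    run = 0
    for value in today:
        # Count consecutive buckets above the threshold; reset on any dip.
        run = run + 1 if value > threshold else 0
        if run >= min_run:
            return True
    return False
```

Requiring a run of consecutive buckets, rather than a single outlier, is what filters out one-off blips.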
3. Log pattern search
Prompt
"Search logs for 'stripe webhook failed' across all services in the last 2 hours. Group by error message and surface the top 5 patterns with their counts."
The agent doesn't dump raw logs: it summarises patterns and counts. The default system prompt in the template explicitly forbids returning more than ~20 raw log lines unless you ask for them.
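Pattern grouping boils down to normalising the volatile parts of each message and counting. A minimal sketch, assuming raw message strings as input; the regexes and the `top_patterns` name are illustrative and are not Datadog's own log pattern algorithm.

```python
import re
from collections import Counter

# Volatile tokens to collapse: long hex IDs first, then any remaining digits.
_HEX = re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE)
_NUM = re.compile(r"\d+")


def top_patterns(messages: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common normalised message patterns with their counts."""
    def normalise(msg: str) -> str:
        return _NUM.sub("<num>", _HEX.sub("<id>", msg))

    return Counter(normalise(m) for m in messages).most_common(n)
```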
4. APM trace bottleneck
Prompt
"Pull the slowest 10 traces for POST /api/orders in the last hour. What do they have in common: same downstream service, same DB query, same customer?"
The agent calls list_traces, ranks the results by duration and walks each trace's spans, looking for a common bottleneck (usually a slow DB call or a downstream service). That beats clicking through 10 trace flame graphs by hand.
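The "what do they have in common" step can be sketched as counting which span dominates each slow trace. The trace shape and function name here are hypothetical (the real list_traces output will differ), and "slowest child span" is a crude stand-in for proper critical-path analysis.

```python
from collections import Counter
from typing import Optional


def common_bottleneck(traces: list[dict], top: int = 10) -> Optional[tuple[tuple[str, str], float]]:
    """Return ((service, resource), share) for the span that is most often
    the slowest child span among the `top` slowest traces, or None.

    Hypothetical trace shape:
      {"duration_ms": float,
       "spans": [{"service": str, "resource": str, "duration_ms": float}, ...]}
    where "spans" excludes the root span.
    """
    slowest = sorted(traces, key=lambda t: t["duration_ms"], reverse=True)[:top]
    culprits = Counter()
    for trace in slowest:
        if not trace["spans"]:
            continue
        worst = max(trace["spans"], key=lambda s: s["duration_ms"])
        culprits[(worst["service"], worst["resource"])] += 1
    if not culprits:
        return None
    culprit, count = culprits.most_common(1)[0]
    # Share of the slow traces in which this span was the worst offender.
    return culprit, count / len(slowest)
```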
Picking the Right Model
Datadog tools return a lot of structured data, so pick a model that handles long contexts well and reasons over numbers cleanly.
| Model | When to pick it |
|---|---|
| Claude Sonnet 4.5 | Default for triage. Good at "is this signal or noise" reasoning over metric data. |
| Claude Opus 4.7 | When you need correlation across 3+ services or a deep trace investigation. Best at synthesizing many tool-call results. |
| GPT-5.4 | Very strong on log pattern extraction. Tends to be more verbose than Claude โ set a max-words instruction in your prompt. |
| Gemini 3.1 Pro | 1M-token context window: ideal for "give me everything from the last 24 hours and find the anomaly" prompts. |
Regions and the URL
Datadog hosts organizations on several sites (regions). Match the URL to the site your org is on:
| Region | MCP URL |
|---|---|
| US1 (default) | https://mcp.datadoghq.com/api/unstable/mcp-server/mcp |
| US3 | https://mcp.us3.datadoghq.com/api/unstable/mcp-server/mcp |
| US5 | https://mcp.us5.datadoghq.com/api/unstable/mcp-server/mcp |
| EU | https://mcp.datadoghq.eu/api/unstable/mcp-server/mcp |
| AP1 | https://mcp.ap1.datadoghq.com/api/unstable/mcp-server/mcp |
| AP2 | https://mcp.ap2.datadoghq.com/api/unstable/mcp-server/mcp |
You can also append ?toolsets=all or ?toolsets=monitors,logs,apm to scope which tools the agent sees.
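The region table translates directly into a lookup. A minimal sketch; the hosts and path are copied from the table above, and the function name `mcp_url_for_site` is hypothetical. Append the `toolsets` query string to the result if you need it.

```python
# Hosts per Datadog site, copied from the region table; the path is shared.
MCP_HOSTS = {
    "US1": "mcp.datadoghq.com",
    "US3": "mcp.us3.datadoghq.com",
    "US5": "mcp.us5.datadoghq.com",
    "EU": "mcp.datadoghq.eu",
    "AP1": "mcp.ap1.datadoghq.com",
    "AP2": "mcp.ap2.datadoghq.com",
}
MCP_PATH = "/api/unstable/mcp-server/mcp"


def mcp_url_for_site(site: str) -> str:
    """Return the MCP endpoint for a site label such as "EU" or "us3"."""
    try:
        host = MCP_HOSTS[site.upper()]
    except KeyError:
        raise ValueError(f"unknown Datadog site {site!r}; expected one of {sorted(MCP_HOSTS)}") from None
    return f"https://{host}{MCP_PATH}"
```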
Production Notes
- Use a dedicated read-only user for the App Key. Application Keys inherit user permissions.
- The endpoint is rate-limited per the standard Datadog API limits. For high-frequency batch jobs, throttle on your side.
- OAuth is supported as an alternative to the two-header pattern. For most chat-style integrations, headers are simpler; switch to OAuth when you need per-user scoping in a multi-tenant app.
- Never paste an Application Key into a public chat or log it. Treat it like a password: anyone with the key acts as that Datadog user.
- Cap log result sizes in your prompts. Returning thousands of log lines into the model's context is expensive and rarely useful; ask for summaries and counts first.
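Two of these notes translate into small guards if you call the endpoint from your own code: a client-side token bucket for throttling, and a helper that masks the key headers before you log a request. A minimal sketch; the class and function names are hypothetical, and the rate and capacity values are yours to choose (this recipe doesn't state the endpoint's actual limits).

```python
import time
from collections.abc import Callable

SECRET_HEADERS = {"dd-api-key", "dd-application-key"}


class TokenBucket:
    """Client-side throttle: on average `rate` calls per second, with bursts
    of up to `capacity`. Pick values below whatever limits apply to your org."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block, sleeping one refill interval at a time, until a token is free."""
        while not self.try_acquire():
            time.sleep(1 / self.rate)


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Copy of `headers` that is safe to log: Datadog key values are masked
    except for their last 4 characters."""
    safe = {}
    for name, value in headers.items():
        if name.lower() in SECRET_HEADERS:
            safe[name] = ("****" + value[-4:]) if len(value) > 4 else "****"
        else:
            safe[name] = value
    return safe
```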
Related Recipes
- MongoDB MCP: Natural-Language Queries with AI
- PostgreSQL MCP: Build a Claude Analytics Agent
- Best MCP Servers in 2026
Frequently Asked Questions
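Is the Datadog MCP server hosted, or do I need to run it myself?
It's hosted by Datadog at mcp.datadoghq.com. You don't install anything: just point your MCP client at the URL with your two API headers. This is one of the few major SaaS MCPs (alongside Linear, Notion, Vercel, Supabase) with a managed endpoint.
What's the difference between the API key and the Application key?
The API key identifies your organization; the Application key identifies the user and carries that user's permissions. You need both.
Can I scope which tools the agent has access to?
Yes. Append the toolsets query parameter to the URL: ?toolsets=monitors,logs,apm for a tight on-call config, or ?toolsets=all for the kitchen sink. Default is a focused subset. Scoping reduces the schema sent to the model, which is faster and cheaper.
Does it work with EU, US3, US5, AP1, AP2 sites?
Yes. Change the host to match your site: EU is mcp.datadoghq.eu, US3 is mcp.us3.datadoghq.com, etc. The path stays the same: /api/unstable/mcp-server/mcp.
Can the agent mute monitors or perform write actions?
Only if the Application Key allows it. The monitors toolset includes mute/unmute, but the key inherits the permissions of the user who created it, so a key created under a read-only user can't perform write actions. Add write scopes only if you want the agent to mute monitors or comment on incidents.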
Written by Nikhil Tiwari
15+ years in product development. AI enthusiast building developer tools that make complex technologies accessible to everyone.