Multi-provider media generation — images, video, audio, and transcription via a unified interface
io.github.rsmdt/multimodal
https://github.com/rsmdt/multimodal-mcp
STDIO
No auth required
Configuration this server reads at startup.
OpenAI API key for image, video, and audio generation, and for transcription
xAI API key for image and video generation
Google Gemini API key for image, video, and audio generation
ElevenLabs API key for audio generation and transcription
BFL API key for FLUX image generation and editing
Directory for saved media files (defaults to the current working directory)
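The settings above suggest that each provider key unlocks a fixed set of capabilities, and that the output directory falls back to the working directory when unset. The sketch below shows one way a startup routine could turn those settings into a capability map. It is illustrative only, not the server's actual code. The listing does not give the environment variable names, so `OPENAI_API_KEY`, `XAI_API_KEY`, `GEMINI_API_KEY`, `ELEVENLABS_API_KEY`, `BFL_API_KEY`, and `MEDIA_OUTPUT_DIR` are hypothetical, as are the `load_config` helper and its return shape.

```python
import os
from pathlib import Path

# Capabilities per provider, transcribed from the configuration list above.
PROVIDER_CAPABILITIES = {
    "openai": {"image", "video", "audio", "transcription"},
    "xai": {"image", "video"},
    "gemini": {"image", "video", "audio"},
    "elevenlabs": {"audio", "transcription"},
    "bfl": {"image", "image_edit"},  # FLUX generation and editing
}

# HYPOTHETICAL variable names: the listing does not state the real ones.
ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "bfl": "BFL_API_KEY",
}
OUTPUT_DIR_VAR = "MEDIA_OUTPUT_DIR"  # hypothetical name


def load_config(env=None):
    """Hypothetical helper: derive enabled providers and capabilities from env."""
    env = os.environ if env is None else env
    # A provider is enabled only when its key is present and non-empty.
    providers = sorted(p for p, var in ENV_KEYS.items() if env.get(var))
    # Map each capability to the enabled providers that offer it.
    capabilities = {}
    for provider in providers:
        for cap in sorted(PROVIDER_CAPABILITIES[provider]):
            capabilities.setdefault(cap, []).append(provider)
    # Fall back to the current working directory when no output dir is set.
    output_dir = Path(env.get(OUTPUT_DIR_VAR) or os.getcwd())
    return {"providers": providers, "capabilities": capabilities, "output_dir": output_dir}
```

For example, with only an xAI key and an ElevenLabs key set, `load_config` would route image and video requests to xAI and audio and transcription requests to ElevenLabs. Keeping the provider-to-capability table in one place makes "unified interface" routing a simple lookup.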