MCP Server · STDIO · Official · v1.3.1

multimodal MCP Server

Multi-provider media generation — images, video, audio, and transcription via a unified interface

io.github.rsmdt/multimodal

Hosted URL

https://github.com/rsmdt/multimodal-mcp

Transport

STDIO
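
With a STDIO transport, an MCP client usually starts the server as a local process and passes configuration through the process environment. The sketch below builds a client configuration entry in the `mcpServers` shape that several MCP clients accept. The launch command and its arguments are not given on this page, so they appear as placeholders, and `build_server_entry` is a hypothetical helper, not part of the server.

```python
import json


def build_server_entry(command, args, env):
    """Build a client config entry for a STDIO server (illustrative helper)."""
    # Drop empty values so unset keys are not passed to the server process.
    cleaned_env = {name: value for name, value in env.items() if value}
    return {
        "mcpServers": {
            "multimodal": {
                "command": command,
                "args": list(args),
                "env": cleaned_env,
            }
        }
    }


if __name__ == "__main__":
    # "<launch-command>" and "<args>" are placeholders: take the real values
    # from the project's own documentation.
    entry = build_server_entry(
        "<launch-command>",
        ["<args>"],
        {"OPENAI_API_KEY": "<your-key>", "MEDIA_OUTPUT_DIR": ""},
    )
    print(json.dumps(entry, indent=2))
```

Filtering out empty values keeps the generated entry limited to the providers that are actually configured.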

Auth

No auth required

Connect to multimodal

Hosted endpoint — paste into any MCP client.

https://github.com/rsmdt/multimodal-mcp

Environment variables

Configuration this server reads at startup.

  • OPENAI_API_KEY (Secret)

    OpenAI API key for image, video, audio generation and transcription

  • XAI_API_KEY (Secret)

    xAI API key for image and video generation

  • GEMINI_API_KEY (Secret)

    Google Gemini API key for image, video, and audio generation

  • ELEVENLABS_API_KEY (Secret)

    ElevenLabs API key for audio generation and transcription

  • BFL_API_KEY (Secret)

    BFL API key for FLUX image generation and editing

  • MEDIA_OUTPUT_DIR

    Directory where generated media files are saved (defaults to the current working directory)
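
The variables above can be checked before the server is launched. The following is a minimal sketch of such a preflight check, assuming a provider counts as enabled whenever its key is set to a non-empty value. The capability table is transcribed from the descriptions above. `available_capabilities` and `output_dir` are hypothetical helper names and do not reflect how the server itself is implemented.

```python
import os
from pathlib import Path

# Capabilities per variable, transcribed from the list above.
PROVIDER_KEYS = {
    "OPENAI_API_KEY": {"image", "video", "audio", "transcription"},
    "XAI_API_KEY": {"image", "video"},
    "GEMINI_API_KEY": {"image", "video", "audio"},
    "ELEVENLABS_API_KEY": {"audio", "transcription"},
    "BFL_API_KEY": {"image"},
}


def available_capabilities(env):
    """Return the media capabilities covered by the keys present in env."""
    capabilities = set()
    for name, provided in PROVIDER_KEYS.items():
        if env.get(name):  # missing or empty keys are treated as unset
            capabilities |= provided
    return capabilities


def output_dir(env):
    """Resolve MEDIA_OUTPUT_DIR, falling back to the current directory."""
    return Path(env.get("MEDIA_OUTPUT_DIR") or Path.cwd())


if __name__ == "__main__":
    print("capabilities:", sorted(available_capabilities(os.environ)))
    print("output directory:", output_dir(os.environ))
```

Running the script in the shell that will launch the server shows which kinds of media generation the current environment can support and where files would be written.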
