Cook Your Music in 3 Steps
Describe, Upload, or Both
Type a description, upload an image, drop in a video, or add an audio clip. The Music Agent accepts all of these as creative input, alone or combined. Describe a mood and attach a reference photo, for example. The agent works with whatever you give it.
The Agent Interprets and Composes
The AI reads your input: text, visual content, audio characteristics, or all of the above. It identifies mood, energy, style cues, and pacing, then composes a track using your selected Suno model (V4 through V5). Pick the model from the selector before you start.
Start Cooking and Download
Hit Start Cooking. Listen to the result, download the track, or start a new round with different inputs. Your generation history is saved so you can revisit and iterate on past sessions anytime.
One Input, Endless Musical Possibilities
Start with whatever you have: a text description, a photo from your camera roll, a video clip from your project timeline, or an audio reference from a voice memo. The Music Agent accepts all of them as creative starting points. You do not need to know the right musical terminology or write a structured prompt. Just give the agent something to work with and it handles the rest.

Turn Any File into a Song
Upload a photo and the AI reads the visual mood to compose matching music. Drop in a video and it scores a soundtrack based on the pacing and atmosphere. Feed it an audio clip (32 seconds to 8 minutes) and it builds a new track inspired by the reference. Images can be up to 50 MB and videos up to 100 MB. Supported formats: JPG, PNG, WEBP, MP4, MOV, MPEG, MP3, and M4A. The Music Agent eats them all.

Why Creators Pick the AI Music Agent
What makes our AI Music Agent the most flexible way to create music from any starting point.
Any Input Works: Text, Image, Video, Audio
Text, images, videos, and audio clips all work as input. Upload a photo of a sunset and get ambient music. Drop in a video and get a matching score. Few AI music tools accept all four input types.
Chat-Style Interface, Zero Friction
No forms, no dropdowns, no structured fields. Type naturally, attach files, and hit Start Cooking. The agent understands what you are going for without requiring musical terminology or formatted prompts.
Four AI Quality Tiers Built In
The Music Agent offers the same four AI quality tiers as the Music Generator: our top-tier model for peak creativity, our recommended model for vocal quality, our balanced model for speed, and our standard model for stability. Switch anytime with one tap.
Prompt Suggestions When You Are Stuck
Rotating prompt suggestions at the bottom of the interface give you starting points when the blank page feels intimidating. Use them as-is or as a launch pad for your own ideas.
Full History for Every Session
Every session is saved. Come back to a previous generation, tweak the input, and run it again. Build on what worked instead of starting fresh every time.
Connected to the Full SunoPrompt Ecosystem
The Music Agent plugs into the rest of SunoPrompt. Use prompts from the Prompt Generator. Split finished tracks in the Vocal Remover. Move between tools without leaving the platform.
The Most Creative Tool in the Kit
Music Agent Meets the Full SunoPrompt Toolkit
The Music Agent is the most flexible tool in SunoPrompt's ecosystem. It connects with everything: prompts you built in the Prompt Generator, tracks you made in the Music Generator, and stems you split in the Vocal Remover.
AI Music Agent
The conversational, multimodal music creator. Feed it text, images, videos, or audio and get back a fully produced track. Four AI music quality tiers with model switching built in.
AI Music Generator
Prefer a structured approach? The AI Music Generator gives you form-based controls: description, title, genre, voice gender, instrumental toggle. Four AI quality tiers. Two modes: Text to Music and Lyrics to Music.
Prompt Generator & Vocal Remover
Craft detailed prompts with the Prompt Generator, then feed them to the Agent or the Music Generator. Or split finished tracks into stems with the Vocal Remover. Every tool connects.

Who Uses the AI Music Agent
For Musicians and Producers
Upload a voice memo of a melody you hummed and let the agent build a full arrangement around it. Go from a rough idea to a produced track without opening a DAW.
Feed it a reference track and describe what you want to change. The agent uses the audio as a starting point and composes something in the same neighborhood but fully original.
Switch between our four AI quality tiers to hear how different models interpret the same input. Our top-tier model takes more creative risks. Our recommended model focuses on vocal polish. Our balanced model favors speed. Our standard model plays it safe.
What is an AI Music Agent?
An AI Music Agent is a conversational tool that turns text descriptions, images, videos, and audio clips into original music. Upload anything, describe a feeling, and let the AI compose a track that matches.
A Creative Partner, Not a Form
Unlike traditional form-based generators, an AI Music Agent works through a chat-like interface. You talk to it naturally. Describe what you want, upload reference files, or do both at once. The agent interprets your intent and composes a matching track.
Text, Image, Video, and Audio: All Valid Inputs
This is what sets the Music Agent apart. Feed it a sunset photo and get warm ambient music. Drop in a street racing video and get high-energy electronic. Upload a voice memo and let it build a full arrangement around it. Text, images, video, and audio all work as creative input. Few tools in the AI music space handle all four.
Built-In Inspiration When You Need It
The agent does not just take orders. It can suggest starting points when you are stuck. Prompt suggestions rotate at the bottom of the interface, such as "Cult Aesthetic Supreme", "Hypnotic ASMR Soundscapes", and "pop country ballad about lost love". Use them as-is or as a jumping-off point for your own ideas.
The Future of Music Creation is Multimodal
As AI music models evolve, the ability to start from any creative input becomes more powerful. Today you can upload a photo and get a song. Soon you will be able to chain multiple steps: describe a concept, generate a track, adjust with follow-up instructions, and export. The agent model is where music creation is heading.
What Sets the Music Agent Apart
The Music Agent is the only tool in SunoPrompt's ecosystem that accepts images, videos, and audio as creative input alongside text. Most AI music tools are text-only. The Agent reads visual mood from photos, extracts energy and pacing from video clips, and uses audio references to match style. That multimodal capability opens up creative paths that text alone cannot reach.
The conversational interface changes how you interact with AI music. Instead of filling out forms and clicking dropdowns, you talk to the agent like a collaborator. Describe what you want, attach a file, and hit Start Cooking. Prompt suggestions rotate at the bottom when you need a spark. History saves every session. Four AI music quality tiers are one tap away. The whole experience is designed to feel less like a tool and more like working with someone who gets what you are going for.