May 1, 2026·12 min read·Tutorial

How to Write AI Song Prompts That Actually Produce Great Music

After 500+ generations, here is the prompt formula that works — the four-part structure, copy-paste templates by genre, and the words that wreck a prompt.

I have generated more than 500 AI songs across the last six months. Maybe 60% of them landed on the first try. In my experience, the other 40% missed because of the prompt, not the model.

The pattern is consistent enough that I now write prompts to a fixed formula. This article is that formula, plus the copy-paste templates I keep in my notes app for each major genre. None of this is theory — it is what actually works on Muziko, Suno, and Udio in 2026.

Quick context: I am writing this for someone who has already made one or two AI songs and is wondering why some prompts produce something usable and others produce mush. If you are brand new, start with the 3-minute walkthrough and come back here once you have generated a few tracks.

The four-part prompt formula

Every prompt that has reliably produced good output for me follows the same four-part structure in one or two sentences:

Genre — specific, not vague. "Indie pop" beats "pop." "Boom bap hip hop" beats "hip hop."
Vocals — gender, count, or instrumental. "Female vocals." "Male vocals with female backing." "Instrumental only."
Two instruments or sonic details — what the listener will actually hear. "Bright acoustic guitar with handclaps." "Distorted bass and 808 hi-hats."
One mood or scene — emotional context the model can interpret. "Late-night studying." "Driving home alone after a fight."

Optional fifth part: tempo. Useful when you want a specific energy. "Around 95 bpm" or "fast and energetic, 140 bpm."

A prompt that hits this structure:

"Modern indie folk, female vocals, fingerpicked acoustic guitar with light strings, autumn evening walk through the city, around 80 bpm."

A prompt that does not:

"A nice song that feels emotional and powerful."

The first prompt makes most of the decisions for the model: genre, voice, instrumentation, scene, and tempo. The second leaves it guessing about everything.
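If you like to keep the formula as a checklist, here is a minimal Python sketch that joins the four parts, plus the optional tempo, into one sentence. The build_prompt function and its parameter names are hypothetical, made up for this illustration; none of the apps mentioned here expose an API like this, so the output is simply text you paste into the prompt field.

```python
from typing import Optional


def build_prompt(genre: str, vocals: str, details: str, scene: str,
                 bpm: Optional[int] = None) -> str:
    """Join the four required parts, plus an optional tempo, into one prompt sentence."""
    parts = [genre.strip(), vocals.strip(), details.strip(), scene.strip()]
    # All four parts are required; an empty one means the model has to guess.
    if not all(parts):
        raise ValueError("genre, vocals, details, and scene are all required")
    # Optional fifth part: tempo, phrased loosely as "around N bpm".
    if bpm is not None:
        parts.append(f"around {bpm} bpm")
    prompt = ", ".join(parts)
    # Capitalize the first letter and end with a period, matching the examples here.
    return prompt[0].upper() + prompt[1:] + "."


if __name__ == "__main__":
    print(build_prompt(
        "modern indie folk",
        "female vocals",
        "fingerpicked acoustic guitar with light strings",
        "autumn evening walk through the city",
        bpm=80,
    ))
```

Run as written, this prints the indie folk prompt from earlier in this section. Making the four parts required arguments means a missing part raises an error instead of silently producing a vaguer prompt.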

The five words that wreck most prompts

Some words I see constantly that produce worse output than no word at all:

"Beautiful": the model has no idea what beautiful means. Use specific descriptors like "warm," "delicate," or "lush" instead.
"Epic": overused, and it pushes output toward generic cinematic arrangements. Replace it with the actual reference: "soundtrack-style strings" or "heavy drums and brass."
"Catchy": meaningless to a model. Describe the structure you actually want: "strong chorus hook," "repeating four-bar refrain."
"Vibe": too vague unless paired with something concrete. "Lo-fi vibe" is fine; "good vibe" is not.
"Banger": does not translate to musical descriptors. Replace it with energy markers: "loud," "driving," "fast 130 bpm."

Cutting these five words from your vocabulary will improve your hit rate before you change anything else.
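If you draft prompts in a notes app or a script, a tiny checker can catch these words before you generate. This is a minimal Python sketch, assuming a plain-text prompt; the WEAK_WORDS table and the flag_weak_words function are hypothetical, and the suggestions simply restate the replacements above.

```python
import re
from typing import List, Tuple

# The five words flagged above, each paired with the kind of replacement suggested.
WEAK_WORDS = {
    "beautiful": 'use a specific descriptor such as "warm," "delicate," or "lush"',
    "epic": 'name the actual sound, e.g. "soundtrack-style strings"',
    "catchy": 'describe the structure, e.g. "strong chorus hook"',
    "vibe": 'pair it with something concrete, e.g. "lo-fi vibe"',
    "banger": 'use energy markers, e.g. "loud," "driving," "fast 130 bpm"',
}


def flag_weak_words(prompt: str) -> List[Tuple[str, str]]:
    """Return (word, suggestion) pairs for every flagged word in the prompt."""
    hits = []
    for word, suggestion in WEAK_WORDS.items():
        # Word boundaries so "epic" matches "Epic," but not "epicenter";
        # the optional "s" also catches plurals like "vibes" or "bangers".
        if re.search(rf"\b{word}s?\b", prompt, flags=re.IGNORECASE):
            hits.append((word, suggestion))
    return hits


if __name__ == "__main__":
    for word, tip in flag_weak_words("An epic, beautiful banger with good vibes"):
        print(f"{word}: {tip}")
```

Because the check is purely lexical, it also flags "vibe" in an acceptable phrase like "lo-fi vibe," so treat each hit as a reason to reread that part of the prompt rather than as an error.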

Genre-by-genre copy-paste templates

Here are the prompt templates I actually use. Each one has a working structure with placeholder details you can swap.

Indie pop / bedroom pop

"Bedroom pop, [female/male] vocals, [bright/mellow] electric guitar with [synth pads/handclaps], [summer afternoon/late-night drive], around [110-130] bpm."

Working example: "Bedroom pop, female vocals, mellow electric guitar with subtle synth pads, late-night drive home from a party, around 115 bpm."

The bedroom pop genre leans heavily on warm vocal production and laid-back rhythm. The model handles this category well because the conventions are consistent.
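Every template in this section uses the same [option/option] slot notation, so filling one in is mechanical. The minimal Python sketch below assumes you store templates as plain strings; fill_template and list_options are hypothetical helpers, not features of Muziko, Suno, or Udio. Any slot accepts free text, not just the listed options, which is how the bedroom pop working example gets "subtle synth pads."

```python
import re
from typing import List

# A slot is anything in square brackets, e.g. "[female/male]" or "[110-130]".
SLOT = re.compile(r"\[([^\]]+)\]")


def list_options(template: str) -> List[List[str]]:
    """Return the suggested options for each slot, in order."""
    return [slot.split("/") for slot in SLOT.findall(template)]


def fill_template(template: str, choices: List[str]) -> str:
    """Replace each slot, left to right, with the matching choice (free text allowed)."""
    slot_count = len(SLOT.findall(template))
    if slot_count != len(choices):
        raise ValueError(f"template has {slot_count} slots but got {len(choices)} choices")
    remaining = iter(choices)
    # re.sub visits matches left to right, so choices map to slots in order.
    return SLOT.sub(lambda _match: next(remaining), template)


BEDROOM_POP = ("Bedroom pop, [female/male] vocals, [bright/mellow] electric guitar "
               "with [synth pads/handclaps], [summer afternoon/late-night drive], "
               "around [110-130] bpm.")

if __name__ == "__main__":
    print(list_options(BEDROOM_POP))
    print(fill_template(BEDROOM_POP, [
        "female", "mellow", "subtle synth pads",
        "late-night drive home from a party", "115",
    ]))
```

The second print reproduces the bedroom pop working example above word for word.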

Lo-fi hip hop

"Lo-fi hip hop instrumental, [Rhodes piano/jazz guitar/dusty samples], [warm tape hiss/vinyl crackle], [studying alone/rainy afternoon], around 80 bpm."

Working example: "Lo-fi hip hop instrumental, mellow Rhodes piano with light jazz guitar, warm tape hiss, studying alone at 2am, around 80 bpm."

Lo-fi is one of the easiest genres to nail. The genre conventions are strong, the tempo range is narrow, and the models seem to have seen a lot of it. Almost any prompt with "lo-fi" in it produces something usable.

Modern pop ballad

"Modern pop ballad, [male/female] vocals, [piano-led/sparse acoustic guitar], lyrics about [specific scene], emotional and intimate, around 70 bpm."

Working example: "Modern pop ballad, male vocals, piano-led with subtle strings entering at the chorus, lyrics about leaving a small town for the first time, emotional and intimate, around 72 bpm."

The "lyrics about [specific scene]" framing is more powerful than describing the emotion. "Lyrics about driving past your old high school" produces a more specific song than "lyrics about nostalgia."

Country (modern)

"Modern country, [male/female] vocals, [acoustic guitar/slide guitar/mandolin], [pickup truck on a Texas highway/Friday night in a small town], around [100-115] bpm."

Working example: "Modern country, male vocals, slide guitar and mandolin, pickup truck on a Texas highway at sunset, around 105 bpm."

The geography reference matters more than the genre tag. Country songs about Texas sound different from country songs about Tennessee, and the model picks up on this.

Hip hop / boom bap

"Boom bap hip hop, male vocals, [dusty piano sample/jazz horn loop], [boom bap drums with sharp snares], [90s NYC street/late-night corner store], around [85-95] bpm."

Working example: "Boom bap hip hop, male vocals, dusty piano sample with jazz horn loop, boom bap drums with sharp snares, 90s NYC street energy, around 92 bpm."

The era reference ("90s") is one of the most powerful single words you can use in a hip hop prompt. It anchors the production style harder than any instrument descriptor.

Electronic / EDM

"[Deep house/melodic techno/drum & bass], [vocal hook/instrumental], [warm pads/dark atmospheric synths], [club energy/festival main stage/late-night drive], [120/128/174] bpm."

Working example: "Melodic techno, female vocal hook, warm pads and driving bass, late-night drive through empty city streets, 124 bpm."

EDM is the genre where tempo matters most. Always specify the BPM. As a rough guide, "128 bpm" pulls toward house, "140 bpm" toward trance, and "174 bpm" toward drum & bass. Skipping the BPM in an EDM prompt is the single most common reason I see the genre land wrong.
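Because a missing BPM is such a common failure for EDM, it is worth checking for mechanically. A minimal Python sketch, assuming the rough tempos named above (128 for house, 140 for trance, 174 for drum & bass); DEFAULT_BPM and ensure_bpm are hypothetical names, and the table is a rule of thumb rather than a genre definition.

```python
import re

# Rule-of-thumb tempos from this section; extend or adjust to taste.
DEFAULT_BPM = {"house": 128, "trance": 140, "drum & bass": 174}

# Matches an explicit tempo such as "124 bpm" or "around 80 BPM".
BPM_PATTERN = re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE)


def ensure_bpm(prompt: str, subgenre: str) -> str:
    """Append the subgenre's default tempo unless the prompt already names one."""
    if BPM_PATTERN.search(prompt):
        return prompt  # the prompt already states a tempo; keep the writer's choice
    # Raises KeyError for subgenres missing from the table, rather than guessing.
    bpm = DEFAULT_BPM[subgenre.lower()]
    return f"{prompt.rstrip('. ')}, {bpm} bpm."


if __name__ == "__main__":
    print(ensure_bpm("Drum & bass, instrumental, dark atmospheric synths, late-night drive.",
                     "drum & bass"))
```

A prompt that already names a tempo, like the melodic techno example at 124 bpm, passes through unchanged, so the table only fills gaps and never overrides a deliberate choice.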

Acoustic singer-songwriter

"Acoustic singer-songwriter, [male/female] vocals, [fingerpicked guitar/strummed guitar with light percussion], [coffee shop afternoon/cabin in the woods], intimate and stripped-back, around [75-95] bpm."

Working example: "Acoustic singer-songwriter, female vocals, fingerpicked guitar with light percussion, coffee shop on a rainy afternoon, intimate and stripped-back, around 85 bpm."

For acoustic, the "intimate and stripped-back" phrasing pulls the production away from over-polished and into demo-like warmth, which is usually what people want from this genre.

How to fix a prompt that missed

When the first generation lands wrong, the right move is to change one specific thing, not start over.

The diagnostic flow:

Wrong genre → add a more specific subgenre tag in the prompt and pair with the genre tile.
Wrong tempo or energy → add or change the BPM. "Slow ballad" → "around 70 bpm slow ballad."
Vocals feel generic → switch vocal gender, or add a vocal style descriptor: "raspy male vocals," "airy female vocals," "deep baritone vocals."
Production sounds dated → add a year or production reference: "modern 2025 indie production," "80s synth-pop production with gated drums."
Mood is off → swap the scene for something more specific. "Happy" → "summer road trip with the windows down."

Two regenerations after a one-word change is usually enough. If you are at four regenerations, the prompt structure itself is broken. Rewrite the whole sentence using the four-part formula.
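The diagnostic flow above amounts to a small lookup table plus one stopping rule. Here it is as a minimal Python sketch; the symptom keys, FIXES, and next_step are hypothetical names chosen for illustration, and the four-regeneration cutoff is the rule of thumb from the paragraph above.

```python
# Symptom -> the single change to make before regenerating (from the list above).
FIXES = {
    "wrong genre": "Add a more specific subgenre tag and pair it with the genre tile.",
    "wrong tempo": 'Add or change the BPM, e.g. "around 70 bpm slow ballad."',
    "generic vocals": 'Switch vocal gender or add a style such as "raspy" or "airy".',
    "dated production": 'Add a year or production reference, e.g. "modern 2025 indie production."',
    "wrong mood": "Swap the scene for something more specific.",
}

# At this many regenerations, stop tweaking: the structure itself is the problem.
REWRITE_AFTER = 4


def next_step(symptom: str, regenerations: int) -> str:
    """Suggest one change, or a full rewrite once the regeneration budget is spent."""
    if regenerations >= REWRITE_AFTER:
        return "Rewrite the whole prompt using the four-part formula."
    return FIXES[symptom]


if __name__ == "__main__":
    print(next_step("wrong tempo", regenerations=1))
    print(next_step("wrong mood", regenerations=4))
```

Checking the regeneration count before the symptom is deliberate: once you hit the cutoff, no single-word fix is suggested at all.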

I covered the regeneration loop in more detail in the 3-minute tutorial. The diagnostic step is the same, and the loop is faster than you think.

Three prompt patterns that always work

After 500 generations, three patterns give me hit rates above 90%:

Genre + scene + tempo. "Modern indie folk, female vocals, fingerpicked guitar, autumn evening walk through Brooklyn, 80 bpm."
Era + genre + production reference. "90s alternative rock, male vocals, distorted guitars and live drums, modern 2025 production, 130 bpm."
Story + emotion + voice. "Pop ballad, male vocals, piano-led, lyrics about visiting your dad's grave for the first time, intimate and devastating."

If you are stuck, copy one of these three and swap the placeholders. They are not magic, but they cover roughly 80% of the songs anyone wants to make.

For more depth on what makes a prompt land, the original "prompts that work" guide has more genre coverage. And if you want to understand how AI music models translate text into sound, Wikipedia's entry on AI music generation is a good technical primer.

Try this exact prompt right now

Open Muziko on iPhone and paste this into the Describe field, then tap Indie Pop and Joyful:

"Modern indie pop, female vocals, bright acoustic guitar with handclaps and subtle synth pads, summer road trip with the windows down, around 118 bpm."

This is the prompt I use as my reference benchmark. It hits all four parts of the formula, includes a clear scene, and specifies tempo. In testing, it produces a usable track on the first generation about 85% of the time.

Once you have the result, change one thing at a time: switch the mood from Joyful to something wistful, swap "summer" for "rainy autumn," swap "female" for "male." Listen to how each single change moves the song. That side-by-side is the fastest way to internalize what each part of the prompt is actually doing.
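If you save your benchmark prompt as text, the single-change variants can be generated rather than retyped. This minimal Python sketch assumes plain-text prompts; one_change_variants is a hypothetical helper, and because it only edits the text, mood tiles such as Joyful still have to be changed in the app itself.

```python
from typing import Dict, List

BASE = ("Modern indie pop, female vocals, bright acoustic guitar with handclaps "
        "and subtle synth pads, summer road trip with the windows down, around 118 bpm.")


def one_change_variants(prompt: str, swaps: Dict[str, str]) -> List[str]:
    """Return one variant per swap, each differing from the base in exactly one phrase."""
    variants = []
    for old, new in swaps.items():
        if old not in prompt:
            raise ValueError(f"{old!r} does not appear in the prompt")
        # Replace only the first occurrence so each variant changes one thing.
        variants.append(prompt.replace(old, new, 1))
    return variants


if __name__ == "__main__":
    for variant in one_change_variants(BASE, {"summer": "rainy autumn", "female": "male"}):
        print(variant)
```

Each variant starts from the same base rather than from the previous variant, which keeps the comparison to exactly one difference per generation.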

Frequently asked questions

What makes an AI song prompt work?

A working prompt has four parts in one sentence: a specific genre, vocal type, two concrete instruments, and one mood or scene. Adding a tempo helps for energy-driven genres. Vague descriptors like "beautiful" or "epic" do not translate to musical decisions.

How long should an AI song prompt be?

One to two sentences. Long prompts tend to bury the decisions that matter under details the model cannot act on. Stick to the four-part formula and you will rarely need more than 25 words.

Should I include a BPM in my AI music prompt?

Yes for electronic genres where tempo defines the subgenre. For acoustic and pop, BPM is optional but helpful when you have a specific energy in mind.

Why does my AI song prompt produce generic results?

Generic results come from generic prompts. Replace vague words with specific descriptors, add a scene the model can interpret, and use a specific subgenre instead of a broad category.

Can I write AI song prompts in languages other than English?

The AI music models I have used accept most major languages, though English produces the most reliable results. For non-English vocals, specify the language explicitly: "female vocals in Spanish."

The short version: prompts are not magic spells; they are decisions. The four-part formula (genre, vocals, two instruments, scene) gives the model enough decisions to land on the song you imagined. Cut the empty words, name what you actually want, and you will get there in one or two generations instead of seven.

Ready to make your own?

Try everything you just read about. Muziko is free to download.