Fine-tune ACE-Step with custom LoRA adapters to generate music in your unique style.
Last updated: February 2026
LoRA (Low-Rank Adaptation) is a technique that lets you fine-tune large models efficiently by training only a small set of adapter weights. For ACE-Step, this means you can teach the model a specific musical style (your band's sound, a genre niche, or an artist's aesthetic) using just 20-50 reference tracks.
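Conceptually, a LoRA adapter keeps the base weight matrix frozen and learns only a small low-rank update alongside it. The sketch below illustrates the general mechanism in PyTorch; it is not ACE-Step's actual module code:

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update: W x + (alpha / r) * B(A x)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # the base model stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Only the small A and B matrices are trained and saved, which is why a style adapter is a few megabytes rather than a full copy of the model.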
Quality data is the most important factor in LoRA training. Each audio file should be 30-120 seconds, single-instrument or clearly mixed, and represent the target style consistently. Avoid compressed MP3s for training; use WAV or FLAC at 44.1kHz or higher.
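Before training, it helps to sanity-check every file against these constraints. Here is a minimal sketch using the soundfile library, with the duration and sample-rate thresholds taken from the guidance above (the folder name assumes the dataset layout shown later):

```python
from pathlib import Path
import soundfile as sf

def check_dataset(audio_dir: str, min_sec: float = 30.0, max_sec: float = 120.0, min_sr: int = 44100):
    """Report files that fall outside the recommended format, duration, or sample-rate window."""
    for path in sorted(Path(audio_dir).iterdir()):
        if path.suffix.lower() not in {".wav", ".flac"}:
            print(f"SKIP (not WAV/FLAC): {path.name}")
            continue
        info = sf.info(str(path))
        duration = info.frames / info.samplerate
        if info.samplerate < min_sr:
            print(f"LOW SAMPLE RATE ({info.samplerate} Hz): {path.name}")
        if not min_sec <= duration <= max_sec:
            print(f"BAD DURATION ({duration:.1f}s): {path.name}")

check_dataset("dataset/audio")
```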
# Using Demucs for stem separation
pip install demucs
python -m demucs --two-stems=vocals audio/mixed_track.wav

Stem separation with Demucs for cleaner vocal isolation
# Normalize to -14 LUFS using ffmpeg
ffmpeg -i input.wav -filter:a loudnorm=I=-14:TP=-1.5:LRA=11 output.wav

Loudness normalization to -14 LUFS
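To apply the same normalization across a whole folder of tracks, the ffmpeg call above can be wrapped in a short loop. A sketch, assuming ffmpeg is on your PATH; the raw/normalized folder names are placeholders:

```python
import subprocess
from pathlib import Path

src = Path("dataset/audio_raw")   # unprocessed exports (placeholder path)
dst = Path("dataset/audio")       # normalized files used for training
dst.mkdir(parents=True, exist_ok=True)

for wav in sorted(src.glob("*.wav")):
    # Same loudnorm settings as the single-file command above
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(wav),
         "-filter:a", "loudnorm=I=-14:TP=-1.5:LRA=11",
         str(dst / wav.name)],
        check=True,
    )
```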
These recommended parameters work for most music style LoRAs. Adjust based on your dataset size and target style specificity.
| Parameter | Recommended | Range | Note |
|---|---|---|---|
| LoRA Rank | 16 | 4-64 | Higher = more capacity, slower training |
| LoRA Alpha | 32 | 8-128 | Usually 2× rank value |
| Learning Rate | 1e-4 | 5e-5 to 5e-4 | Lower for small datasets |
| Batch Size | 4 | 1-16 | Reduce if OOM errors occur |
| Epochs | 50-150 | 20-500 | Monitor for overfitting |
| Warmup Steps | 50 | 0-200 | Stabilizes early training |
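For orientation, the rank, alpha, and target-module values map directly onto a Hugging Face PEFT LoraConfig. This is only an illustrative mapping under that assumption; ACE-Step's own trainer consumes the YAML config shown further below and may wire the adapter differently:

```python
from peft import LoraConfig

# Mirrors the recommended values from the table above
lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # usually 2x the rank
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],  # attention projections
)
```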
Collect 20-100 audio samples representing your target style. Use stem separation for cleaner training signal. Export as WAV/FLAC at 44.1kHz.
# Recommended directory structure:
dataset/
  audio/
    track_001.wav
    track_002.wav
    ...
  metadata.json

Write accurate descriptive captions for each audio file: genre, instruments, tempo, mood, key. Caption quality directly impacts LoRA effectiveness.
[
{
"file": "audio/track_001.wav",
"caption": "upbeat indie folk, acoustic guitar, female vocals, 120 BPM, C major, energetic"
},
{
"file": "audio/track_002.wav",
"caption": "melancholic jazz, piano and double bass, slow tempo, 70 BPM, F minor, introspective"
}
]
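With more than a handful of tracks, it can be easier to scaffold metadata.json from the audio folder and then fill in the captions by hand. A sketch; the placeholder caption text is meant to be replaced:

```python
import json
from pathlib import Path

audio_dir = Path("dataset/audio")
entries = [
    {"file": f"audio/{p.name}", "caption": "TODO: genre, instruments, tempo, mood, key"}
    for p in sorted(audio_dir.iterdir())
    if p.suffix.lower() in {".wav", ".flac"}
]

with open("dataset/metadata.json", "w") as f:
    json.dump(entries, f, indent=2)

print(f"Wrote {len(entries)} entries to dataset/metadata.json")
```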
Copy configs/lora_training_template.yaml to configs/my_lora.yaml. Set data_dir to your dataset path and adjust num_epochs based on dataset size.

# configs/my_lora.yaml
model:
  base_model: "ace-step-1.5"
  lora_rank: 16
  lora_alpha: 32
  target_modules: ["q_proj", "v_proj", "k_proj", "out_proj"]
training:
  num_epochs: 100
  batch_size: 4
  learning_rate: 1.0e-4
  warmup_steps: 50
  save_every: 25
data:
  data_dir: "./dataset"
  sample_rate: 44100
  max_duration: 120

Execute python train_lora.py --config configs/my_lora.yaml, then monitor the loss curves: training loss should decrease steadily without spiking.
python train_lora.py \
--config configs/my_lora.yaml \
--data_dir ./dataset \
--output_dir ./lora_output \
--num_epochs 100 \
--batch_size 4 \
--learning_rate 1e-4 \
--lora_rank 16
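To check that the loss really is decreasing steadily, plot whatever per-step metrics your run produces. The sketch below assumes a hypothetical metrics.jsonl file with step and loss fields in the output directory; adapt the path and field names to whatever train_lora.py actually logs:

```python
import json
import matplotlib.pyplot as plt

steps, losses = [], []
# Hypothetical log format: one JSON object per line, e.g. {"step": 10, "loss": 0.42}
with open("lora_output/metrics.jsonl") as f:
    for line in f:
        record = json.loads(line)
        steps.append(record["step"])
        losses.append(record["loss"])

plt.plot(steps, losses)
plt.xlabel("step")
plt.ylabel("training loss")
plt.title("LoRA training loss")
plt.savefig("loss_curve.png")
```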
Load your LoRA checkpoint and test with varied prompts. If outputs don't match the target style, increase training data or epochs.

# Load and use your trained LoRA
python generate.py \
--prompt "upbeat indie folk with acoustic guitar" \
--lora_path ./lora_output/checkpoint-100 \
--lora_weight 0.8
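When evaluating a checkpoint, it often helps to sweep the adapter weight and compare the results, since lower weights blend more of the base model's behavior back in. A sketch that reuses the generate.py flags shown above:

```python
import subprocess

prompt = "upbeat indie folk with acoustic guitar"
for weight in (0.4, 0.6, 0.8, 1.0):
    # Lower weights lean toward the base model; higher weights lean toward the LoRA style
    subprocess.run(
        ["python", "generate.py",
         "--prompt", prompt,
         "--lora_path", "./lora_output/checkpoint-100",
         "--lora_weight", str(weight)],
        check=True,
    )
    print(f"Generated a sample with lora_weight={weight}")
```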
LoRA training requires 20-50 reference tracks, an 8GB+ GPU, and hours of compute. FM9 lets you control music style with descriptive prompts: no training, no data collection, no waiting. Try it free with 50 bonus credits.

Start Creating Free