
ACE-Step LoRA Training Guide

Fine-tune ACE-Step with custom LoRA adapters to generate music in your unique style.

Last updated: February 2026

What is LoRA Fine-tuning?

LoRA (Low-Rank Adaptation) is a technique that lets you fine-tune large models efficiently by training only a small set of adapter weights. For ACE-Step, this means you can teach the model a specific musical style (your band's sound, a genre niche, or an artist's aesthetic) using just 20-50 reference tracks.
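
Conceptually, LoRA freezes the base weight matrix W and learns a small low-rank update, so the adapted weight is W + (alpha / rank) * B @ A. The sketch below is illustrative plain PyTorch, not ACE-Step's actual adapter code:

# Illustrative LoRA wrapper around a frozen linear layer
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # B starts at zero, so the adapter is a no-op before training
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

Only lora_a and lora_b receive gradients, which is why LoRA training fits on a single consumer GPU.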

Dataset Preparation

Data quality is the most important factor in LoRA training. Each audio file should be 30-120 seconds long, either an isolated stem or a clean, well-balanced mix, and consistently representative of the target style. Avoid lossy formats like MP3 for training; use WAV or FLAC at 44.1kHz or higher.

  • Minimum 20 samples for basic style capture
  • 50-100 samples for robust style generalization
  • Use stem separation tools to isolate clean vocals or instruments
  • Tag each file with accurate text captions (genre, tempo, key, mood)
  • Normalize audio loudness to -14 LUFS for consistency
# Using Demucs for stem separation
pip install demucs
python -m demucs --two-stems=vocals audio/mixed_track.wav

Stem separation with Demucs for cleaner vocal isolation

# Normalize to -14 LUFS using ffmpeg
ffmpeg -i input.wav -filter:a loudnorm=I=-14:TP=-1.5:LRA=11 output.wav

Loudness normalization to -14 LUFS
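
You can also sanity-check durations and sample rates in bulk before training. A minimal sketch, assuming the soundfile package (pip install soundfile) and the dataset layout described in the steps below:

# Flag files that violate the duration or sample-rate guidelines
import glob
import soundfile as sf

for path in sorted(glob.glob("dataset/audio/*.wav") + glob.glob("dataset/audio/*.flac")):
    info = sf.info(path)
    duration = info.frames / info.samplerate
    if info.samplerate < 44100:
        print(f"{path}: sample rate {info.samplerate} Hz is below 44.1 kHz")
    if not 30 <= duration <= 120:
        print(f"{path}: duration {duration:.1f}s is outside the 30-120s range")

Bulk validation of sample rate and duration before training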

Training Parameters

These recommended parameters work for most music style LoRAs. Adjust based on your dataset size and target style specificity.

Parameter        Recommended   Range          Note
LoRA Rank        16            4–64           Higher = more capacity, slower training
LoRA Alpha       32            8–128          Usually 2× rank value
Learning Rate    1e-4          5e-5 – 5e-4    Lower for small datasets
Batch Size       4             1–16           Reduce if OOM errors occur
Epochs           50–150        20–500         Monitor for overfitting
Warmup Steps     50            0–200          Stabilizes early training
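
When choosing epochs and warmup, it helps to compute how many optimizer steps you will actually run. A worked example for a 50-sample dataset with the recommended settings:

# Rough optimizer step count: 50 samples, batch size 4, 100 epochs
samples, batch_size, epochs = 50, 4, 100
steps_per_epoch = -(-samples // batch_size)  # ceil(50 / 4) = 13
total_steps = steps_per_epoch * epochs       # 13 * 100 = 1300 steps
print(total_steps, 50 / total_steps)         # a warmup of 50 is ~4% of training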

Training Steps

  1. Prepare Training Data

    Collect 20-100 audio samples representing your target style. Use stem separation for cleaner training signal. Export as WAV/FLAC at 44.1kHz.

    # Recommended directory structure:
    dataset/
      audio/
        track_001.wav
        track_002.wav
        ...
      metadata.json
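
    If your source files arrive in mixed formats, one way to standardize them is a small conversion pass. A sketch assuming ffmpeg is on your PATH and a hypothetical raw_audio/ source folder:

    # Convert everything in raw_audio/ to 44.1 kHz WAV in dataset/audio/
    import pathlib
    import subprocess

    src = pathlib.Path("raw_audio")
    dst = pathlib.Path("dataset/audio")
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(src.iterdir()):
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(f), "-ar", "44100", str(dst / (f.stem + ".wav"))],
            check=True,
        )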
  2. Generate Text Captions

    Write accurate descriptive captions for each audio file: genre, instruments, tempo, mood, key. Caption quality directly impacts LoRA effectiveness.

    [
      {
        "file": "audio/track_001.wav",
        "caption": "upbeat indie folk, acoustic guitar, female vocals, 120 BPM, C major, energetic"
      },
      {
        "file": "audio/track_002.wav",
        "caption": "melancholic jazz, piano and double bass, slow tempo, 70 BPM, F minor, introspective"
      }
    ]
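
    Before training, it is worth verifying that every entry resolves to a real file and has a non-empty caption. A minimal check, assuming the metadata.json format shown above:

    # Validate metadata.json entries against the audio directory
    import json
    import pathlib

    for entry in json.loads(pathlib.Path("dataset/metadata.json").read_text()):
        if not (pathlib.Path("dataset") / entry["file"]).exists():
            print(f"missing audio file: {entry['file']}")
        if not entry.get("caption", "").strip():
            print(f"empty caption: {entry['file']}")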
  3. Configure Training Script

    Copy configs/lora_training_template.yaml to configs/my_lora.yaml. Set data_dir to your dataset path and adjust num_epochs based on dataset size.

    # configs/my_lora.yaml
    model:
      base_model: "ace-step-1.5"
      lora_rank: 16
      lora_alpha: 32
      target_modules: ["q_proj", "v_proj", "k_proj", "out_proj"]
    
    training:
      num_epochs: 100
      batch_size: 4
      learning_rate: 1.0e-4
      warmup_steps: 50
      save_every: 25
    
    data:
      data_dir: "./dataset"
      sample_rate: 44100
      max_duration: 120
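
    A quick load-and-check of the config can catch typos before you burn GPU time. A sketch assuming PyYAML (pip install pyyaml):

    # Load the config and flag a common misconfiguration
    import yaml

    with open("configs/my_lora.yaml") as f:
        cfg = yaml.safe_load(f)

    if cfg["model"]["lora_alpha"] != 2 * cfg["model"]["lora_rank"]:
        print("note: lora_alpha is usually set to 2x lora_rank")
    print(cfg["training"]["num_epochs"], "epochs on", cfg["data"]["data_dir"])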
  4. Run Training

    Execute: python train_lora.py --config configs/my_lora.yaml. Monitor the loss curve; training loss should decrease steadily without spiking.

    python train_lora.py \
      --config configs/my_lora.yaml \
      --data_dir ./dataset \
      --output_dir ./lora_output \
      --num_epochs 100 \
      --batch_size 4 \
      --learning_rate 1e-4 \
      --lora_rank 16
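
    One way to watch for spikes is to plot the logged losses. The sketch below assumes a hypothetical loss_log.csv with step,loss columns; adapt it to whatever your trainer actually writes:

    # Plot training loss from a hypothetical CSV log
    import csv
    import matplotlib.pyplot as plt

    steps, losses = [], []
    with open("lora_output/loss_log.csv") as f:
        for row in csv.DictReader(f):
            steps.append(int(row["step"]))
            losses.append(float(row["loss"]))

    plt.plot(steps, losses)
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.savefig("loss_curve.png")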
  5. Test and Iterate

    Load your LoRA checkpoint and test with varied prompts. If outputs don't match the target style, increase training data or epochs.

    # Load and use your trained LoRA
    python generate.py \
      --prompt "upbeat indie folk with acoustic guitar" \
      --lora_path ./lora_output/checkpoint-100 \
      --lora_weight 0.8
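
    Sweeping the LoRA weight often reveals the best trade-off between style strength and audio quality. A sketch that shells out to the generate command above at several weights:

    # Generate the same prompt at several LoRA weights
    import subprocess

    for weight in [0.4, 0.6, 0.8, 1.0]:
        subprocess.run(
            [
                "python", "generate.py",
                "--prompt", "upbeat indie folk with acoustic guitar",
                "--lora_path", "./lora_output/checkpoint-100",
                "--lora_weight", str(weight),
            ],
            check=True,
        )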


Zero-Shot Style Control with FM9

LoRA training requires 20-50 reference tracks, 8GB+ GPU, and hours of compute. FM9 lets you control music style with descriptive prompts: no training, no data collection, no waiting. Try it free with 50 bonus credits.
