
ACE-Step LoRA Training Guide

Fine-tune ACE-Step with custom LoRA adapters to generate music in your unique style.

Last updated: February 2026

What is LoRA Fine-tuning?

LoRA (Low-Rank Adaptation) is a technique that lets you fine-tune large models efficiently by training only a small set of adapter weights. For ACE-Step, this means you can teach the model a specific musical style (your band's sound, a genre niche, or an artist's aesthetic) using just 20-50 reference tracks.
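
Conceptually, LoRA freezes the base weight matrix W and learns a small low-rank update, so the adapted weight is W + (alpha / rank) * B @ A. The sketch below is illustrative plain PyTorch, not ACE-Step's actual adapter code:

# Illustrative LoRA wrapper around a frozen linear layer
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # B starts at zero, so the adapter is a no-op before training
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

Only lora_a and lora_b receive gradients, which is why LoRA training fits on a single consumer GPU.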

Dataset Preparation

Data quality is the most important factor in LoRA training. Each audio file should be 30-120 seconds long, either an isolated stem or a clean, well-balanced mix, and consistently representative of the target style. Avoid lossy formats like MP3 for training; use WAV or FLAC at 44.1kHz or higher.

  • Minimum 20 samples for basic style capture
  • 50-100 samples for robust style generalization
  • Use stem separation tools to isolate clean vocals or instruments
  • Tag each file with accurate text captions (genre, tempo, key, mood)
  • Normalize audio loudness to -14 LUFS for consistency
# Using Demucs for stem separation
pip install demucs
python -m demucs --two-stems=vocals audio/mixed_track.wav

Stem separation with Demucs for cleaner vocal isolation

# Normalize to -14 LUFS using ffmpeg
ffmpeg -i input.wav -filter:a loudnorm=I=-14:TP=-1.5:LRA=11 output.wav

Loudness normalization to -14 LUFS
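
You can also sanity-check durations and sample rates in bulk before training. A minimal sketch, assuming the soundfile package (pip install soundfile) and the dataset layout described in the steps below:

# Flag files that violate the duration or sample-rate guidelines
import glob
import soundfile as sf

for path in sorted(glob.glob("dataset/audio/*.wav") + glob.glob("dataset/audio/*.flac")):
    info = sf.info(path)
    duration = info.frames / info.samplerate
    if info.samplerate < 44100:
        print(f"{path}: sample rate {info.samplerate} Hz is below 44.1 kHz")
    if not 30 <= duration <= 120:
        print(f"{path}: duration {duration:.1f}s is outside the 30-120s range")

Bulk validation of sample rate and duration before training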

Training Parameters

These recommended parameters work for most music style LoRAs. Adjust based on your dataset size and target style specificity.

Parameter        Recommended   Range          Note
LoRA Rank        16            4–64           Higher = more capacity, slower training
LoRA Alpha       32            8–128          Usually 2× rank value
Learning Rate    1e-4          5e-5 – 5e-4    Lower for small datasets
Batch Size       4             1–16           Reduce if OOM errors occur
Epochs           50–150        20–500         Monitor for overfitting
Warmup Steps     50            0–200          Stabilizes early training
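
When choosing epochs and warmup, it helps to compute how many optimizer steps you will actually run. A worked example for a 50-sample dataset with the recommended settings:

# Rough optimizer step count: 50 samples, batch size 4, 100 epochs
samples, batch_size, epochs = 50, 4, 100
steps_per_epoch = -(-samples // batch_size)  # ceil(50 / 4) = 13
total_steps = steps_per_epoch * epochs       # 13 * 100 = 1300 steps
print(total_steps, 50 / total_steps)         # a warmup of 50 is ~4% of training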

Training Steps

  1. Prepare Training Data

    Collect 20-100 audio samples representing your target style. Use stem separation for cleaner training signal. Export as WAV/FLAC at 44.1kHz.

    # Recommended directory structure:
    dataset/
      audio/
        track_001.wav
        track_002.wav
        ...
      metadata.json
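
    If your source files arrive in mixed formats, one way to standardize them is a small conversion pass. A sketch assuming ffmpeg is on your PATH and a hypothetical raw_audio/ source folder:

    # Convert everything in raw_audio/ to 44.1 kHz WAV in dataset/audio/
    import pathlib
    import subprocess

    src = pathlib.Path("raw_audio")
    dst = pathlib.Path("dataset/audio")
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(src.iterdir()):
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(f), "-ar", "44100", str(dst / (f.stem + ".wav"))],
            check=True,
        )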
  2. Generate Text Captions

    Write accurate descriptive captions for each audio file: genre, instruments, tempo, mood, key. Caption quality directly impacts LoRA effectiveness.

    [
      {
        "file": "audio/track_001.wav",
        "caption": "upbeat indie folk, acoustic guitar, female vocals, 120 BPM, C major, energetic"
      },
      {
        "file": "audio/track_002.wav",
        "caption": "melancholic jazz, piano and double bass, slow tempo, 70 BPM, F minor, introspective"
      }
    ]
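
    Before training, it is worth verifying that every entry resolves to a real file and has a non-empty caption. A minimal check, assuming the metadata.json format shown above:

    # Validate metadata.json entries against the audio directory
    import json
    import pathlib

    for entry in json.loads(pathlib.Path("dataset/metadata.json").read_text()):
        if not (pathlib.Path("dataset") / entry["file"]).exists():
            print(f"missing audio file: {entry['file']}")
        if not entry.get("caption", "").strip():
            print(f"empty caption: {entry['file']}")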
  3. Configure Training Script

    Copy configs/lora_training_template.yaml to configs/my_lora.yaml. Set data_dir to your dataset path and adjust num_epochs based on dataset size.

    # configs/my_lora.yaml
    model:
      base_model: "ace-step-1.5"
      lora_rank: 16
      lora_alpha: 32
      target_modules: ["q_proj", "v_proj", "k_proj", "out_proj"]
    
    training:
      num_epochs: 100
      batch_size: 4
      learning_rate: 1.0e-4
      warmup_steps: 50
      save_every: 25
    
    data:
      data_dir: "./dataset"
      sample_rate: 44100
      max_duration: 120
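
    A quick load-and-check of the config can catch typos before you burn GPU time. A sketch assuming PyYAML (pip install pyyaml):

    # Load the config and flag a common misconfiguration
    import yaml

    with open("configs/my_lora.yaml") as f:
        cfg = yaml.safe_load(f)

    if cfg["model"]["lora_alpha"] != 2 * cfg["model"]["lora_rank"]:
        print("note: lora_alpha is usually set to 2x lora_rank")
    print(cfg["training"]["num_epochs"], "epochs on", cfg["data"]["data_dir"])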
  4. Run Training

    Execute: python train_lora.py --config configs/my_lora.yaml. Monitor the loss curve; training loss should decrease steadily without spiking.

    python train_lora.py \
      --config configs/my_lora.yaml \
      --data_dir ./dataset \
      --output_dir ./lora_output \
      --num_epochs 100 \
      --batch_size 4 \
      --learning_rate 1e-4 \
      --lora_rank 16
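
    One way to watch for spikes is to plot the logged losses. The sketch below assumes a hypothetical loss_log.csv with step,loss columns; adapt it to whatever your trainer actually writes:

    # Plot training loss from a hypothetical CSV log
    import csv
    import matplotlib.pyplot as plt

    steps, losses = [], []
    with open("lora_output/loss_log.csv") as f:
        for row in csv.DictReader(f):
            steps.append(int(row["step"]))
            losses.append(float(row["loss"]))

    plt.plot(steps, losses)
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.savefig("loss_curve.png")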
  5. Test and Iterate

    Load your LoRA checkpoint and test with varied prompts. If outputs don't match the target style, increase training data or epochs.

    # Load and use your trained LoRA
    python generate.py \
      --prompt "upbeat indie folk with acoustic guitar" \
      --lora_path ./lora_output/checkpoint-100 \
      --lora_weight 0.8
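
    Sweeping the LoRA weight often reveals the best trade-off between style strength and audio quality. A sketch that shells out to the generate command above at several weights:

    # Generate the same prompt at several LoRA weights
    import subprocess

    for weight in [0.4, 0.6, 0.8, 1.0]:
        subprocess.run(
            [
                "python", "generate.py",
                "--prompt", "upbeat indie folk with acoustic guitar",
                "--lora_path", "./lora_output/checkpoint-100",
                "--lora_weight", str(weight),
            ],
            check=True,
        )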


Zero-Shot Style Control with FM9

LoRA training requires 20-50 reference tracks, 8GB+ GPU, and hours of compute. FM9 lets you control music style with descriptive prompts: no training, no data collection, no waiting. Try it free with 50 bonus credits.
