How many samples do I need to train an ACE-Step LoRA?

A minimum of 20 samples can capture basic style characteristics, but 50-100 samples produce much more consistent results. Quality matters more than quantity: 20 pristine, well-captioned samples beat 100 noisy, poorly labeled ones.

How long does ACE-Step LoRA training take?

On an RTX 3090, training 50 samples for 100 epochs takes approximately 2-3 hours. On an RTX 4090, expect 1-1.5 hours. Training time scales roughly linearly with dataset size and epoch count.
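
As a rough worked example of that linear scaling, the sketch below extrapolates from the RTX 3090 baseline quoted above (50 samples, 100 epochs, 2 to 3 hours). The function name `estimate_hours` and the assumption of strictly linear scaling are illustrative only; real runs also depend on clip length, batch size, and disk throughput.

```python
def estimate_hours(samples, epochs, base_hours=(2.0, 3.0),
                   base_samples=50, base_epochs=100):
    """Scale a (low, high) baseline linearly by dataset size and epoch count."""
    factor = (samples / base_samples) * (epochs / base_epochs)
    low, high = base_hours
    return (low * factor, high * factor)


if __name__ == "__main__":
    # Hypothetical run: twice the samples and 1.5x the epochs of the baseline.
    low, high = estimate_hours(samples=100, epochs=150)
    print(f"RTX 3090 estimate: {low:.1f}-{high:.1f} h")
```

Doubling the dataset and training for 150 epochs multiplies the baseline by 3, so the 2 to 3 hour figure becomes roughly 6 to 9 hours under this assumption.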

Can I share my trained LoRA with others?

Yes. ACE-Step LoRA weights are Apache 2.0 licensed like the base model. Many users share their style LoRAs on HuggingFace and CivitAI.

Will my LoRA work after ACE-Step updates?

LoRAs are version-specific. A LoRA trained on ACE-Step 1.5 may not work correctly with future model versions without retraining or conversion.

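ACE-Step LoRA Training Guide

Fine-tune ACE-Step with a custom LoRA adapter to generate music in your own style.

Last updated: February 2026

What is LoRA fine-tuning?

LoRA (Low-Rank Adaptation) is a technique for fine-tuning large models efficiently by training only a small set of adapter weights. For ACE-Step, this means you can teach the model a specific musical style (your band's sound, a niche genre, or a particular artist's aesthetic) from 20-50 reference tracks.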

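Dataset Preparation

High-quality data is the single most important factor in LoRA training. Each audio file should be 30-120 seconds long, contain either a single instrument or a clean mix, and represent the target style consistently. Avoid lossy MP3 files for training; use WAV or FLAC at a sample rate of 44.1 kHz or higher.

A minimum of 20 samples captures basic style characteristics
50-100 samples give more robust style generalization
Use a stem-separation tool to extract clean vocals or instruments
Add an accurate text caption to every file (genre, tempo, key, mood)
Normalize loudness to -14 LUFS for consistency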

# Using Demucs for stem separation
pip install demucs
python -m demucs --two-stems=vocals audio/mixed_track.wav

Stem separation with Demucs for cleaner vocal isolation

# Normalize to -14 LUFS using ffmpeg
ffmpeg -i input.wav -filter:a loudnorm=I=-14:TP=-1.5:LRA=11 output.wav

Loudness normalization to -14 LUFS
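
Before training, it can help to confirm that every clip meets the length and sample-rate guidelines above. The following is a minimal sketch using only Python's standard-library `wave` module, which reads uncompressed PCM WAV files; FLAC files would need a third-party reader. The function names and the `dataset/audio` path are assumptions for illustration, not part of ACE-Step.

```python
import wave
from pathlib import Path

MIN_SECONDS, MAX_SECONDS = 30, 120   # clip length suggested in this guide
MIN_SAMPLE_RATE = 44_100             # 44.1 kHz or higher


def check_wav(path):
    """Return a list of problems found in one WAV file (empty list means OK)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"duration {seconds:.1f} s is outside {MIN_SECONDS}-{MAX_SECONDS} s")
    return problems


def check_dataset(audio_dir):
    """Map each problematic .wav file name under audio_dir to its problems."""
    report = {}
    for path in sorted(Path(audio_dir).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report


if __name__ == "__main__":
    for name, problems in check_dataset("dataset/audio").items():
        print(name, "->", "; ".join(problems))
```

Files that pass both checks are omitted from the report, so an empty result means the whole folder is within the suggested limits.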

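Training Parameters

These recommended parameters work for most music-style LoRAs. Adjust them to your dataset size and to how specific the target style is.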

Parameter	Recommended	Range	Note
LoRA Rank	16	4–64	Higher = more capacity, slower training
LoRA Alpha	32	8–128	Usually 2× rank value
Learning Rate	1e-4	5e-5 – 5e-4	Lower for small datasets
Batch Size	4	1–16	Reduce if OOM errors occur
Epochs	50–150	20–500	Monitor for overfitting
Warmup Steps	50	0–200	Stabilizes early training
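
To see how these values interact, the sketch below computes two derived quantities. It assumes the standard LoRA formulation, in which the adapter update is scaled by alpha / rank, and one optimizer step per batch with no gradient accumulation. Whether ACE-Step's trainer behaves exactly this way is an assumption, and the helper names are illustrative.

```python
import math


def lora_scale(alpha, rank):
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps, assuming one step per batch and no gradient accumulation."""
    return math.ceil(num_samples / batch_size) * epochs


if __name__ == "__main__":
    steps = total_steps(num_samples=50, batch_size=4, epochs=100)
    print("LoRA scale:", lora_scale(alpha=32, rank=16))
    print("total steps:", steps)
    print(f"warmup share: {50 / steps:.1%}")
```

With the recommended values, 50 samples at batch size 4 give 13 steps per epoch, so 100 epochs is 1,300 steps and the 50 warmup steps cover about 4% of training. Keeping alpha at 2× rank keeps the scale at 2.0 when the rank changes.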

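Training Steps

1. Prepare the training data
Collect 20-100 audio samples that represent the target style. Use a stem-separation tool to get a cleaner training signal. Export as WAV or FLAC at 44.1 kHz.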
```
# Recommended directory structure:
dataset/
  audio/
    track_001.wav
    track_002.wav
    ...
  metadata.json
```

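2. Generate text captions

Write an accurate, descriptive caption for every audio file: genre, instruments, tempo, mood, and key. Caption quality directly affects the quality of the LoRA.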

[
  {
    "file": "audio/track_001.wav",
    "caption": "upbeat indie folk, acoustic guitar, female vocals, 120 BPM, C major, energetic"
  },
  {
    "file": "audio/track_002.wav",
    "caption": "melancholic jazz, piano and double bass, slow tempo, 70 BPM, F minor, introspective"
  }
]
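
A missing or empty caption is easy to overlook in a hand-edited file. The sketch below checks a `metadata.json` in the format shown above against the files on disk. It assumes the `dataset/` layout from step 1 with `.wav` files under `audio/`; the function name `validate_metadata` is hypothetical.

```python
import json
from pathlib import Path


def validate_metadata(dataset_dir):
    """Return a list of human-readable problems found in metadata.json."""
    root = Path(dataset_dir)
    entries = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    problems = []
    listed = set()
    for i, entry in enumerate(entries):
        file_rel = entry.get("file", "")
        caption = entry.get("caption", "").strip()
        if not file_rel or not (root / file_rel).is_file():
            problems.append(f"entry {i}: missing audio file {file_rel!r}")
        if not caption:
            problems.append(f"entry {i}: empty caption")
        listed.add(file_rel)
    # Audio files on disk that no metadata entry mentions.
    for wav in sorted((root / "audio").glob("*.wav")):
        rel = wav.relative_to(root).as_posix()
        if rel not in listed:
            problems.append(f"{rel}: no caption entry")
    return problems


if __name__ == "__main__":
    for problem in validate_metadata("./dataset"):
        print(problem)
```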

3. Configure the training script

Copy configs/lora_training_template.yaml to configs/my_lora.yaml. Set data_dir to your dataset path and adjust num_epochs to your dataset size.

# configs/my_lora.yaml
model:
  base_model: "ace-step-1.5"
  lora_rank: 16
  lora_alpha: 32
  target_modules: ["q_proj", "v_proj", "k_proj", "out_proj"]

training:
  num_epochs: 100
  batch_size: 4
  learning_rate: 1.0e-4
  warmup_steps: 50
  save_every: 25

data:
  data_dir: "./dataset"
  sample_rate: 44100
  max_duration: 120
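
As a sanity check before a long run, the values in the config can be compared with the ranges from the parameter table. The sketch below works on a plain dictionary shaped like the YAML above, as a YAML loader would typically produce; the `RANGES` table and `out_of_range` helper are illustrative and not part of ACE-Step.

```python
# (min, max) pairs taken from the parameter table in this guide.
RANGES = {
    "lora_rank": (4, 64),
    "lora_alpha": (8, 128),
    "learning_rate": (5e-5, 5e-4),
    "batch_size": (1, 16),
    "num_epochs": (20, 500),
    "warmup_steps": (0, 200),
}


def out_of_range(config):
    """Return {key: value} for settings outside the table's suggested ranges."""
    flat = {**config.get("model", {}), **config.get("training", {})}
    return {
        key: flat[key]
        for key, (low, high) in RANGES.items()
        if key in flat and not low <= flat[key] <= high
    }


if __name__ == "__main__":
    config = {
        "model": {"lora_rank": 16, "lora_alpha": 32},
        "training": {"num_epochs": 100, "batch_size": 4,
                     "learning_rate": 1.0e-4, "warmup_steps": 50},
    }
    print(out_of_range(config))  # an empty dict means every value is in range
```

A value outside these ranges is not necessarily wrong, since the table gives suggestions rather than hard limits, but it is worth a second look before committing GPU hours.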

4. Run training

Run python train_lora.py --config configs/my_lora.yaml. Monitor the loss curve: training loss should fall steadily without spikes.

python train_lora.py \
  --config configs/my_lora.yaml \
  --data_dir ./dataset \
  --output_dir ./lora_output \
  --num_epochs 100 \
  --batch_size 4 \
  --learning_rate 1e-4 \
  --lora_rank 16
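
One simple way to check a loss curve for spikes is to compare each value with the average of the values just before it. The sketch below assumes the per-step losses have already been collected into a list; how `train_lora.py` logs them is not specified here. The window size and threshold are arbitrary starting points.

```python
def find_spikes(losses, window=10, threshold=1.5):
    """Return indices where loss exceeds threshold x the mean of the previous window."""
    spikes = []
    for i in range(window, len(losses)):
        recent_mean = sum(losses[i - window:i]) / window
        if losses[i] > threshold * recent_mean:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Synthetic, steadily decreasing loss with one injected spike at step 30.
    losses = [1.0 / (1 + 0.05 * step) for step in range(60)]
    losses[30] = 2.0
    print(find_spikes(losses))
```

On this synthetic curve only step 30 is flagged. Repeated spikes in a real run are often a sign that the learning rate is too high for the dataset.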

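5. Test and iterate

Load a LoRA checkpoint and test it with a variety of prompts. If the output does not match the target style, add training data or train for more epochs.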

# Load and use your trained LoRA
python generate.py \
  --prompt "upbeat indie folk with acoustic guitar" \
  --lora_path ./lora_output/checkpoint-100 \
  --lora_weight 0.8
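
Comparing several checkpoints and LoRA weights side by side makes it easier to hear where the style starts to overpower the prompt. The sketch below only builds the command lines, using the same `generate.py` flags as the example above. The `checkpoint-50` directory name is an assumption based on `save_every: 25`, and `sweep_commands` is a hypothetical helper.

```python
import shlex


def sweep_commands(prompt, checkpoints, weights):
    """Build one generate.py argument list per (checkpoint, weight) pair."""
    commands = []
    for checkpoint in checkpoints:
        for weight in weights:
            commands.append([
                "python", "generate.py",
                "--prompt", prompt,
                "--lora_path", checkpoint,
                "--lora_weight", str(weight),
            ])
    return commands


if __name__ == "__main__":
    cmds = sweep_commands(
        prompt="upbeat indie folk with acoustic guitar",
        checkpoints=["./lora_output/checkpoint-50", "./lora_output/checkpoint-100"],
        weights=[0.6, 0.8, 1.0],
    )
    for cmd in cmds:
        # Print for review; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```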

FAQ

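Zero-shot style control with FM9

LoRA training requires 20-50 reference tracks, a GPU with 8 GB or more of memory, and several hours of compute. FM9 lets you control musical style with descriptive prompts: no training, no data collection, no waiting. Sign up free to get 50 credits.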
