Is ACE-Step free to use?

Yes. ACE-Step is released under the Apache 2.0 license, meaning it's free for both personal and commercial use. You only pay for the compute costs of running it locally.

What GPU do I need to run ACE-Step?

ACE-Step 1.5 requires an NVIDIA GPU with at least 8GB VRAM for FP16 inference. A 24GB GPU (RTX 3090/4090) is recommended for comfortable use with longer generations.
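
To see which tier a machine falls into, the sketch below reads each GPU's total memory from `nvidia-smi` and compares it against the 8GB and 24GB figures quoted above. It is an illustrative helper, not part of ACE-Step: the function names and tier labels are assumptions made for this example. Reported memory is rounded to the nearest GiB because cards often report slightly less than their nominal size (a 24GB card may show 24564 MiB).

```python
import subprocess

MIN_VRAM_GB = 8           # minimum quoted above for FP16 inference
RECOMMENDED_VRAM_GB = 24  # recommended tier quoted above (RTX 3090/4090)


def parse_gpu_memory(csv_text):
    """Parse 'name, memory_in_MiB' lines into (name, whole GiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        # Round to the nearest GiB: drivers reserve a little memory,
        # so a nominal 24GB card can report a few MiB short of 24576.
        gpus.append((name.strip(), round(int(mem_mib) / 1024)))
    return gpus


def classify(vram_gb):
    """Map a VRAM size to the tiers described in the answer above."""
    if vram_gb >= RECOMMENDED_VRAM_GB:
        return "recommended"
    if vram_gb >= MIN_VRAM_GB:
        return "minimum"
    return "insufficient"


def query_gpus():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for name, vram_gb in query_gpus():
        print(f"{name}: {vram_gb} GB -> {classify(vram_gb)}")
```

Running the script on a machine without an NVIDIA driver raises an error from `subprocess`, which is itself a useful signal that local GPU inference is not available.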

How does ACE-Step 1.5 compare to the original?

ACE-Step 1.5 brought major improvements in vocal clarity, support for longer audio clips (up to 4 minutes), better rhythm consistency, and improved multi-track coherence compared to the initial release.

Can I use ACE-Step without a GPU?

Technically yes, using CPU-only mode, but generation times become impractical (30+ minutes per clip). FM9 offers cloud-based AI music generation as a more practical alternative.

Open Source AI Music

ACE-Step 1.5: The Open-Source AI Music Model

ACE-Step is an open-source foundation model for music generation, developed by ACE Studio and StepFun, that pairs a diffusion transformer with conditional flow matching. Here's everything you need to know.

Last updated: February 2026

What Is ACE-Step?

ACE-Step is an open-source text-to-music model released by ACE Studio and StepFun in 2025. Version 1.5 introduced significant improvements in vocal clarity, rhythmic consistency, and multi-instrument coherence. The model uses a diffusion transformer architecture conditioned on text descriptions and can generate clips of up to 4 minutes.

Apache 2.0 licensed — free for commercial use
Runs locally on NVIDIA GPUs with 8GB+ VRAM
Supports lyrics-to-song generation
LoRA fine-tuning for custom styles
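
LoRA fine-tuning is practical on consumer hardware because it freezes each original weight matrix and trains only two small low-rank factors alongside it. The sketch below does the parameter arithmetic for a single layer. The layer size and rank are hypothetical values chosen for illustration, not ACE-Step's actual dimensions.

```python
def full_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable weights for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 1024, 1024, 16  # hypothetical layer shape and rank
    full = full_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(full, lora, f"{lora / full:.2%}")  # 1048576 32768 3.12%
```

Under these assumed numbers a LoRA adapter trains about 3% of the weights a full fine-tune of that layer would, which is why style adapters stay small enough to share.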

How ACE-Step Works

ACE-Step uses a Latent Diffusion Model (LDM) operating in the STFT (Short-Time Fourier Transform) domain. Unlike waveform-based models, this approach enables high-quality audio synthesis at reduced computational cost. The architecture combines a music VAE encoder-decoder with a conditional flow matching diffusion transformer.
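
The flow matching idea can be illustrated with a toy example. This is a minimal sketch and not ACE-Step's code: it assumes the common straight-line path between a noise sample and a data latent, and it replaces the trained transformer with an "oracle" velocity function that already knows the target. All function names here are hypothetical. In the real model, a network learns to predict the velocity from the noisy latent, the timestep, and the text conditioning, and the VAE decoder turns the final latent into audio.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the straight path: d/dt x_t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def make_oracle_velocity(x1):
    """Stand-in for the trained network: exact velocity toward a known x1."""
    def velocity(x, t):
        return [(b - xi) / (1 - t) for xi, b in zip(x, x1)]
    return velocity


def sample(velocity_fn, x0, steps=20):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the oracle never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]                   # pretend data latent
    noise = [rng.gauss(0, 1) for _ in latent]   # starting point at t=0
    print(sample(make_oracle_velocity(latent), noise))
```

With the oracle velocity the sampler lands on the target latent. A trained model only approximates that velocity, so the number of integration steps becomes a trade-off between speed and quality.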

✓ Strengths

Free and open source (Apache 2.0)
Strong lyrics integration and vocal quality
Supports long-form generation (up to 4 min)
Active community and LoRA ecosystem
No usage restrictions or watermarking

⚠ Limitations

Requires NVIDIA GPU with 8GB+ VRAM
Complex local setup and dependency management
Slower generation than cloud APIs (2-5 min on RTX 3090)
No real-time collaboration features
Limited support for exotic instruments

FM9 vs Local ACE-Step

FM9 gives you cloud-powered music generation — no GPU required, no setup, no waiting for dependencies to install. While ACE-Step is remarkable for researchers and power users who want full control, FM9 delivers instant results for creators who want to focus on the music, not the infrastructure.

Try FM9 Free — No Setup Required

Generate professional AI music in your browser. 50 free credits on signup.
