ACE-Step is ByteDance's open-source foundation model for music generation, combining diffusion transformers with a conditional flow matching approach. Here's everything you need to know.
Last updated: February 2026
ACE-Step is an open-source text-to-music model released by ByteDance in 2025. Version 1.5 introduced significant improvements in vocal clarity, rhythmic consistency, and multi-instrument coherence. The model uses a diffusion transformer architecture conditioned on text descriptions and can generate tracks up to 4 minutes long.
ACE-Step uses a Latent Diffusion Model (LDM) operating in the STFT (Short-Time Fourier Transform) domain. Unlike waveform-based models, this approach enables high-quality audio synthesis at reduced computational cost. The architecture combines a music VAE encoder-decoder with a conditional flow matching diffusion transformer.
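To make the flow matching idea concrete, here is a minimal NumPy sketch of the general technique, not ACE-Step's actual code. It assumes the common straight-line path between noise and data, which this page does not confirm for ACE-Step. The function names (cfm_pair, euler_sample), the latent shape, and the "oracle" velocity that stands in for a trained diffusion transformer are all hypothetical and chosen for illustration only.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t,
    plus the velocity a model would be trained to predict there."""
    # Reshape t so it broadcasts over every non-batch dimension.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * x1   # linear interpolation between noise and data
    v_target = x1 - x0              # constant velocity along a straight path
    return x_t, v_target

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
x1 = rng.normal(size=(2, 8, 16))   # stand-in for VAE latents (assumed shape)
x0 = rng.normal(size=x1.shape)     # Gaussian noise with the same shape

# Training view: at t=0 the path sits on the noise, at t=1 on the data.
x_t, v_target = cfm_pair(x0, x1, np.array([0.0, 1.0]))

# Sampling view: an oracle velocity (assumption) replaces the trained network.
sampled = euler_sample(lambda x, t: x1 - x0, x0, steps=10)
```

In a real system the velocity function would be the text-conditioned transformer, and the sampled latent would then be passed through the VAE decoder to produce audio.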
FM9 gives you cloud-powered music generation: no GPU required, no setup, no waiting for dependencies to install. While ACE-Step is remarkable for researchers and power users who want full control, FM9 delivers instant results for creators who want to focus on the music, not the infrastructure.
Generate professional AI music in your browser. 50 free credits on signup.