By Michael Gamble, Partner & Ecosystem Lead, Arm
This project brings together the best of several worlds:
The Stable Audio Open model from Stability AI, sourced from Hugging Face
Execution powered by PyTorch and TorchAudio
A fast, efficient pipeline that runs natively on Arm-based CPUs
A seamless creative handoff to Ableton Live
When I’m deep in a music project using Ableton Live, I don’t want to interrupt my workflow to dig through libraries or browse sound packs. I wanted a tool that could meet me where I am—right in the flow.
Now, I can simply describe the sound I’m imagining (“analog bassline,” “cinematic riser,” “lofi snare”), and within seconds, the generated .wav file appears in my Ableton browser. From there, I can tweak it, loop it, or turn it into an instrument.
Every sound is unique. No one else will generate exactly what I do. That sense of personal ownership fuels my creativity.
This sound generator runs entirely on-device on an Arm-based CPU: no GPU, no cloud inference, and no network round trips. Thanks to Arm's efficiency and performance-per-watt, the app stays responsive even during multi-step diffusion runs.
The generation engine is built on:
The Stable Audio Open model by Stability AI, available via Hugging Face
PyTorch and TorchAudio for model inference and audio handling
Optimized multithreaded execution for smooth CPU performance
To maximize performance on Arm CPUs, I enabled full thread utilization:
import os
import torch
# Use all available Arm CPU threads; os.cpu_count() may return None, so fall back to 1
torch.set_num_threads(os.cpu_count() or 1)

To maintain low memory usage across generations:
import gc
# Clear memory periodically (every third generation; gen_count is the running generation counter)
if gen_count % 3 == 0:
    gc.collect()
    print(f"Memory cleared at generation {gen_count}")

Core generation loop, tuned for speed and efficiency:
from stable_audio_tools.inference.generation import generate_diffusion_cond

output = generate_diffusion_cond(
    model,
    steps=7,  # Reduced step count for faster inference
    cfg_scale=1,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)

Although optimized for CPU, the program can also run on Metal (Apple Silicon) or CUDA if needed:
device = "mps"  # Apple Silicon
# device = "cuda"  # NVIDIA
# device = "cpu"  # Arm CPU (default)
model = model.to(device).to(torch.float32)

The tool outputs .wav files directly to a project folder monitored by Ableton Live. Here's a sample CLI interaction:
Enter a prompt for generating audio:
Ambient texture
Enter a tempo for the audio:
100
Generated audio saved to: Ambient texture.wav

I immediately see the file show up in my browser within Live, ready to be arranged, modulated, and transformed.
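As a rough sketch of how the prompt and tempo from the session above might feed the generation call and name the output file: the helper names, the 44.1 kHz sample rate, the clip length, and the choice to fold the tempo into the prompt text are all assumptions for illustration, not the project's actual code. The conditioning keys follow the format shown in the Stable Audio Open model card.

```python
import re
from pathlib import Path

SAMPLE_RATE = 44_100  # assumed; in practice this would come from the model's config


def build_conditioning(prompt, tempo_bpm, seconds):
    """Assumed mapping: prepend the tempo to the text prompt as '<N> BPM'."""
    text = f"{tempo_bpm} BPM {prompt.strip()}"
    return [{"prompt": text, "seconds_start": 0, "seconds_total": seconds}]


def sample_size_for(seconds, sample_rate=SAMPLE_RATE):
    """Number of audio samples needed for a clip of the given length."""
    return int(seconds * sample_rate)


def output_path(prompt, folder):
    """Name the .wav after the prompt, replacing characters that are unsafe in file names."""
    safe = re.sub(r'[\\/:*?"<>|]+', "_", prompt.strip()) or "untitled"
    return Path(folder) / f"{safe}.wav"


if __name__ == "__main__":
    # Hypothetical values mirroring the CLI session; "samples" stands in for the watched folder.
    conditioning = build_conditioning("Ambient texture", 100, seconds=11)
    sample_size = sample_size_for(11)
    print(conditioning, sample_size, output_path("Ambient texture", "samples"))
```

Sanitizing the file name matters here because the prompt is free text: a prompt containing a slash or colon would otherwise produce a path the operating system rejects or misinterprets.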
This project is a personal prototype—but it’s also a window into the future of content creation. With efficient, on-device AI inference on Arm CPUs, artists and developers can:
Stay in creative flow without waiting on cloud resources
Ensure data privacy and full ownership of outputs
Extend AI tools into edge devices, DAWs, and new creative interfaces
This is what happens when open-source innovation meets efficient compute: real-time generative power, accessible to every creator.
Explore the ecosystem that made this possible: