AI Breakthrough! ZAYA1: A Major MoE Model Trained Entirely on AMD GPUs 🚀🧠

Zyphra, in partnership with AMD and IBM, undertook a year-long initiative to evaluate AMD's GPUs and platform for large-scale AI model training, resulting in the development of ZAYA1. Described as the first major Mixture-of-Experts foundation model built entirely on AMD GPUs and networking, ZAYA1 demonstrates that the market need not rely solely on NVIDIA to scale AI operations. The model was trained on AMD's Instinct MI300X chips with Pensando networking and ROCm software, all running on IBM Cloud's infrastructure. Notably, the system's architecture closely resembles a conventional enterprise cluster, built from familiar components without any NVIDIA products. Zyphra reports that ZAYA1 matches, and sometimes exceeds, established open models on reasoning, mathematics, and code. For businesses grappling with supply constraints and rising GPU costs, this presents a viable alternative that does not compromise on capability.

Each node in the ZAYA1 system contains eight MI300X GPUs connected via Infinity Fabric, and each GPU is paired with its own Pollara network card. A separate network handles dataset reads and checkpointing. This straightforward design, with simple wiring and network layout, minimizes switch costs and keeps iteration times consistent. ZAYA1 is a foundation model with 8.3 billion parameters in total, of which only 760 million are active for any given token.
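To make the layout concrete, here is a small Python sketch of the node topology just described; the counts come from the article, while the class and field names are my own invention.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    """One accelerator, paired one-to-one with a dedicated training NIC."""
    model: str = "AMD Instinct MI300X"
    training_nic: str = "Pensando Pollara"

@dataclass
class Node:
    """A ZAYA1-style node: eight GPUs on a shared Infinity Fabric domain,
    with a separate network reserved for dataset reads and checkpointing."""
    gpus: list = field(default_factory=lambda: [GPU() for _ in range(8)])
    intra_node_fabric: str = "Infinity Fabric"
    storage_network: str = "separate storage/checkpoint network"

node = Node()
assert len(node.gpus) == 8  # one Pollara NIC per GPU, eight per node
```

Keeping storage traffic on its own network means dataset reads and checkpoint writes do not contend with gradient collectives, which is consistent with the article's point about consistent iteration times.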

The model was trained on 12 trillion tokens across three distinct stages. Its architecture combines compressed attention, a refined routing system that directs each token to the most relevant experts, and lighter-touch residual scaling to keep deeper layers stable.

Training pairs the Muon and AdamW optimizers. To keep Muon efficient on AMD hardware, Zyphra implemented fused kernels and minimized unnecessary memory traffic, preventing the optimizer step from becoming a bottleneck. Batch sizes were ramped up gradually, contingent on the storage pipeline delivering tokens fast enough to keep pace. The result is a model trained entirely on AMD hardware that rivals larger models such as Qwen3-4B, Gemma3-12B, Llama-3-8B, and OLMoE.

A key advantage of the Mixture-of-Experts (MoE) structure is that only a fraction of the model is active for any given token, which keeps inference memory manageable and serving costs down. The MI300X's ample memory headroom gives engineers room to iterate, while ZAYA1's compressed attention sharply reduces prefill time during evaluation. Recognizing the effort required to move a mature NVIDIA-based workflow to ROCm, Zyphra measured AMD hardware behavior carefully and adjusted model dimensions, GEMM shapes, and microbatch sizes to sit within the MI300X's preferred compute ranges. Infinity Fabric performs best when all eight GPUs in a node participate in collectives, and Pollara typically reaches peak throughput with larger messages, details Zyphra took into account.
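Two of the pieces above lend themselves to short sketches. First, the routing: a minimal top-k router in PyTorch looks like the following. This shows only the generic mechanism of activating a few experts per token; Zyphra describes ZAYA1's router as considerably more refined, and every dimension here is invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal MoE router: score each token against every expert and
    keep only the top-k, so most of the model stays inactive per token."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)                          # (tokens, n_experts)
        weights, experts = torch.topk(logits, self.k)  # keep the k best experts
        weights = F.softmax(weights, dim=-1)           # normalize their scores
        return weights, experts                        # combine expert outputs with these

router = TopKRouter(d_model=64, n_experts=8, k=2)
w, idx = router(torch.randn(4, 64))
print(idx.shape)  # torch.Size([4, 2]): two experts chosen per token
```

This sparsity is what lets an 8.3B-parameter model run with only 760M parameters active per token.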
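Second, the optimizer: Muon's distinctive step is an approximate orthogonalization of each matrix update via a Newton-Schulz iteration (in typical Muon setups, AdamW handles embeddings and other non-matrix parameters). The naive version below, using coefficients from the public Muon reference implementation rather than Zyphra's fused ROCm kernels, makes it clear why memory traffic matters.

```python
import torch

def newton_schulz_orthogonalize(grad: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update matrix, the core of a Muon
    step. Coefficients follow the public Muon reference implementation;
    this naive version is a sketch, not Zyphra's fused ROCm kernel."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = grad / (grad.norm() + 1e-7)     # scale so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                      # iterate on the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        gram = x @ x.T                                # intermediate buffer
        x = a * x + (b * gram + c * gram @ gram) @ x  # two more intermediates
    return x.T if transposed else x

update = newton_schulz_orthogonalize(torch.randn(256, 512))
print(update.shape)  # torch.Size([256, 512])
```

Each matrix product here materializes an intermediate buffer, so fusing these steps into fewer kernels, as the article credits Zyphra with doing, is what keeps the optimizer off the critical path.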

Long-context training, with sequence lengths scaling from 4k up to 32k tokens, relied on ring attention to shard sequences across GPUs and tree attention during decoding to head off bottlenecks. Practical storage considerations were addressed in parallel: smaller models stress IOPS, while larger models demand sustained bandwidth. Zyphra bundled dataset shards to minimize scattered reads and expanded per-node page caches to speed up checkpoint recovery, a critical path given how often extended training runs rewind to an earlier checkpoint.

To keep the cluster stable, the team also increased RCCL timeouts so that short network interruptions would not bring down entire jobs. Checkpointing was distributed across all GPUs rather than funneled through a single chokepoint, making saves more than ten times faster than conventional approaches, which directly improves uptime and reduces operator workload. Zyphra's Aegis service continuously monitors logs and system metrics, automatically spotting failures such as NIC glitches or ECC blips and applying straightforward corrective actions.
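The RCCL timeout change mentioned above is a one-liner in PyTorch-style training code; on ROCm builds of PyTorch, the familiar "nccl" backend is implemented by RCCL. A minimal sketch, with an illustrative 30-minute value that is my assumption rather than Zyphra's actual setting:

```python
from datetime import timedelta

import torch.distributed as dist

# Raise the collective timeout so a brief network hiccup stalls, rather
# than kills, the job. On ROCm builds the "nccl" backend maps to RCCL.
# The 30-minute value is illustrative, not Zyphra's configuration.
dist.init_process_group(backend="nccl", timeout=timedelta(minutes=30))
```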
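The distributed checkpointing pattern, with every rank writing its own shard in parallel rather than funneling state through a single writer, can be sketched as follows. This assumes a sharded setup (such as FSDP sharded state dicts) where each rank's state_dict holds only the parameters it owns; the file layout and names are invented for the example.

```python
import os

import torch
import torch.distributed as dist

def save_sharded_checkpoint(model: torch.nn.Module, step: int,
                            ckpt_dir: str = "/checkpoints") -> None:
    """Write this rank's shard directly to storage, in parallel with every
    other rank, so save time stops scaling with a single writer's bandwidth.
    A sketch of the pattern, not Zyphra's implementation; call it from
    within an initialized torch.distributed job."""
    os.makedirs(ckpt_dir, exist_ok=True)
    rank = dist.get_rank()
    # Assumes state_dict() returns only the parameters this rank owns,
    # as with FSDP sharded state dicts.
    shard_path = os.path.join(ckpt_dir, f"step{step:07d}-rank{rank:04d}.pt")
    torch.save(model.state_dict(), shard_path)
    dist.barrier()  # all shards on disk before training resumes
```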
