AMD is accelerating AI innovation by offering Day 0 compatibility with Alibaba’s Qwen 3.5 large language model (LLM) on its flagship Instinct MI300X, MI325X, and MI355X GPUs. For the first time, developers can deploy Qwen 3.5’s advanced capabilities, including 256K-token context windows and multimodal processing, from launch day, thanks to AMD’s optimized ROCm software stack and integration with inference frameworks such as SGLang and vLLM.
The move addresses a critical challenge in AI development: scaling models to long inputs without sacrificing performance. Traditional transformer architectures struggle with long sequences because self-attention’s compute and memory costs grow quadratically with sequence length, but Qwen 3.5 introduces a Hybrid Attention system that combines full attention layers with Gated Delta Networks for linear scaling. This allows the model to handle extensive contexts, such as full documents or complex workflows, while maintaining efficiency.
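To make the linear-scaling idea concrete, here is a toy, pure-Python sketch of a gated linear-attention recurrence. It is illustrative only: the real Gated Delta Network update in Qwen 3.5’s hybrid layers is more sophisticated, and every name and value below is a placeholder. The key property it demonstrates is that each token updates a fixed-size state in O(d²) work, so total cost grows linearly with sequence length instead of quadratically.

```python
def gated_linear_attention(queries, keys, values, gates):
    """Toy gated linear-attention recurrence (illustrative, not Qwen's actual kernel).

    All inputs are lists of d-dimensional vectors (lists of floats); gates are
    per-token scalars in (0, 1] that decay the running state. Per-token cost is
    O(d^2) regardless of how many tokens came before."""
    d = len(queries[0])
    # Running d x d state matrix, initialized to zero.
    state = [[0.0] * d for _ in range(d)]
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        # Decay the old state, then add the rank-1 outer product k v^T.
        for i in range(d):
            for j in range(d):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read out: o = q^T S, a d-dimensional vector.
        outputs.append([sum(q[i] * state[i][j] for i in range(d)) for j in range(d)])
    return outputs

# Example: 3 tokens, d = 2.
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
vs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
gs = [1.0, 0.5, 0.5]
outs = gated_linear_attention(qs, ks, vs, gs)
```

Because the state has fixed size, the memory footprint also stays constant as the context grows, which is what makes 256K-token windows tractable.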
For enterprises, the implications are significant. Qwen 3.5’s Ultra-Sparse Mixture of Experts (MoE) design activates only a fraction of its parameters during inference, reducing compute costs while outperforming larger dense models. Combined with the large capacity of AMD’s high-bandwidth memory (HBM), this means developers can run full-scale models on a single GPU or node, cutting hardware requirements for production environments.
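The mechanics of sparse activation can be sketched in a few lines. This is a minimal top-k routing example; the expert count, top-k value, and gating details here are placeholders, not Qwen 3.5’s actual configuration. The point is that only the selected experts run at all, which is where the inference savings come from.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_logits, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs
    with renormalized gate weights. Only top_k of len(experts) expert functions
    execute; the rest are skipped entirely (sparse activation)."""
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([router_logits[i] for i in chosen])
    out = 0.0
    for g, i in zip(gates, chosen):
        out += g * experts[i](x)
    return out, chosen

# 8 toy "experts": expert i just scales its input by (i + 1).
experts = [(lambda i: (lambda x: (i + 1) * x))(i) for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, -3.0, 0.0, 0.3]
y, used = moe_forward(3.0, experts, logits, top_k=2)
```

With top_k=2 of 8 experts, only a quarter of the expert parameters touch each token; production MoE models push this ratio far lower, which is what “ultra-sparse” refers to.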
Key Technical Advancements
The collaboration also unlocks native multimodal support, enabling Qwen 3.5 to process visual data alongside text. Features like 3D convolutions and the DeepStack Vision Transformer allow the model to analyze temporal dynamics in video or identify objects in industrial settings—useful for applications ranging from customer support to autonomous systems. AMD’s optimized kernels for linear attention, MoE routing, and vision operations ensure seamless integration with existing workflows.
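To illustrate why 3D convolutions capture temporal dynamics, the toy example below slides a kernel over time as well as height and width, so each output mixes information across neighboring video frames. It is purely illustrative and has nothing to do with Qwen 3.5’s actual vision stack; a simple two-frame temporal-difference kernel is used to highlight motion.

```python
def conv3d_valid(video, kernel):
    """Naive 3D convolution with 'valid' padding.

    video:  [T][H][W] floats (a stack of frames)
    kernel: [kt][kh][kw] floats, slid over time, height, and width."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += video[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out

# A 2-frame temporal-difference kernel (shape 2 x 1 x 1) highlights motion.
diff_kernel = [[[-1.0]], [[1.0]]]
video = [
    [[0.0, 0.0], [0.0, 0.0]],   # frame 0: dark
    [[1.0, 0.0], [0.0, 0.0]],   # frame 1: pixel (0,0) lights up
    [[1.0, 0.0], [0.0, 1.0]],   # frame 2: pixel (1,1) lights up
]
motion = conv3d_valid(video, diff_kernel)
```

Each output frame is nonzero exactly where a pixel changed between consecutive input frames, which is the kind of temporal signal a learned 3D kernel can pick up.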
Developers can begin deploying Qwen 3.5 on AMD Instinct GPUs without modification, thanks to pre-configured support in vLLM and SGLang. The Triton-based kernels for Gated Delta Networks and hipBLASLt GEMM optimizations for MoE pathways are fully compatible with ROCm, while multimodal operations leverage standard MIOpen and PyTorch tools.
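In practice, spinning up an OpenAI-compatible endpoint with vLLM on ROCm reduces to a single command. The checkpoint name below is a placeholder, not a published model ID; substitute the actual Qwen 3.5 checkpoint, and adjust the tensor-parallel size to your GPU count.

```shell
# Sketch of serving Qwen 3.5 with vLLM on a single AMD Instinct GPU.
# "Qwen/<qwen-3.5-checkpoint>" is a placeholder for the real model ID.
vllm serve Qwen/<qwen-3.5-checkpoint> \
  --tensor-parallel-size 1 \
  --max-model-len 262144
```

The `--max-model-len 262144` flag corresponds to the 256K context window; lowering it reduces KV-cache memory if the full window is not needed.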
Why This Matters for AI Builders
For AI researchers and enterprises, this partnership eliminates the need for custom optimizations when adopting Qwen 3.5. The combination of linear-scaling attention, sparse inference, and native multimodality reduces the barriers to deploying cutting-edge models at scale. Whether building enterprise AI agents, high-context chatbots, or visual AI systems, teams can now leverage AMD’s hardware and software stack to achieve performance that would otherwise require costly workarounds.
AMD’s move aligns with broader trends in open-weight AI, where flexibility and efficiency are paramount. By providing Day 0 support, the company ensures developers can focus on innovation rather than infrastructure hurdles—ushering in a new era of accessible, high-performance AI deployment.
