Qwen3.5 27B and 35B with 2x AMD 7900 XTX vLLM bench serve results
Posted by u/bettertoknow on r/LocalLLaMA
I've enjoyed the recent reports of success running Qwen3.5 on vLLM with multiple AMD GPUs, especially given AMD's dwindling market share these days!
Here are some `vllm bench serve` results from 2x 7900 XTX with the smaller Qwen3.5 models, cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 and cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit.
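For anyone wanting to reproduce, here is a minimal sketch of the serve-plus-benchmark loop. The flag names follow the current vLLM CLI; the port, request count, and input/output lengths are illustrative assumptions, not the exact invocation from the post:

```bash
# Serve one of the quantized models across both GPUs (tensor parallel = 2).
vllm serve cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 \
    --tensor-parallel-size 2 \
    --port 8000

# In another shell: drive synthetic load against the running server.
# Input/output lengths and prompt count here are assumed values.
vllm bench serve \
    --model cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 \
    --base-url http://localhost:8000 \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 256 \
    --num-prompts 100
```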
This was done with a fairly recent rocm/vllm-dev:nightly container:

- vLLM: 0.17.2rc1.dev43+ge6c479770
- kernel: 6.19.8-cachyos-lto (maybe relevant)
- kernel cmdline: `ttm.pages_limit=30720000 iommu=pt amdgpu.ppfeaturemask=0xfffd7fff`

**The key** to getting this working at speed was the poorly documented legacy env var `HSA_ENABLE_IPC_MODE_LEGACY=0`. Without it, it was necessary to disable NCCL P2P via `NCCL_P2P_DISABLE=1` just to get vLLM to serve the model at all.
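As a rough sketch, the env var can be passed when launching the container. The device and security flags below are the usual ROCm container boilerplate; the exact run command is an assumption, not the author's:

```bash
# HSA_ENABLE_IPC_MODE_LEGACY=0 was the key to the fast multi-GPU path here;
# without it, NCCL_P2P_DISABLE=1 was needed just to serve the model at all.
docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host \
    --security-opt seccomp=unconfined \
    -e HSA_ENABLE_IPC_MODE_LEGACY=0 \
    rocm/vllm-dev:nightly
```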
But what's the point of multi-GPU without some P2P!