Qwen3.5 27B and 35B with 2x AMD 7900 XTX vLLM bench serve results
Posted by u/bettertoknow on r/LocalLLaMA
I've enjoyed the recent reports of success running Qwen3.5 on vLLM with multiple AMD GPUs, especially given AMD's dwindling market share these days!
Here are some `vllm bench serve` results from 2x 7900 XTX with the smaller Qwen3.5 models, cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 and cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit.
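For anyone wanting to reproduce, here is a minimal sketch of the serve-plus-benchmark loop. The flag names follow the current vLLM CLI; the port, request count, and input/output lengths are illustrative assumptions, not the exact invocation from the post:

```bash
# Serve one of the quantized models across both GPUs (tensor parallel = 2).
vllm serve cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 \
    --tensor-parallel-size 2 \
    --port 8000

# In another shell: drive synthetic load against the running server.
# Input/output lengths and prompt count here are assumed values.
vllm bench serve \
    --model cyankiwi/Qwen3.5-27B-AWQ-BF16-INT4 \
    --base-url http://localhost:8000 \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 256 \
    --num-prompts 100
```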
This was done with a fairly recent rocm/vllm-dev:nightly container:

- vLLM: 0.17.2rc1.dev43+ge6c479770
- kernel: 6.19.8-cachyos-lto (maybe relevant)
- kernel cmdline: `ttm.pages_limit=30720000 iommu=pt amdgpu.ppfeaturemask=0xfffd7fff`

**The key** to getting this working at speed was the poorly documented legacy env var `HSA_ENABLE_IPC_MODE_LEGACY=0`. Without it, it was necessary to disable NCCL P2P via `NCCL_P2P_DISABLE=1` just to get vLLM to serve the model at all.
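As a rough sketch, the env var can be passed when launching the container. The device and security flags below are the usual ROCm container boilerplate; the exact run command is an assumption, not the author's:

```bash
# HSA_ENABLE_IPC_MODE_LEGACY=0 was the key to the fast multi-GPU path here;
# without it, NCCL_P2P_DISABLE=1 was needed just to serve the model at all.
docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host \
    --security-opt seccomp=unconfined \
    -e HSA_ENABLE_IPC_MODE_LEGACY=0 \
    rocm/vllm-dev:nightly
```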
But what's the point of multi-GPU without some P2P!