Yes, I'm aware of the accuracy issue, and we're actively investigating it. One potential fix is discussed in issue #1784. However, we don't yet know whether that fix applies to your model, or more broadly to other VLM models, since Qwen3.5 appears to work fine without it.
wenhua cheng
wenhuach
AI & ML interests
Model Compression, CV
Recent Activity
upvoted an article 4 days ago
Welcome Gemma 4: Frontier multimodal intelligence on device
liked a model 5 days ago
Intel/gemma-4-12B-it-int4-AutoRound
replied to their post 5 days ago
replied to their post 7 days ago
Sorry for the late reply. Is it fine now?
replied to their post 12 days ago
Working on it. Gemma 4 had a regression issue, and it has just been fixed.
replied to their post 13 days ago
The devices are rented from cloud providers once a task comes in. We found that our target devices are currently sold out, and we are refining the logic accordingly. Thanks for the feedback.
posted an update 14 days ago
Post
4517
We provide **free** hardware to quantize models at the [Intel Low Bit Open LLM Leaderboard](Intel/low_bit_open_llm_leaderboard), currently supporting pure RTN mode powered by AutoRound.
If you find it useful, please consider starring the AutoRound project on [GitHub](https://github.com/intel/auto-round)!
posted an update 6 months ago
Post
3015
SignRoundV2 for LLM quantization: PTQ-level cost, QAT-level accuracy. Yes, even at 2 bits.
SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs (2512.04746)
posted an update 7 months ago
Post
324
AutoRound (https://github.com/intel/auto-round) is now supported by SGLang!
After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
💡 We've also enhanced the RTN mode (--iters 0), cutting quantization costs significantly for low-resource users.
Star our repo and stay tuned for more exciting updates!
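For readers unfamiliar with the term, RTN stands for round-to-nearest: each weight is scaled and rounded to the closest quantization level, with no iterative tuning step. The sketch below illustrates the general idea only. It assumes symmetric quantization with one scale per group of weights, and the function names (`rtn_quantize_group`, `rtn_quantize`, `dequantize`) are hypothetical, not AutoRound's API or its actual implementation.

```python
def rtn_quantize_group(group, bits=4):
    """Quantize one group of floats to signed integers by rounding to nearest."""
    qmax = 2 ** (bits - 1) - 1  # largest positive code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in group)
    # Assumption: symmetric scaling; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]
    return codes, scale


def rtn_quantize(weights, bits=4, group_size=4):
    """Split a flat weight list into groups and quantize each independently."""
    packed = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        packed.append(rtn_quantize_group(group, bits))
    return packed


def dequantize(packed):
    """Map integer codes back to approximate float weights."""
    return [q * scale for codes, scale in packed for q in codes]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
    packed = rtn_quantize(weights, bits=4, group_size=4)
    restored = dequantize(packed)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("max reconstruction error:", worst)
```

Because nothing is tuned, the cost is a single pass over the weights, which is why an RTN-style mode is cheap compared with iterative rounding optimization.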
posted an update 8 months ago
Post
1774
AutoRound keeps evolving its LLM quantization algorithm!
After enhancing W2A16 quantization, we now offer a fast algorithm to generate mixed bits/data-type schemes (~2 minutes for 8B models), great for MXFP4 and W2A16.
Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
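As a rough illustration of what a mixed-bits scheme means in practice, the sketch below computes the parameter-weighted average bit width when different layers are assigned different bit widths. The layer names, parameter counts, and the `average_bits` helper are all hypothetical, and this is not how AutoScheme itself searches for a scheme; it only shows the bookkeeping behind an "average bits" figure.

```python
def average_bits(layer_params, layer_bits):
    """Return the parameter-weighted mean bit width of a mixed-bits scheme.

    layer_params: dict mapping layer name -> number of weights in that layer
    layer_bits:   dict mapping layer name -> bit width assigned to that layer
    """
    total = sum(layer_params.values())
    weighted = sum(layer_params[name] * layer_bits[name] for name in layer_params)
    return weighted / total


if __name__ == "__main__":
    # Hypothetical layers: keep a small sensitive layer at 4 bits,
    # push the large one down to 2 bits.
    params = {"attn.q_proj": 1_000_000, "mlp.down_proj": 3_000_000}
    bits = {"attn.q_proj": 4, "mlp.down_proj": 2}
    print("average bits per weight:", average_bits(params, bits))
```

Since large layers dominate the average, keeping a few small layers at higher precision raises the overall bit width only slightly.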
posted an update 9 months ago
Post
428
AutoRound v0.7 is out!
This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input.
Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0
posted an update 11 months ago
Post
1952
AutoRound (https://github.com/intel/auto-round) Now Supports GGUF Export & Custom Bit Settings!
We're excited to announce that AutoRound now supports:
- GGUF format export, for seamless compatibility with popular inference engines.
- Custom bit settings, to tailor quantization to your needs for optimal performance.
Check out these newly released models:
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹 Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹 Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound
Stay tuned! An even more advanced algorithm for some configurations is coming soon.
posted an update about 1 year ago
Post
1916
AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.
Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, as AutoAWQ is no longer maintained and manually configuring new models is not trivial due to the need for custom layer mappings.
posted an update about 1 year ago
Post
1944
AutoRound (https://github.com/intel/auto-round) has been integrated into Transformers, allowing you to run AutoRound-formatted models directly in the upcoming release. Additionally, we are actively working on supporting the GGUF double-quant format, e.g. q4_k_s. Stay tuned!
https://huggingface.co/blog/autoround
Post
2544
Check out the [DeepSeek-R1 INT2 model](OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.
| Benchmark | BF16 | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu | 0.8514 | 0.8302 |
| hellaswag | 0.6935 | 0.6657 |
| winogrande | 0.7932 | 0.7940 |
| arc_challenge | 0.6212 | 0.6084 |
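To make the size of the accuracy gap concrete, the short sketch below computes the absolute and relative drop for each benchmark, using only the scores from the table above. The `accuracy_drop` helper is introduced here for illustration. For MMLU, the absolute drop works out to roughly 2.1 points, or about 2.5% relative to the BF16 score.

```python
# Scores copied from the table: benchmark -> (BF16, INT2-mixed)
SCORES = {
    "mmlu": (0.8514, 0.8302),
    "hellaswag": (0.6935, 0.6657),
    "winogrande": (0.7932, 0.7940),
    "arc_challenge": (0.6212, 0.6084),
}


def accuracy_drop(bf16, quantized):
    """Return (absolute drop, drop relative to the BF16 score)."""
    absolute = bf16 - quantized
    relative = absolute / bf16
    return absolute, relative


if __name__ == "__main__":
    for name, (bf16, int2) in SCORES.items():
        absolute, relative = accuracy_drop(bf16, int2)
        # A negative drop means the INT2-mixed model scored slightly higher.
        print(f"{name:14s} abs={absolute:+.4f} rel={relative:+.2%}")
```

Note that winogrande shows a small negative drop, meaning the quantized model scores marginally higher there, which is within the range one would expect from evaluation noise.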