Hugging Face – Posts


FlameF0X

posted an update 2 days ago


MiniMax-M3 coming soon.
https://github.com/MiniMax-AI/MiniMax-M3

hypothetical

posted an update 1 day ago


The world's smallest and highest-quality Gemma 4 E2B and E4B models! ~7x compression: from 9.3GB to 1.4GB!

TheStageAI/gemma-4-E2B-it
TheStageAI/gemma-4-E4B-it

1 reply

sergiopaniego

posted an update 3 days ago


new banger blog alert 🚨

@ariG23498 is starting a blog series about profiling in PyTorch, and part 1 just dropped

takes you from the simplest scenario to actually knowing what your GPU is doing. if you have never opened a profiler trace, this is where you start

covers torch.profiler from scratch: reading tables and traces, overhead-bound vs compute-bound, the full dispatch chain from Python to GPU kernels, and what torch.compile is actually fusing under the hood

find it here: https://huggingface.co/blog/torch-profiler

1 reply

evalstate

posted an update 1 day ago


Hugging Face MCP Server v0.3.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEP-2640 "Skills Over MCP" support added (early access)

1 reply

ovi054

posted an update 2 days ago


Qwen Image Edit 2511 Fast + LoRA ⚡

ovi054/Qwen-Image-Edit-2511-LoRA

QIE-2511 is an image editing model with integrated LoRA capabilities. You can add any custom LoRA to generate and edit images within this Space.

👉 Try it now: ovi054/Qwen-Image-Edit-2511-LoRA
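For context, a LoRA adapter keeps a pretrained weight matrix frozen and adds a trainable low-rank update. This is the standard formulation from the original LoRA paper; how this Space loads, combines, or scales custom LoRAs is not stated in the post:

```latex
W' = W_0 + \frac{\alpha}{r}\, B A, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Because only A and B are stored per adapter, a custom LoRA is a small file that can be applied on top of the same base model.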

sergiopaniego

posted an update 2 days ago


most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. with BPE, decode then re-encode isn't guaranteed to round-trip, so it can land on different ids. the gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training
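To make the round-trip failure concrete, here is a toy sketch with a made-up two-merge BPE vocabulary. The merge rules, vocabulary, and `encode`/`decode` helpers are hypothetical and do not come from any real tokenizer or from the linked post. The policy samples `a` + `ab`, which decodes to `"aab"`; re-encoding that string with the usual merge-priority rule yields `aa` + `b` instead, so the ids differ even though the text is identical.

```python
# Toy BPE tokenizer (hypothetical vocabulary and merges, for illustration only).
# Merge rules in priority order: a lower index is applied first.
MERGES = [("a", "a"), ("a", "b")]
VOCAB = {"a": 0, "b": 1, "aa": 2, "ab": 3}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}


def encode(text: str) -> list[int]:
    """Simplified BPE: repeatedly merge the highest-priority adjacent pair."""
    tokens = list(text)
    while True:
        candidates = [
            (RANKS[(tokens[i], tokens[i + 1])], i)
            for i in range(len(tokens) - 1)
            if (tokens[i], tokens[i + 1]) in RANKS
        ]
        if not candidates:
            break
        _, i = min(candidates)  # best rank, leftmost position on ties
        tokens[i : i + 2] = [tokens[i] + tokens[i + 1]]
    return [VOCAB[t] for t in tokens]


def decode(ids: list[int]) -> str:
    return "".join(ID_TO_TOKEN[i] for i in ids)


sampled = [VOCAB["a"], VOCAB["ab"]]  # ids the policy actually sampled
text = decode(sampled)               # "aab"
reencoded = encode(text)             # ids a re-tokenizing loop would train on

print(text, sampled, reencoded)      # aab [0, 3] [2, 1]
```

In a token-in, token-out setup, the natural fix is to keep `sampled` as-is and append it to the running id sequence rather than re-encoding the decoded text; the linked post describes how the author actually approaches it.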

@qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above

go read it 🤓

https://qgallouedec-tito.hf.space/

lbourdois

posted an update 3 days ago


New blog post!
An introduction to a little-known but highly effective model reduction method: 𝗧𝗿𝗶𝗺𝗺𝗶𝗻𝗴✂️
We show how to reduce model size (by up to 87.24%) while preserving performance.

We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text.
From these 16 families, we generated over 𝟱,𝟱𝟬𝟬 𝗺𝗼𝗻𝗼𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝗺𝗼𝗱𝗲𝗹𝘀 𝗶𝗻 𝟭𝟮𝟰 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀 🌍

Key takeaways from our experiments:
1️⃣ Trimming does not require a GPU. Our models were obtained on a CPU.
2️⃣ This method scales up to at least 4B parameters (we did not test beyond that).
3️⃣ The trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune it to recover or even surpass the original performance.
4️⃣ For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the model is smaller, you can run more epochs or show it more data and end up with a better model than the original.
5️⃣ Trimming is a competitive alternative to distillation and quantization. For example, we obtained our alternative to DistilBERT in 9 minutes on a CPU vs. 90 hours of GPU training for DistilBERT.
6️⃣ A trimmed model could be used to generate reasoning traces directly in its target language. This could be an alternative to generating traces in English and then translating them into the desired language.
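As a rough illustration, assuming "trimming" here means vocabulary trimming (our reading; the post does not spell out the procedure): count which token ids actually occur in a target-language corpus, keep only those rows of the embedding matrix plus any special tokens that must survive, and remap the ids. The `trim_vocabulary` helper and the toy data are hypothetical; a real model would also need its tokenizer and any tied output head updated to match.

```python
from collections import Counter


def trim_vocabulary(embeddings, corpus_token_ids, keep_ids=(), min_count=1):
    """Keep only embedding rows for tokens seen in the target-language corpus.

    embeddings: list of rows, one per original token id (hypothetical layout).
    corpus_token_ids: tokenized sequences from the target language.
    keep_ids: ids that must survive regardless (e.g. special tokens).
    Returns the trimmed rows and a mapping from old id to new id.
    """
    counts = Counter(tid for seq in corpus_token_ids for tid in seq)
    kept = sorted({tid for tid, c in counts.items() if c >= min_count} | set(keep_ids))
    old_to_new = {old: new for new, old in enumerate(kept)}
    trimmed = [embeddings[old] for old in kept]
    return trimmed, old_to_new


# Toy example: a 10-token vocabulary with 4-dimensional embeddings.
embeddings = [[float(i)] * 4 for i in range(10)]
corpus = [[1, 2, 3], [2, 3, 5]]
# Keep id 0 as if it were a special token (e.g. padding).
trimmed, old_to_new = trim_vocabulary(embeddings, corpus, keep_ids=(0,))
remapped = [[old_to_new[t] for t in seq] for seq in corpus]

print(len(embeddings), "->", len(trimmed))  # 10 -> 5
print(remapped)                             # [[1, 2, 3], [2, 3, 4]]
```

The savings come from the dropped rows, so they are largest when the embedding matrix is a big share of the parameters, as in many multilingual models with large vocabularies.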

Many other findings (how much data is needed, the impact of the dataset used, the order in which the steps should be applied, etc.) are covered in the blog post!

Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: alphaedge-ai/Trimming_models_search

4 replies

RakshitAralimatti

posted an update 3 days ago


Reading engineering and research blogs from OpenAI, Anthropic, DeepMind, Meta and others has genuinely leveled up my understanding of AI systems and helped me in my day-to-day work. But keeping track of 20+ sites manually is a pain.

So I built AI Blogs Tracker — a Streamlit app that scrapes the actual blog listing pages (not search) of 20+ top AI companies and surfaces titles, dates, and links in one clean feed. Filter by source, by date, star posts to a reading list, or add your own custom sources.

One click. ~30 seconds. Everything in one place.

🔗 GitHub link - https://github.com/rakshit2020/Tech-Blogs-Tracker-of-Top-AI-Companies-Agent
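The scraping step could look roughly like this stdlib-only sketch, which pulls (title, URL) pairs out of a listing page's `<a>` tags. The `LinkCollector` class and the sample HTML are hypothetical and not taken from the linked repo; real listing pages differ per site, so extracting dates and filtering out navigation links would need per-source rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (anchor text, absolute URL) pairs from a blog listing page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # normalize whitespace
            if title:
                self.links.append((title, self._href))
            self._href = None


html = (
    '<ul><li><a href="/blog/post-1">First  post</a></li>'
    '<li><a href="https://example.com/blog/post-2"><span>Second</span> post</a></li></ul>'
)
parser = LinkCollector("https://example.com/blog/")
parser.feed(html)
parser.close()
print(parser.links)
```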

1 reply

AxionLab-official

posted an update about 15 hours ago


We're happy to announce the release of a reasoning-tuned version of Supra-50M!

SupraLabs/Supra-50M-Reasoning

salma-remyx

posted an update about 16 hours ago


In that benchmark comparison, do you even have the sample size to distinguish two models, or are you making decisions based on statistical noise?

"Resolution Diagnostics for Paired LLM Evaluation" offers a simple check: a per-pair resolution ratio q = N/N* that flags when a displayed ranking sits below the resolution floor regardless of p-value.
arXiv: https://arxiv.org/abs/2605.30315v1
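For intuition only, here is one way such a ratio could be computed. We do not know the paper's exact definition of N*; this sketch assumes it is the number of paired items a standard two-sided paired test (normal approximation) would need to detect the observed mean difference δ at significance α and a chosen power, i.e. N* = ((z(1-α/2) + z(power)) · σ_d / δ)². The `required_pairs` and `resolution_ratio` helpers and the toy per-item differences are hypothetical. Under this assumption, q below 1 means the comparison has fewer pairs than it would need to resolve the gap.

```python
from statistics import NormalDist, mean, stdev


def required_pairs(diffs, alpha=0.05, power=0.8):
    """Assumed N*: pairs needed by a two-sided paired z-approximation test.

    diffs: per-item score differences (model A minus model B) on the same items.
    """
    delta = mean(diffs)
    if delta == 0:
        return float("inf")  # no observed gap: no finite sample resolves it
    sd = stdev(diffs)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) * sd / abs(delta)) ** 2


def resolution_ratio(diffs, **kwargs):
    """q = N / N*, with N the number of paired items actually evaluated."""
    n_star = required_pairs(diffs, **kwargs)
    return float("inf") if n_star == 0 else len(diffs) / n_star


# Per-item correctness differences (1 = only A right, -1 = only B right, 0 = tie).
small_eval = [1, 0, 0, 0, -1, 0, 1, 0, 0, 0]
large_eval = [1] * 50 + [0] * 50

print(resolution_ratio(small_eval))  # well below 1: below the assumed resolution floor
print(resolution_ratio(large_eval))  # well above 1
```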

Outrider automatically matched this paper to our fork of lm-evaluation-harness and opened a PR implementing the diagnostic.

Configure the action to find new methods tailored to your repo: https://github.com/remyxai/outrider
