- The MultiModel paper: https://arxiv.org/abs/1706.05137
- The story behind Attention Is All You Need: https://surfingmanifolds.substack.com/p/pandemonium-the-transformers-story
Mohammed Hamdy
AI & ML interests
AI4Sci | NLP | Reinforcement Learning
Recent Activity
repliedto their post 6 days ago
Things rarely go as we expect!
In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!
The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.
Their ultimate goal was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains (in 2017). A One Model To Learn Them All.
In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!
But history had other plans. The building block eclipsed the grand design!
So, have you heard about the MultiModel before? π posted an update 6 days ago
Things rarely go as we expect!
In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!
The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.
Their ultimate goal was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains (in 2017). A One Model To Learn Them All.
In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!
But history had other plans. The building block eclipsed the grand design!
So, have you heard about the MultiModel before? πOrganizations
replied to their post 6 days ago
posted an update 6 days ago
Post
109
Things rarely go as we expect!
In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!
The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.
Their ultimate goal was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains (in 2017). A One Model To Learn Them All.
In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!
But history had other plans. The building block eclipsed the grand design!
So, have you heard about the MultiModel before? π
In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!
The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.
Their ultimate goal was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains (in 2017). A One Model To Learn Them All.
In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!
But history had other plans. The building block eclipsed the grand design!
So, have you heard about the MultiModel before? π
posted an update 5 months ago
Post
3182
The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're probably releasing all these papers to make the V4 report of reasonable lengthπ
Here's a nice short summary from Gemini
Here's a nice short summary from Gemini
reacted to Kseniase's post with β€οΈ 7 months ago
Post
6507
12 Types of JEPA
Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:
1. LeJEPA β LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the βidealβ JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA
2. JEPA-T β JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text
3. Text-JEPA β Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs
4. N-JEPA (Noise-based JEPA) β Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification
5. SparseJEPA β SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy
6. TS-JEPA (Time Series JEPA) β Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders
Read further below β
It you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Since Yann LeCun together with Randall Balestriero released a new paper on JEPA (Joint-Embedding Predictive Architecture), laying out its theory and introducing an efficient practical version called LeJEPA, we figured you might need even more JEPA. Here are 7 recent JEPA variants plus 5 iconic ones:
1. LeJEPA β LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (2511.08544)
Explains a full theory for JEPAs, defining the βidealβ JEPA embedding as an isotropic Gaussian, and proposes the SIGReg objective to push JEPA toward this ideal, resulting in practical LeJEPA
2. JEPA-T β JEPA-T: Joint-Embedding Predictive Architecture with Text Fusion for Image Generation (2510.00974)
A text-to-image model that tokenizes images and captions with a joint predictive Transformer, enhances fusion with cross-attention and text embeddings before training loss, and generates images by iteratively denoising visual tokens conditioned on text
3. Text-JEPA β Speaking in Words, Thinking in Logic: A Dual-Process Framework in QA Systems (2507.20491)
Converts natural language into first-order logic, with a Z3 solver handling reasoning, enabling efficient, explainable QA with far lower compute than large LLMs
4. N-JEPA (Noise-based JEPA) β Improving Joint Embedding Predictive Architecture with Diffusion Noise (2507.15216)
Connects self-supervised learning with diffusion-style noise by using noise-based masking and multi-level schedules, especially improving visual classification
5. SparseJEPA β SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures (2504.16140)
Adds sparse representation learning to make embeddings more interpretable and efficient. It groups latent variables by shared semantic structure using a sparsity penalty while preserving accuracy
6. TS-JEPA (Time Series JEPA) β Joint Embeddings Go Temporal (2509.25449)
Adapts JEPA to time-series by learning latent self-supervised representations and predicting future latents for robustness to noise and confounders
Read further below β
It you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
reacted to AdinaY's post with π₯ 8 months ago
Post
3545
BAAI has released ROMEπ₯ evaluating 30+ large reasoning models on text & visual reasoning
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)
β¨Tests visual reasoning, not just recognition
β¨Covers capability Γ alignment Γ safety Γ efficiency
β¨More transparent & reliable (less data contamination)
β¨Helps make real-world deployment choices
FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions (2509.17177)
β¨Tests visual reasoning, not just recognition
β¨Covers capability Γ alignment Γ safety Γ efficiency
β¨More transparent & reliable (less data contamination)
β¨Helps make real-world deployment choices
reacted to AdinaY's post with π 12 months ago
Post
3195
RoboBrain 2.0π₯ OPEN embedded brain model by BAAIBeijing
BAAI/RoboBrain2.0-7B
β¨ 7B - Apache 2.0 / 32B coming soon
β¨ Supports multiple images, long videos, and high-resolution visuals
β¨ Spatial + temporal reasoning
β¨ Real-time memory & scene graphs
BAAI/RoboBrain2.0-7B
β¨ 7B - Apache 2.0 / 32B coming soon
β¨ Supports multiple images, long videos, and high-resolution visuals
β¨ Spatial + temporal reasoning
β¨ Real-time memory & scene graphs
posted an update about 1 year ago
Post
1742
What inspired the Transformer architecture in the "Attention Is All You Need" paper? And how were various ideas combined to create this groundbreaking model?
In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll explore everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.
π‘ Examples of ideas explored in the article:
β What was the inspiration for the attention mechanism?
β How did we go from attention to self-attention?
β Did the team have any other names in mind for the model?
and more...
I aim to tell the story of Transformers as I would have wanted to read it, and hopefully, one that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/Xs, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.
Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story
In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll explore everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.
π‘ Examples of ideas explored in the article:
β What was the inspiration for the attention mechanism?
β How did we go from attention to self-attention?
β Did the team have any other names in mind for the model?
and more...
I aim to tell the story of Transformers as I would have wanted to read it, and hopefully, one that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/Xs, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.
Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story
posted an update over 1 year ago
Post
2787
π We're excited to introduce MemoryCode, a novel synthetic dataset designed to rigorously evaluate LLMs' ability to track and execute coding instructions across multiple sessions. MemoryCode simulates realistic workplace scenarios where a mentee (the LLM) receives coding instructions from a mentor amidst a stream of both relevant and irrelevant information.
π‘ But what makes MemoryCode unique?! The combination of the following:
β Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.
β Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.
β Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.
β Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.
β Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.
π Our Findings
1οΈβ£ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.
2οΈβ£ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.
π Paper: From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions (2502.13791)
π¦ Code: https://github.com/for-ai/MemoryCode
π‘ But what makes MemoryCode unique?! The combination of the following:
β Multi-Session Dialogue Histories: MemoryCode consists of chronological sequences of dialogues between a mentor and a mentee, mirroring real-world interactions between coworkers.
β Interspersed Irrelevant Information: Critical instructions are deliberately interspersed with unrelated content, replicating the information overload common in office environments.
β Instruction Updates: Coding rules and conventions can be updated multiple times throughout the dialogue history, requiring LLMs to track and apply the most recent information.
β Prospective Memory: Unlike previous datasets that cue information retrieval, MemoryCode requires LLMs to spontaneously recall and apply relevant instructions without explicit prompts.
β Practical Task Execution: LLMs are evaluated on their ability to use the retrieved information to perform practical coding tasks, bridging the gap between information recall and real-world application.
π Our Findings
1οΈβ£ While even small models can handle isolated coding instructions, the performance of top-tier models like GPT-4o dramatically deteriorates when instructions are spread across multiple sessions.
2οΈβ£ This performance drop isn't simply due to the length of the context. Our analysis indicates that LLMs struggle to reason compositionally over sequences of instructions and updates. They have difficulty keeping track of which instructions are current and how to apply them.
π Paper: From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions (2502.13791)
π¦ Code: https://github.com/for-ai/MemoryCode
posted an update over 1 year ago
Post
3040
β Evaluating Long Context #2: SCROLLS and ZeroSCROLLS
In this series of posts about tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, Long Range Arens (LRA) is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. But it wasn't introduced to evaluate LLMs, but rather the transformer architecture in general.
π The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?
1οΈβ£ Long Text Focus: SCROLLS (unlike LRA) focus mainly on text and contain inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2οΈβ£ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3οΈβ£ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.
Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:
1οΈβ£ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2οΈβ£ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.
π‘ What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.
- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)
In this series of posts about tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, Long Range Arens (LRA) is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. But it wasn't introduced to evaluate LLMs, but rather the transformer architecture in general.
π The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?
1οΈβ£ Long Text Focus: SCROLLS (unlike LRA) focus mainly on text and contain inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2οΈβ£ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3οΈβ£ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.
Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:
1οΈβ£ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2οΈβ£ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.
π‘ What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.
- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)
reacted to lewtun's post with π₯β€οΈ over 1 year ago
Post
5551
Introducing OpenR1-Math-220k!
open-r1/OpenR1-Math-220k
The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch πͺ
Whatβs new compared to existing reasoning datasets?
βΎ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.
π³ 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.
π 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.
β³ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g for cases with malformed answers that canβt be verified with a rules-based parser)
π We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.
π Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2
open-r1/OpenR1-Math-220k
The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch πͺ
Whatβs new compared to existing reasoning datasets?
βΎ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.
π³ 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.
π 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.
β³ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g for cases with malformed answers that canβt be verified with a rules-based parser)
π We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.
π Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2
reacted to AdinaY's post with π§ π₯ over 1 year ago
Post
2881
BIG release by DeepSeek AIπ₯π₯π₯
DeepSeek-R1 & DeepSeek-R1-Zero: two 660B reasoning models are here, alongside 6 distilled dense models (based on Llama & Qwen) for the community!
deepseek-ai
deepseek-ai/DeepSeek-R1
β¨ MIT License : enabling distillation for custom models
β¨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
β¨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'
DeepSeek-R1 & DeepSeek-R1-Zero: two 660B reasoning models are here, alongside 6 distilled dense models (based on Llama & Qwen) for the community!
deepseek-ai/DeepSeek-R1
β¨ MIT License : enabling distillation for custom models
β¨ 32B & 70B models match OpenAI o1-mini in multiple capabilities
β¨ API live now! Access Chain of Thought reasoning with model='deepseek-reasoner'
reacted to hba123's post with π over 1 year ago
Post
1830
Blindly applying algorithms without understanding the math behind them is not a good idea frmpv. So, I am on a quest to fix this!
I wrote my first hugging face article on how you would derive closed-form solutions for KL-regularised reinforcement learning problems - what is used for DPO.
Check it out: https://huggingface.co/blog/hba123/derivingdpo
I wrote my first hugging face article on how you would derive closed-form solutions for KL-regularised reinforcement learning problems - what is used for DPO.
Check it out: https://huggingface.co/blog/hba123/derivingdpo
reacted to fdaudens's post with π over 1 year ago
Post
1407
π From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.
Check it out: huggingface/open-source-ai-year-in-review-2024
Check it out: huggingface/open-source-ai-year-in-review-2024
reacted to dvilasuero's post with π₯ over 1 year ago
Post
2799
π Announcing Global-MMLU: an improved MMLU Open dataset with evaluation coverage across 42 languages, built with Argilla and the Hugging Face community.
Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior TΓ©cnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.
π·οΈ +200 contributors used Argilla MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!
Thanks to this annotation process, the open dataset contains two subsets:
1. π½ Culturally Agnostic: no specific regional, cultural knowledge is required.
2. βοΈ Culturally Sensitive: requires dialect, cultural knowledge or geographic knowledge to answer correctly.
Moreover, we provide high quality translations of 25 out of 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.
I hope this will ensure a better understanding of the limitations and challenges for making open AI useful for many languages.
Dataset: https://huggingface.co/datasets/CohereForAI/Global-MMLU
Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior TΓ©cnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.
π·οΈ +200 contributors used Argilla MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!
Thanks to this annotation process, the open dataset contains two subsets:
1. π½ Culturally Agnostic: no specific regional, cultural knowledge is required.
2. βοΈ Culturally Sensitive: requires dialect, cultural knowledge or geographic knowledge to answer correctly.
Moreover, we provide high quality translations of 25 out of 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.
I hope this will ensure a better understanding of the limitations and challenges for making open AI useful for many languages.
Dataset: https://huggingface.co/datasets/CohereForAI/Global-MMLU
reacted to davidberenstein1957's post with π§ ππ over 1 year ago
Post
3538
The Data Is Better Together community is set to release the first Apache 2 licensed image preference dataset!
Great work and let's give this a final push :)
@aashish1904 congrats on your month of HF pro. There is more to win during this sprint!
@aashish1904 @AnyaDesdein @davidberenstein1957 @Malalatiana @beta3 @fffiloni @munish0838 @Reza2kn @bbunzeck @Creazycreator @andrei-saceleanu @jafhaponiuk @rca-etl @kf120 @burtenshaw @mmhamdy @grib0ed0v @Doopus @AnyaDes @ttkap @Xceron @Lewox @davanstrien @Azazelle @adirik @Ashish08 @AntonVic @kenantang @sdiazlor @g-ronimo @dennis-rall @prithivMLmods @girtss3 @flozi00 @WaveCut @Taylor658 @Wildminder @Sara9999 @phaelishall @sararob @dvilasuero @pgabrys @plaguss @CDS899 @timajwilliams @rudzinskimaciej @pavel-ai @aggr8 @ignacioct @MouseAI @Leeps @MaksKul @NicolasDmln @Muinez @kusht55 @caiolang @Jakub-Brand24 @loamy @Demijan @eliab96 @Viewegger @JosephCatrambone @p1atdev @mrshu @o639 @Targezed @Aviv-anthonnyolime @thliang01 @Ahmed-Amine @glards @pranaykoppula @nataliaElv @MaPirlet @alvarobartt @gabrielmbmb @zlicastro @Jaydip @Chouettecheveche @lilcheaty @ruyrdiaz @robintema @fdaudens @ggcristian @a-r-r-o-w @pates @joheras @stopsatgreen @bezo97 @chachi902 @iamyann @liamcripwell @dmb23 @korbih @anonymous7743 @akbdx18 @OVAWARE @severo @akontra @lichorosario @lhoestq @SebastianBodza @Vishnou @ameerazam08 @appoose @Mukei @mearco @joaquincabezas @Fizzarolli @thomastraum @igortopolski @OxxoCodes @patrickfleith @asoria @bn22 @sitammeur @Krodolf @bergr7f @Sbxxn @wietsevenema @sugatoray @Iamladi @MikeTrizna @feveromo @mokady @Bolero @prath @Dowwie @kfahn @decodingchris @alili2050 @RahulRaman @yzimmermann @Ameeeee @ecyht2 @MattMC001 @hemanthkumarak @Thegorgibus @akos2 @LawRun @ramithuh @SuperMuel @sjans @peterizsak @mosama @Eyel @mtr3 @cfahlgren1 @legentil @clem @Citaman @Aurelien-Morgan @AntoineBourgois @TotoB12 @Stanmey @osanseviero @multimodalart @maxiw @ariG23498 @ngk89 @femboysLover @dvs @tacohiddink @blanchon @DavidJimenez
Great work and let's give this a final push :)
@aashish1904 congrats on your month of HF pro. There is more to win during this sprint!
@aashish1904 @AnyaDesdein @davidberenstein1957 @Malalatiana @beta3 @fffiloni @munish0838 @Reza2kn @bbunzeck @Creazycreator @andrei-saceleanu @jafhaponiuk @rca-etl @kf120 @burtenshaw @mmhamdy @grib0ed0v @Doopus @AnyaDes @ttkap @Xceron @Lewox @davanstrien @Azazelle @adirik @Ashish08 @AntonVic @kenantang @sdiazlor @g-ronimo @dennis-rall @prithivMLmods @girtss3 @flozi00 @WaveCut @Taylor658 @Wildminder @Sara9999 @phaelishall @sararob @dvilasuero @pgabrys @plaguss @CDS899 @timajwilliams @rudzinskimaciej @pavel-ai @aggr8 @ignacioct @MouseAI @Leeps @MaksKul @NicolasDmln @Muinez @kusht55 @caiolang @Jakub-Brand24 @loamy @Demijan @eliab96 @Viewegger @JosephCatrambone @p1atdev @mrshu @o639 @Targezed @Aviv-anthonnyolime @thliang01 @Ahmed-Amine @glards @pranaykoppula @nataliaElv @MaPirlet @alvarobartt @gabrielmbmb @zlicastro @Jaydip @Chouettecheveche @lilcheaty @ruyrdiaz @robintema @fdaudens @ggcristian @a-r-r-o-w @pates @joheras @stopsatgreen @bezo97 @chachi902 @iamyann @liamcripwell @dmb23 @korbih @anonymous7743 @akbdx18 @OVAWARE @severo @akontra @lichorosario @lhoestq @SebastianBodza @Vishnou @ameerazam08 @appoose @Mukei @mearco @joaquincabezas @Fizzarolli @thomastraum @igortopolski @OxxoCodes @patrickfleith @asoria @bn22 @sitammeur @Krodolf @bergr7f @Sbxxn @wietsevenema @sugatoray @Iamladi @MikeTrizna @feveromo @mokady @Bolero @prath @Dowwie @kfahn @decodingchris @alili2050 @RahulRaman @yzimmermann @Ameeeee @ecyht2 @MattMC001 @hemanthkumarak @Thegorgibus @akos2 @LawRun @ramithuh @SuperMuel @sjans @peterizsak @mosama @Eyel @mtr3 @cfahlgren1 @legentil @clem @Citaman @Aurelien-Morgan @AntoineBourgois @TotoB12 @Stanmey @osanseviero @multimodalart @maxiw @ariG23498 @ngk89 @femboysLover @dvs @tacohiddink @blanchon @DavidJimenez
reacted to AdinaY's post with π over 1 year ago
Post
1680
π The wave of reasoning models from the Chinese community has arrived!
π Marco-o1 by AIDC, Alibaba
π AIDC-AI/Marco-o1
β¨ QwQ by Qwen, Alibaba
π Qwen/qwq-674762b79b75eac01735070a
π Skywork-o1 by Kunlun Tech
π Skywork/skywork-o1-open-67453df58e12f6c3934738d0
π₯ Xkev/Llama-3.2V-11B-cot by PKU Yuan group
π Xkev/Llama-3.2V-11B-cot
π‘ DeepSeek-R1-Lite-Preview by DeepSeek AI
π https://chat.deepseek.com/
π InternThinker Preview by Shanghai AI Lab
π https://sso.openxlab.org.cn/login?redirect=https://internlm-chat.intern-ai.org.cn/&clientId=ebmrvod6yo0nlzaek1yp
π k0-math by Moonshot AI
π https://kimi.moonshot.cn/ ( coming soon! )
Who's next? π
zh-ai-community/reasoning-models-67409fb3aa1ed78f10087cd7
π Marco-o1 by AIDC, Alibaba
π AIDC-AI/Marco-o1
β¨ QwQ by Qwen, Alibaba
π Qwen/qwq-674762b79b75eac01735070a
π Skywork-o1 by Kunlun Tech
π Skywork/skywork-o1-open-67453df58e12f6c3934738d0
π₯ Xkev/Llama-3.2V-11B-cot by PKU Yuan group
π Xkev/Llama-3.2V-11B-cot
π‘ DeepSeek-R1-Lite-Preview by DeepSeek AI
π https://chat.deepseek.com/
π InternThinker Preview by Shanghai AI Lab
π https://sso.openxlab.org.cn/login?redirect=https://internlm-chat.intern-ai.org.cn/&clientId=ebmrvod6yo0nlzaek1yp
π k0-math by Moonshot AI
π https://kimi.moonshot.cn/ ( coming soon! )
Who's next? π
zh-ai-community/reasoning-models-67409fb3aa1ed78f10087cd7