🇮🇳 Gemma-3-1B Hindi Instruct — a Hindi LLM that runs fully offline, anywhere. Last week I shipped Qwen3-4B Hindi. This week I went the other direction: how tiny can a useful Hindi model get? So I fine-tuned Gemma-3-1B on quality-filtered Hindi instruction data and shipped the full GGUF ladder. ✅ Fine-tune (16-bit): pankajpandey-dev/gemma-3-1b-hindi-instruct ✅ GGUF (Q4/Q5/Q8): pankajpandey-dev/gemma-3-1b-hindi-instruct-GGUF Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 806 MB — runs on CPU, a cheap laptop, even a Raspberry Pi. What I tried this round: chrF-filtered the training data to drop weak translations, and used response-only loss so the model learns how to answer, not how to repeat prompts. Honest note: at 1B, Hindi fluency is strong but coherence is bounded by size — it's a lightweight/edge experiment, not a 4B replacement. Gemma-3-4B Hindi is next. Part of my Hindi LLM Series — openly-licensed Indic models for local & edge use. Feedback welcome 🙏 #Hindi #IndicNLP #GGUF #LocalLLM #Gemma #EdgeAI
🇮🇳 Qwen3-4B Hindi Instruct v2 — a Hindi LLM that runs on your own machine Most strong Hindi-capable models are either huge or cloud-only. I wanted one that's small enough to run locally but actually follows instructions in Hindi — so I fine-tuned Qwen3-4B on 10K Hindi instruction pairs and shipped it with a full GGUF quant ladder. ✅ Fine-tune (16-bit): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2 ✅ GGUF (Q4/Q5/Q8): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 2.5 GB — fits comfortably on a laptop, CPU or GPU. Part of my Hindi LLM Series — building openly-licensed Indic models for local and edge use. More coming (Gemma next). Feedback welcome 🙏 #Hindi #IndicNLP #GGUF #LocalLLM #Qwen
PiD — Pixel Diffusion Decoder Image Edit Upscale and Image Generation Upscale, an all-in-one demo, is now live on Spaces! Great improvements in realism-based image generation and editing are powered by FLUX.2-Klein, while image generation is paired with Z-Image, and upscaling is enabled by default!
🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users! @HuggingFaceBio released the original GGUF in bf16 only — so I added the full quant ladder for CPU/edge inference: • Q2_K → 1.4 GB • Q3_K_M → 1.8 GB • Q4_K_M → 2.1 GB ⭐ • Q5_K_M → 2.4 GB • Q6_K → 2.7 GB • Q8_0 → 3.5 GB 🔗 pankajpandey-dev/Carbon-3B-GGUF Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support). Huge thanks to the HuggingFaceBio team for the original model 🙏 #GGUF #llamacpp #genomics #DNA
I've made 8 Spaces in the Qwen-Image-Edit series, and out of them, 5 Spaces reached “Space of the Week”! A few Spaces are still topping the list even after many months.
Cumulatively, the series has crossed 8.2 million+ ZeroGPU runs and nearly 4 million visitors overall.
@retrain-pipelines v0.2.0 is out ! I'm at Station F at My booth with GOSIM Paris 2026 today & tomorrow. Come meet me for a live in-person demo and a chat !
6 Open-Source Libraries to FineTune LLMs 1. Unsloth GitHub: https://github.com/unslothai/unsloth → Fastest way to fine-tune LLMs locally → Optimized for low VRAM (even laptops) → Plug-and-play with Hugging Face models
3. TRL (Transformer Reinforcement Learning) GitHub: https://github.com/huggingface/trl → RLHF, DPO, PPO for LLM alignment → Built on Hugging Face ecosystem → Essential for post-training optimization
4. DeepSpeed GitHub: https://github.com/microsoft/DeepSpeed → Train massive models efficiently → Memory + speed optimization → Industry standard for scaling
6. PEFT GitHub: https://github.com/huggingface/peft → Fine-tune with minimal compute → LoRA, adapters, prefix tuning → Best for cost-efficient training
Multimodal-Edge Demo, a node-based inference canvas demo, is now live on Spaces. It features node-based Transformers for fast inference across 10+ edge-device multimodal models on the Hub, all within a single space. The series includes models from Qwen3.5, Qwen3-VL, Gemma 4, and the LFM 2.5 VL model series, with support for reasoning and grounding tasks.
This is the best set of AI and ML books and a full guide to learning machine learning from the ground up. This is my study material that I used, so I thought it would be helpful to share it with others. Like, share, and add it to your collection at Ujjwal-Tyagi/ai-ml-foundations-book-collection.
Now, a collection of various compression schemes for Qwen3.6 and the abliterated version 1 of dense models is available on the Hub. Check it out via the links below. 👇
We are hiring at Shirova AI. We need AI researchers and engineers to work in our research lab. Shirova AI is a research lab in India, so we can help our researchers move to nearby workspaces or let them work from home without ever coming to the lab. We're building our founding team, so the pay will be good. You can learn, so don't hesitate to mail us at: careers@shirova.com
HY-World-2.0 — A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.
🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending — now with GPU (CUDA) acceleration for 10x speed!
Perfect for quick remixes, batch edits or syncing tracks.