๐ Released two Responsible AI lightweight instruction-tuned models focused on toxicity, bias, and safety analysis
Model 1: Responsible AI Safety Assistant (Qwen 2.5)
Kurapika993/qwen2.5-7b-responsible-ai-qlora Base Model: Qwen2.5-7B-Instruct Method: QLoRA Training Data: BeaverTails + Wiki Toxic + custom Responsible AI instruction dataset
I just built my first working AI tool: **Team Lunch Order Collector** for my organization!
Features so far: - Fixed menu with prices (Kenyan favorites) - Easy ordering (Individual or Group) - Live summary + total cost - AI suggestions for the organizer
Still polishing images and a few things, but it's already usable.
I'm participating in the Build Small Hackathon! ๐
I chose the Backyard AI chapter: solve a real problem for someone you know.
My idea is to help a business owner extract valuable information from their data. She operates, negotiates and provides support for customers all through Whatsapp. She doesn't use Notion, Obsidian, Spreadsheets and does not like doing repetitive data-entry tasks. It's a messy, chaotic messaging workflow that kind of works for her now.
What if she could open a Space, upload her exported WhatsApp data, and a fine-tuned model tailored to her business domain and conversation style extracted, classified, organized all her customers and deals into a coherent dashboard, along with a chatbot for her to ask questions about her business? That's my current approach.
Human brains don't recreate every pixel to understand the world!
Most current models in genomics, proteomics, and single-cell transcriptomics rely on generative objectives like masked language modeling or next token prediction. While effective, these architectures waste significant capacity reconstructing raw, noisy sequence details that may not carry functional biological meaning.
But a promising, more efficient alternative is emerging: Joint-Embedding Predictive Architecture (JEPA)
Originally introduced by Yann LeCun for computer vision, JEPA is a non-generative, self-supervised learning (SSL) framework. Instead of predicting raw inputs, it operates as a world model that predicts abstract semantic embeddings in latent space.
Recently, the JEPA framework (and its more efficient LeJEPA variant) has been adapted into the biological sciences to develop performing foundation models and to improve on already existing ones.
It's interesting how each adaptation modified and tailored JEPA to suit its specific biological domain, whether by experimenting with different backbones or complementing the objective with other loss terms.
For example, JEPA-DNA and ProteinJEPA used JEPA as a continual pre-training framework to enhance existing foundation models without training from scratch, while Cell-JEPA and JEPA-DNA employed a hybrid objective that combines the JEPA loss with a traditional language modeling loss.
The article below provides an overview of these implementations, along with others that came out this year. As always, your thoughts and feedback are welcome and highly appreciated!
After Consilium, I wanted to build something entirely different. Smaller. Weirder.
The idea came from a single Asterix sceneโAsterix and Obelix trying to obtain Permit A38, only to discover you need Permit A38 to apply for Permit A38. That's the whole game.
๐ช๐ต๐ฎ๐ ๐ ๐ฏ๐๐ถ๐น๐: A bureaucratic text adventure powered by one fine-tuned 1.7B model playing five different charactersโeach with their own system prompt, their own rules, and their own way of sending you in circles.
โ Clerk Vitalstatistixโrequires Form 27b/6. Has never issued a permit in 23 years.ย
โ Supervisor Caligula Minusโperpetually at lunch. Invented A38 in 1987, forgot why.ย
โ SYSTEMA v2.3โlast updated 1994. Every response starts with an error code.ย
โ Form 27b/6โsent. Not happy about it. Page 3 is always missing.ย
โ Ombudsman Panoramixโinvestigates complaints about the process. It is also the process.
๐ง๐ต๐ฒ ๐๐ฒ๐ฐ๐ต:
- Fine-tuned SmolLM2-1.7B on ~1000 synthetic NPC examples generated with Claude. - Converted to GGUF Q4_K_M for fast CPU inference - Streaming responses so you feel the bureaucracy arriving in real time - Custom beige government office UI
FlameF0X/Qwen3-4B-Distilled-Claude-4.6 (NVFP4 and MXFP4) sit at ranks 23 and 24 with 62.68% and 61.18% average, right below the base Qwen3-4B. Not bad considering they were distilled from Claude 4.6 rather than trained from scratch.
The funny one is FlameF0X/Qwen2-0.2B-pt and FlameF0X/Qwen2-0.2B-it. They're not properly trained โ genuinely undertrained, basically undefined โ and they still beat openai/gpt-oss-20b at rank 66. The 20B model. Not sure what that says but it's something.
FlameF0X/LFM2-Research is at the bottom of my lineup but it's a research artifact, not meant to be competitive.
Chart below showing my models vs nearby competitors, with size vs performance on the left.
Chart made by Claude
1 reply
ยท
reactedtoevillegasgarcia'spost with ๐๐about 5 hours ago
Introducing ESM3-PPISites! ๐งฌ๐ค We leveraged the multimodal ESM3 to predict protein-protein interaction interfaces with state-of-the-art accuracyโusing only sequence data! By feeding these predictions into HADDOCK ๐งฉ, we can accurately reconstruct protein complexes while reducing computation time by an order of magnitude. ๐ Test the live model on spaces: area-science-park/esm3-ppisites ๐ Read the preprint: https://doi.org/10.64898/2026.05.29.728739 #Bioinformatics #MachineLearning #ProteinDocking
reactedtoJiaqi-hkust'spost with ๐ฅabout 5 hours ago
๐ Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content
Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptionsโnoise, blur, compression artifacts, adverse weather.
Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: blackโbox feature alignment lacks interpretability, while whiteโbox text reasoning cannot restore the lost pixelโlevel visual details. This raises a crucial question:
๐ง Can MLLMs recover corrupted visual content by themselves?
If the answer is yes, we can move beyond merely โcompensatingโ for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.
Every parent, teacher, or babysitter knows the moment. The lights go dim. Blankets come out. Your child asks for a song. Then another. Then suddenly youโre improvising lyrics about dinosaurs, fire trucks, and princesses while trying to convince a little one that itโs actually bedtime.
Thatโs exactly the problem my partnerโs sister faces as a kindergarten teacher. Every day she runs nap time for fifteen 4-year-olds, and ever since they learned about music and instruments in class, it starts the same way: "sing a song for me." She'd love to give each child their own song, built from whatever they love that week, but she doesn't have the time, the musical training, or a tool that could do it. So @volivers and I built one.
Introducing Lolaby โ our submission to the Hugging Face Build Small Hackathon 2026, hosted by Gradio and backed by OpenBMB, OpenAI, NVIDIA, Modal, Cohere, JetBrains, and Black Forest Labs.
A child draws something they love (on screen or on paper), a name is entered, and a tiny AI watches the drawing, writes a personalised lullaby, and sings it back.
Everything runs locally. No cloud LLMs. No per-song API cost. No child's drawing or name ever leaves the device.
The full pipeline: ๐ผ๏ธ MiniCPM-V 4.6 (1.3B) reads the drawing. โ๏ธ A fine-tuned Llama 3.2 3B writes the lyrics โ trained on 1,500 lullabies with strict anti-boilerplate gates. ๐ต Kokoro 82M sings the result over custom DSP instruments.
Drop a like, upvote or comment. Feedback is welcome! ๐
I fine-tuned OpenBMB's MiniCPM5-1B to write Triton GPU kernels, then let an immutable referee decide if they are real: compile, check correctness against PyTorch on adversarial inputs, time against eager, torch.compile, and torch.compile max-autotune, then block the known ways of gaming the benchmark.
The 1B setup beat torch.compile max-autotune in 12/12 independently seeded runs. The larger Qwen3.6-27B smith pushed the same referee loop further: 76 verified compiler-beating kernels on H200, with 69 surviving a 5-run stability gate and 7 kept as single-shot probes on unseen problems. On a 376-cell shape/dtype grid, the stability-gated kernels keep a 1.49x geomean, with about 10% of cells losing and reported per cell.
Honest bound: these are scheduling wins on memory-bound ops, not new algorithms or wins over cuBLAS/FlashAttention. The scarce thing is not the big model, it is the verifier it cannot fool.
I built Read-Along AI for the Hugging Face Build Small Hackathon.
It is an offline-capable reading practice app for early readers: one short sentence at a time, tap-to-hear word help, record a read-aloud attempt, then get gentle feedback.
The goal is Backyard AI in the literal sense: a tool for real home reading practice, where feedback needs to be patient, developmentally fair, and private. A childโs voice should not need to leave the app just to practice โThe dog ran fast.โ
What makes it small-model native:
- Exact clean readings pass immediately. - Close or ambiguous child-speech transcripts get a second look from a fine-tuned MiniCPM phonetic evaluator. - Meaning-changing mistakes still fail closed, e.g. โblue hatโ should not pass for โred hat.โ - Off the Grid Mode runs local ASR plus the MiniCPM GGUF evaluator through llama.cpp. - Turbo Mode uses Modal endpoints for lower-latency ASR/TTS/evaluation. - The UI is custom Gradio with a child-facing reading canvas, clickable words, progress feedback, and celebration on success.
Targeted tracks and badges: Backyard AI, Off-Brand, Off the Grid, Llama Champion, Well-Tuned, Tiny Titan, Sharing is Caring, Field Notes.
We trained an open-source Mythos like cybersecurity LLM for the Build Small Hackathon meet OpenMythos
Trained in two stages: SFT on ~1.84K filtered ArXiv cs.CR papers + real CVE data, then RLVR using paired with past vulnerabilities GitHub repos with a verifier model checking outputs against ground truth.
Trained on: H100s from Modal
The RLVR stage made the biggest difference responses got more precise and less prone to confusing similar vulnerability classes.
Just published my first model on the Hub: daanhoekstra/sds-ner-compliance A token classification model for extracting structured fields from GHS Safety Data Sheets, signal words, H-codes, P-codes, CAS numbers, built on SciBERT. Comes with a full write-up on automating GHS and supply chain label compliance with open-source NLP. Feedback welcome!
reactedtoProCreations'spost with ๐about 5 hours ago
Excited to open-source the VisDrone Aerial Object Detection Model Zoo on Hugging Face.
The collection includes multiple YOLO variants trained and evaluated on the VisDrone benchmark for aerial object detection, with accompanying documentation and performance metrics.
If you're working on drones, aerial surveillance, robotics, or small-object detection, I hope these models save you some time.
๐ Introducing PerceptionDLM โ the first multimodal diffusion LLM for parallel region perception!
Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process. ๐งฉ
โจ Highlights โข โก Up to 3.4ร faster on dense multi-region captioning, with stable per-image latency โข ๐ PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs) โข ๐ New benchmark: ParaDLC-Bench โ jointly evaluates caption quality AND inference efficiency โข ๐ Code, models & benchmark all open-sourced
Shadows of Tomorrow is finally live on Hugging Face Spaces with Gradio.
Itโs a browser-playable RPG built with Godot, set in a post-nuclear future where players explore Magnus Province, collect medicinal plants, craft medicine, and help cure NPCs.