🏗️ Building on HF

Sergio Paniego PRO

sergiopaniego

huggingface

https://sergiopaniego.github.io/

Recent Activity

updated a dataset about 5 hours ago

agents-course/final-certificates

updated a dataset about 5 hours ago

agents-course/course-certificates-of-excellence

updated a dataset about 20 hours ago

huggingface-projects/Deep-RL-Course-Certification

Buckets 69

sergiopaniego/c3-pathological-static-fda5bc-bucket

sergiopaniego/c3-pathological-trackio-bucket

sergiopaniego/c3-grpo-static-ce85df-bucket

sergiopaniego/c3-grpo-trackio-bucket

sergiopaniego/c3-grpo-static-7e9597-bucket

sergiopaniego/c3-grpo-static-0305b2-bucket

Posts 103

Post

72

you can now train your own coding agents with trl + openenv, starting with opencode

we just added end-to-end support for training agent harnesses:

> TRL: a loop-owning training path (AsyncGRPOTrainer + HarnessRolloutWorker) that launches the agent in an OpenEnv session, reads back its trace, reconstructs the training samples, and trains with AsyncGRPO
> OpenEnv: the OpenCode harness environment plus a transparent proxy that forwards the agent's model calls and records each turn's token ids and logprobs

you train the actual opencode agent as is: it runs its own loop and tools, and the policy learns from the exact tokens it produced
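
To make the "exact tokens" idea concrete, here is a minimal sketch of how recorded turns could be turned back into training samples. It is not the TRL or OpenEnv implementation: `TurnRecord`, `TrainingSample`, and `reconstruct_samples` are hypothetical names, and the record schema (prompt ids, completion ids, per-token logprobs) is an assumption based only on what the post says the proxy records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TurnRecord:
    """One model call as a recording proxy might log it (hypothetical schema)."""
    prompt_ids: List[int]       # tokens the agent sent to the model
    completion_ids: List[int]   # tokens the policy actually generated
    logprobs: List[float]       # sampling-time logprob of each generated token


@dataclass
class TrainingSample:
    """One sequence ready for a policy-gradient loss (hypothetical layout)."""
    input_ids: List[int]
    completion_mask: List[int]  # 1 where the policy produced the token, else 0
    old_logprobs: List[float]   # 0.0 placeholder on prompt positions


def reconstruct_samples(trace: List[TurnRecord]) -> List[TrainingSample]:
    """Build one sample per recorded turn, masking out prompt tokens."""
    samples = []
    for turn in trace:
        if len(turn.completion_ids) != len(turn.logprobs):
            raise ValueError("each generated token needs exactly one logprob")
        n_prompt = len(turn.prompt_ids)
        n_completion = len(turn.completion_ids)
        samples.append(
            TrainingSample(
                input_ids=turn.prompt_ids + turn.completion_ids,
                completion_mask=[0] * n_prompt + [1] * n_completion,
                old_logprobs=[0.0] * n_prompt + list(turn.logprobs),
            )
        )
    return samples


# Toy two-turn trace with made-up token ids.
trace = [
    TurnRecord(prompt_ids=[1, 2, 3], completion_ids=[4, 5], logprobs=[-0.1, -0.2]),
    TurnRecord(prompt_ids=[1, 2, 3, 4, 5, 6], completion_ids=[7], logprobs=[-0.3]),
]
samples = reconstruct_samples(trace)
```

The mask is the point of the sketch: tool outputs and earlier context sit in the prompt portion and are excluded from the loss, so only tokens the policy itself sampled would contribute to the update.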

we're shipping a self-contained example: local subprocess sandbox, DeepCoder problems, validated on Qwen3-8B.

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/opencode.py
> docs: https://huggingface.co/docs/trl/main/openenv

and we're actively working on both sides, so expect more 🤓

Articles 26

Article

38

Profiling in PyTorch (Part 3): Attention is all you profile

Collections 10

Spaces 173

VLM Object Understanding

Explore object detection, visual grounding, keypoint Detecti

Qwen2-VL-7B

Ask questions about charts in images

SmolVLM-trl-dpo-rlaif-v

Generate text from an image and question

SmolVLM-trl-sft-ChartQA

Ask questions about charts in images

C3 Pathological Static Fda5bc

Monitor your projects with an interactive dashboard

C3 Grpo Static Ce85df

Track and visualize your data with an interactive dashboard

Models 143

sergiopaniego/qwen3-0.6b-mbpp-grpo-k2

Text Generation • 0.6B • Updated 1 day ago • 40

sergiopaniego/qwen3-0.6b-mbpp-grpo-k16

Text Generation • 0.6B • Updated 1 day ago • 47

sergiopaniego/qwen3-0.6b-mbpp-grpo-k8

Text Generation • 0.6B • Updated 1 day ago • 47

sergiopaniego/Qwen3.5-4B-sdpo-math-hints

Updated 16 days ago • 1

sergiopaniego/Qwen3.5-4B-sdpo-math-gold

Updated 16 days ago

sergiopaniego/Qwen3.5-4B-sdpo-math-baseline

Updated 16 days ago

sergiopaniego/sdpo-hints

Updated 16 days ago

sergiopaniego/pi-mono-youtube-livestream-2-scripts

Updated 20 days ago • 2

sergiopaniego/gemma-4-E2B-offpolicy-kd-lr1e4

Updated 20 days ago

sergiopaniego/gemma-4-E2B-offpolicy-kd-lr2e4

Updated 20 days ago

Datasets 14

sergiopaniego/math-sdpo-hints-plain

Viewer • Updated 16 days ago • 600 • 37

sergiopaniego/math-sdpo-hints

Viewer • Updated 16 days ago • 600 • 43

sergiopaniego/gsm8k-sdpo-plain

Viewer • Updated 16 days ago • 700 • 46

sergiopaniego/gsm8k-sdpo-hints

Viewer • Updated 16 days ago • 700 • 49

sergiopaniego/pi-mono-chat

Viewer • Updated 24 days ago • 886 • 120

sergiopaniego/requests-pr-diff

Viewer • Updated May 19 • 1 • 23

sergiopaniego/trl-r2e-test

Viewer • Updated May 18 • 1 • 37

sergiopaniego/chain-sum-rollouts

Viewer • Updated May 4 • 50 • 13

sergiopaniego/ttt-scripted-smoke

Viewer • Updated Apr 17 • 20 • 17

sergiopaniego/sample_videos

Viewer • Updated Jun 30, 2025 • 2 • 32
