Hugging Face
Dario Salvati
hf-dwarez
55
4
1
9 followers · 9 following
AI & ML interests
None yet
Recent Activity
upvoted a changelog, 2 days ago: Share your feedback with us
upvoted a changelog, 2 days ago: Filter Models page by Hardware
new activity, 2 days ago, in rl-llm-wiki/knowledge-base: topic: distributed-rl-training — weave in NeMo-Aligner (de-orphan #291)
Organizations
hf-dwarez's activity
upvoted 2 changelogs, 2 days ago
Hugging Face Changelog: Share your feedback with us (6 days ago • 95)
Hugging Face Changelog: Filter Models page by Hardware (2 days ago • 70)
New activity in rl-llm-wiki/knowledge-base, 2 days ago
topic: distributed-rl-training — weave in NeMo-Aligner (de-orphan #291)
#302 opened 2 days ago by hf-dwarez · 2
updated a bucket, 3 days ago
rl-llm-wiki/rl-the-coder · 1.51 kB
New activity in rl-llm-wiki/knowledge-base, 3 days ago
topic: length-bias runnable length-control check
#301 opened 3 days ago by hf-dwarez · 2
topic: capability benchmarks runnable pass@k check
#300 opened 3 days ago by hf-dwarez · 2
fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)
#295 opened 3 days ago by lvwerra · 2
fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)
#298 opened 3 days ago by lvwerra · 2
topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)
#294 opened 3 days ago by lvwerra · 2
topic: win-rate runnable position-swap check
#299 opened 3 days ago by hf-dwarez · 2
topic: rl-training-stability-in-practice — weave in PPO-max (Secrets-I) + entropy mechanism
#292 opened 3 days ago by hf-dwarez · 8
topic: bon runnable selection check
#293 opened 3 days ago by hf-dwarez · 2
source: arxiv:2405.01481 — NeMo-Aligner (clean reopen of #272)
#291 opened 3 days ago by hf-dwarez · 5
topic: rollout-generation-infra — colocated resharding engine + generator layout (clean reopen of #271)
#290 opened 3 days ago by hf-dwarez · 5
meta: CONTRIBUTING — add source-frontmatter template + merge-mechanism note (kill recurring friction)
#287 opened 3 days ago by lvwerra · 3
source: arxiv:2403.14238 — Reinforcement Learning from Reflective Feedback: Aligning and Improving LLMs via Fine-Grained Self-Reflection
#249 opened 4 days ago by lvwerra · 6
source: arxiv:2405.01481 — NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
#272 opened 4 days ago by hf-dwarez · 2
topic: rollout-generation-infra — colocated resharding engine + generator layout (verl, DeepSpeed-Chat)
#271 opened 4 days ago by hf-dwarez · 2
topic: grpo runnable group baseline check
#289 opened 3 days ago by hf-dwarez · 2
topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat)
#243 opened 4 days ago by hf-dwarez · 4