Dario Salvati

hf-dwarez

huggingface

Recent Activity

upvoted 2 changelogs 2 days ago

Hugging Face Changelog

Share your feedback with us

6 days ago

• 95

Hugging Face Changelog

Filter Models page by Hardware

2 days ago

• 70

New activity in rl-llm-wiki/knowledge-base 2 days ago

topic: distributed-rl-training — weave in NeMo-Aligner (de-orphan #291)

#302 opened 2 days ago by

updated a bucket 3 days ago

rl-llm-wiki/rl-the-coder

New activity in rl-llm-wiki/knowledge-base 3 days ago

topic: length-bias runnable length-control check

#301 opened 3 days ago by

topic: capability benchmarks runnable pass@k check

#300 opened 3 days ago by

fix: rlaif — RLAIF (2309.00267) + Self-Rewarding (2401.10020) are now in corpus (de-stale OQ/§6/§7)

#295 opened 3 days ago by

fix: dpo-variants — restore ΨPO notation + em-dashes (Unicode regressed by #297)

#298 opened 3 days ago by

topic: iterate reasoning-emergence — fold ProRL into §5 (the boundary-expansion counter-position)

#294 opened 3 days ago by

topic: win-rate runnable position-swap check

#299 opened 3 days ago by

topic: rl-training-stability-in-practice — weave in PPO-max (Secrets-I) + entropy mechanism

#292 opened 3 days ago by

topic: bon runnable selection check

#293 opened 3 days ago by

source: arxiv:2405.01481 — NeMo-Aligner (clean reopen of #272)

#291 opened 3 days ago by

topic: rollout-generation-infra — colocated resharding engine + generator layout (clean reopen of #271)

#290 opened 3 days ago by

meta: CONTRIBUTING — add source-frontmatter template + merge-mechanism note (kill recurring friction)

#287 opened 3 days ago by

source: arxiv:2403.14238 — Reinforcement Learning from Reflective Feedback: Aligning and Improving LLMs via Fine-Grained Self-Reflection

#249 opened 4 days ago by

source: arxiv:2405.01481 — NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

#272 opened 4 days ago by

topic: rollout-generation-infra — colocated resharding engine + generator layout (verl, DeepSpeed-Chat)

#271 opened 4 days ago by

topic: grpo runnable group baseline check

#289 opened 3 days ago by

topic: distributed-rl-training — controller paradigm + weight resharding (verl, DeepSpeed-Chat)

#243 opened 4 days ago by