Wenbo Zhang

Wenboz

https://onepounchman.github.io/

AI & ML interests

Trustworthy AI, LLMs

Recent Activity

updated a model 1 day ago

Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.5-beta1.0-plain-pipeline

published a model 1 day ago

Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.5-beta1.0-plain-pipeline

updated a model 1 day ago

Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline

Organizations

None yet

models 23

Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.5-beta1.0-plain-pipeline

Reinforcement Learning • 3B • Updated 1 day ago

Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline

Reinforcement Learning • 3B • Updated 1 day ago • 1

Wenboz/TCOD-v1-OPD-Qwen2.5-3B-WebShop

Text Generation • 3B • Updated 2 days ago • 13

Wenboz/TCOD-v1-OPD-Qwen2.5-3B-ALFWorld

Text Generation • 3B • Updated 2 days ago • 13

Wenboz/Qwen3-8B-trivia-RLMR-v2

8B • Updated 8 days ago • 7

Wenboz/Qwen3-8B-trivia-RLMR-v1

8B • Updated 8 days ago • 57

Wenboz/Qwen3-8B-trivia-RLVR-cot

8B • Updated 8 days ago • 81

Wenboz/mistral-7b-base-p3o

Updated Dec 27, 2024

Wenboz/zephyr-7b-dpo-full

Text Generation • 1B • Updated Dec 23, 2024 • 2

Wenboz/zephyr-7b-dpo-lora

Updated Oct 20, 2024 • 3

datasets 22

Wenboz/mistral-base-dpo-iter2-reward-logps-ultrafeedback

Viewer • Updated Nov 27, 2025 • 20.6k • 80

Wenboz/mistral-base-dpo-iter1-reward-logps-ultrafeedback

Viewer • Updated Nov 27, 2025 • 20.6k • 8

Wenboz/rm_r1_example

Viewer • Updated Jul 7, 2025 • 1k • 12

Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_cot_v3

Viewer • Updated May 24, 2025 • 6 • 4

Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_filter_2e-5_thre-0.8_packing_42_cot

Updated Mar 3, 2025 • 3

Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_sft_2e-5_thre-0.7_packing_42_cot

Viewer • Updated Mar 1, 2025 • 63.1k • 10

Wenboz/ultrafeedback_rationale_gemma-2-2b-it_cot

Viewer • Updated Feb 21, 2025 • 10 • 7

Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_cot

Viewer • Updated Feb 21, 2025 • 63.1k • 30

Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_direct

Viewer • Updated Feb 20, 2025 • 61.1k • 13

Wenboz/ultrafeedback_rationale_Llama-3.2-3B-Instruct_cot

Viewer • Updated Feb 20, 2025 • 61.1k • 7
