AI & ML interests

AI evaluation, LLM benchmarking, agent evaluation, reproducible eval workflows, model comparison, regression testing, failure analysis, eval datasets, and open-source developer tooling

Recent Activity

aaron-schlesinger updated a dataset about 2 months ago: quantiles/crows_pairs
aaron-schlesinger updated a dataset about 2 months ago: quantiles/bold
aaron-schlesinger updated a dataset about 2 months ago: quantiles/bbq

quantiles's models

None public yet