AI & ML interests
AI evaluation, LLM benchmarking, agent evaluation, reproducible eval workflows, model comparison, regression testing, failure analysis, eval datasets, and open-source developer tooling
quantiles's models
None public yet