
Aman Singh Thakur

singh96aman

AI & ML interests

Responsible AI and Classical Machine Learning

Recent Activity

reacted to their post with 🔥 almost 2 years ago
๐—๐˜‚๐—ฑ๐—ด๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐—๐˜‚๐—ฑ๐—ด๐—ฒ๐˜€: ๐—˜๐˜ƒ๐—ฎ๐—น๐˜‚๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐—”๐—น๐—ถ๐—ด๐—ป๐—บ๐—ฒ๐—ป๐˜ ๐—ฎ๐—ป๐—ฑ ๐—ฉ๐˜‚๐—น๐—ป๐—ฒ๐—ฟ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐—ถ๐—ฒ๐˜€ ๐—ถ๐—ป ๐—Ÿ๐—Ÿ๐— ๐˜€-๐—ฎ๐˜€-๐—๐˜‚๐—ฑ๐—ด๐—ฒ๐˜€ https://huggingface.co/papers/2406.12624 ๐‚๐š๐ง ๐‹๐‹๐Œ๐ฌ ๐ฌ๐ž๐ซ๐ฏ๐ž ๐š๐ฌ ๐ซ๐ž๐ฅ๐ข๐š๐›๐ฅ๐ž ๐ฃ๐ฎ๐๐ ๐ž๐ฌ โš–๏ธ? We aim to identify the right metrics for evaluating Judge LLMs and understand their sensitivities to prompt guidelines, engineering, and specificity. With this paper, we want to raise caution โš ๏ธ to blindly using LLMs as human proxy. Blog - https://huggingface.co/blog/singh96aman/judgingthejudges Arxiv - https://arxiv.org/abs/2406.12624 Tweet - https://x.com/iamsingh96aman/status/1804148173008703509 @singh96aman @kartik727 @Srinik-1 @sankaranv @dieuwkehupkes

Organizations

None yet