Christopher Schröder

cschroeder


https://github.com/webis-de/small-text

AI & ML interests

NLP, Active Learning, Text Representations, PyTorch

Recent Activity



New activity in cschroeder/modernbert-finewebedu-rope-rep2 26 days ago

Adding `safetensors` variant of this model

#1 opened 26 days ago by

SFconvertbot

updated a model about 2 months ago

cschroeder/alibi-test

Updated Jun 1 • 3

published a model about 2 months ago

cschroeder/alibi-test

Updated Jun 1 • 3

updated a model 5 months ago

cschroeder/modernbert-finewebedu-alibi-rep2

Updated Feb 25 • 5

published a model 5 months ago

cschroeder/modernbert-finewebedu-alibi-rep2

Updated Feb 25 • 5

updated a model 5 months ago

cschroeder/modernbert-finewebedu-rope-rep2

0.1B • Updated 26 days ago • 43

published a model 5 months ago

cschroeder/modernbert-finewebedu-rope-rep2

0.1B • Updated 26 days ago • 43

liked a dataset 6 months ago

coral-nlp/german-commons

Viewer • Updated Jan 22 • 71.6M • 1.57k • 38

upvoted a paper 9 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15, 2025 • 9

updated a model 11 months ago

small-text/tiny-distilroberta-base

102k • Updated Aug 24, 2025 • 4

published a model 11 months ago

small-text/tiny-distilroberta-base

102k • Updated Aug 24, 2025 • 4

updated a model over 1 year ago

small-text/word2vec-google-news-300

Updated Mar 30, 2025

published a model over 1 year ago

small-text/word2vec-google-news-300

Updated Mar 30, 2025

upvoted 2 papers over 1 year ago

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published Mar 7, 2025 • 81

NeoBERT: A Next-Generation BERT

Paper • 2502.19587 • Published Feb 26, 2025 • 39

posted an update over 1 year ago

Post

617

🔥 Final Call and Deadline Extension: Survey on Data Annotation and Active Learning

Short summary: We need your support for a web survey in which we investigate how recent advancements in natural language processing, particularly LLMs, have influenced the need for labeled data in supervised machine learning — with a focus on, but not limited to, active learning. See the original post for details.

➡️ Extended Deadline: January 26th, 2025.
Please consider participating or sharing our survey! (If you have any experience with supervised learning in natural language processing, you are eligible to participate in our survey.)

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271

replied to their post over 1 year ago

Just a quick note: I will not enter into any ideological debates here again.

First off, I think this is a non-issue regardless of which license we use. This is first and foremost a scientific study, and the dataset we’re producing is more of a byproduct—its main purpose is to help other researchers verify our findings. It seems like there might be some misconceptions about this dataset: Think of it as a table of answer codes. It is not a text dataset and therefore not interesting or useful for LLM training (or similar).

Second, we made this decision because the survey doesn’t have any funding and relies on people generously sharing their opinions (without compensation). Given the growing skepticism around data collection, we wanted to be especially careful not to discourage users from participating. Our primary goal is to conduct a study with a population as diverse as possible, and we did not want to lose potential participants who might be less inclined to give away their data without compensation.

posted an update over 1 year ago

Post

504

Here’s just one of the many exciting questions from our survey. If these topics resonate with you and you have experience working on supervised learning with text (i.e., supervised learning in Natural Language Processing), we warmly invite you to participate!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

—

❤️ We’re seeking responses from across the globe! If you know 1–3 people who might qualify for this survey—particularly those in different regions—please share it with them. We’d really appreciate it!

#NLProc #ActiveLearning #ML

2 replies

posted an update over 1 year ago

Post

379

💡 Looking for support: Have you ever had to overcome a lack of labeled data to deal with an NLP task?

Are you working on Natural Language Processing tasks, and have you faced a lack of labeled data before? We are currently conducting a survey to explore the strategies used to address this bottleneck, especially in the context of recent advancements, including but not limited to large language models.

The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community.

👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP community to overcome a lack of labeled data.

❤️How you can help even more: If you know others working on supervised learning and NLP, please share this survey with them—we’d really appreciate it!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

#NLP #ML

liked a Space over 1 year ago

FineWeb-c - Annotation

🌐

Launch Argilla for data labeling and annotation
