QomSSLab/Anonymizer-v3

This repository hosts an XLM-RoBERTa model with a trained token-classification head.

Usage

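from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and the token-classification model from the Hub
model_id = "QomSSLab/Anonymizer-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Persian for "An example of a Persian input"
text = "مثال از یک ورودی فارسی"
for entity in tagger(text):
    # Each entity is a dict with entity_group, score, word, start, and end
    print(entity)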
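
Since the pipeline returns character offsets for each span, the output can be turned into redacted text. The following is a minimal sketch, not the authors' method: `redact` is a hypothetical helper, and the sample sentence and entity dicts are hand-written stand-ins for real `tagger(text)` output so the snippet runs without downloading the model.

```python
def redact(text, entities):
    """Replace each tagged span with a [LABEL] placeholder (hypothetical helper)."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written stand-in for tagger(sample); not real model output.
sample = "Ali Rezaei lives in Qom."
fake_entities = [
    {"entity_group": "PERSON", "score": 0.99, "word": "Ali Rezaei", "start": 0, "end": 10},
    {"entity_group": "ADDRESS", "score": 0.98, "word": "Qom", "start": 20, "end": 23},
]

redacted = redact(sample, fake_entities)
print(redacted)  # [PERSON] lives in [ADDRESS].
```

With the real model, `redact(text, tagger(text))` would apply the same replacement to whatever spans the tagger returns.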

Labels

ACOUNT
ADDRESS
AMOUNT
DATE
DOCUMENT_ID
ID
JOB
O
ORG
ORG_BRANCH
PERSON
PHONEـNUMBER
PLATEـNUMBER
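
As rendered in this list, the PHONE and PLATE number labels appear to be joined with an Arabic tatweel (U+0640) rather than an ASCII underscore, so a comparison against a string typed with `_` would silently fail. Whether the model's own `model.config.id2label` uses the same character is an assumption worth checking. The sketch below normalizes labels before filtering; `normalize_label` and `keep_labels` are hypothetical helpers, and the entity dicts are hand-written stand-ins.

```python
TATWEEL = "\u0640"  # Arabic tatweel, visually similar to "_"


def normalize_label(label):
    """Map tatweel-joined label names to their ASCII-underscore form."""
    return label.replace(TATWEEL, "_")


def keep_labels(entities, wanted):
    """Keep only entities whose normalized label is in `wanted`."""
    wanted = {normalize_label(w) for w in wanted}
    return [e for e in entities if normalize_label(e["entity_group"]) in wanted]


# Hand-written stand-ins for pipeline output; not real model output.
entities = [
    {"entity_group": "PHONE\u0640NUMBER", "word": "0912...", "start": 0, "end": 7},
    {"entity_group": "PERSON", "word": "Ali", "start": 10, "end": 13},
]

phones = keep_labels(entities, ["PHONE_NUMBER"])
print([e["entity_group"] for e in phones])
```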

Metrics

Validation Metrics

Precision: 0.9337
Recall: 0.9488
F1: 0.9412
Accuracy: 0.9919
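
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the two figures above. This is a small sketch; `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


overall_f1 = f1_from(0.9337, 0.9488)
print(f"{overall_f1:.4f}")  # 0.9412
```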

Per-label Breakdown

Label	Precision	Recall	F1	Support
ACOUNT	1.0000	1.0000	1.0000	0
ADDRESS	0.9683	0.9934	0.9807	1970
AMOUNT	1.0000	0.9947	0.9973	188
DATE	1.0000	0.9734	0.9865	338
DOCUMENT_ID	0.9888	0.9902	0.9895	711
ID	0.9718	1.0000	0.9857	69
JOB	1.0000	0.4364	0.6076	55
O	0.9953	0.9952	0.9952	18610
ORG	0.8037	0.8866	0.8431	97
ORG_BRANCH	0.9833	0.9584	0.9707	553
PERSON	0.9978	0.9928	0.9953	1385
PHONEـNUMBER	1.0000	0.9944	0.9972	178
PLATEـNUMBER	0.0000	1.0000	0.0000	0
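
Two rows in this table have a support of 0, meaning the validation set contained no examples of those labels, so their scores say nothing about how the model handles them. The sketch below copies the table values, sets the zero-support rows aside, and averages F1 over the remaining labels. The macro and support-weighted averages are illustrative summaries, not necessarily how the overall figures above were computed.

```python
# (label, precision, recall, f1, support), copied from the table above.
# "\u0640" is the tatweel character that appears in two label names.
ROWS = [
    ("ACOUNT", 1.0000, 1.0000, 1.0000, 0),
    ("ADDRESS", 0.9683, 0.9934, 0.9807, 1970),
    ("AMOUNT", 1.0000, 0.9947, 0.9973, 188),
    ("DATE", 1.0000, 0.9734, 0.9865, 338),
    ("DOCUMENT_ID", 0.9888, 0.9902, 0.9895, 711),
    ("ID", 0.9718, 1.0000, 0.9857, 69),
    ("JOB", 1.0000, 0.4364, 0.6076, 55),
    ("O", 0.9953, 0.9952, 0.9952, 18610),
    ("ORG", 0.8037, 0.8866, 0.8431, 97),
    ("ORG_BRANCH", 0.9833, 0.9584, 0.9707, 553),
    ("PERSON", 0.9978, 0.9928, 0.9953, 1385),
    ("PHONE\u0640NUMBER", 1.0000, 0.9944, 0.9972, 178),
    ("PLATE\u0640NUMBER", 0.0000, 1.0000, 0.0000, 0),
]

# Labels with no validation examples carry no information about quality.
supported = [row for row in ROWS if row[4] > 0]
unsupported = [row[0] for row in ROWS if row[4] == 0]

# Macro average treats every label equally; weighted average follows support.
macro_f1 = sum(row[3] for row in supported) / len(supported)
total_support = sum(row[4] for row in supported)
weighted_f1 = sum(row[3] * row[4] for row in supported) / total_support

print("zero-support labels:", unsupported)
print(f"macro F1 over supported labels: {macro_f1:.4f}")
print(f"support-weighted F1: {weighted_f1:.4f}")
```

The macro average is pulled down by JOB (recall 0.4364 on 55 examples) and ORG, while the weighted average is dominated by the O class.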

Safetensors

Model size: 0.6B params
Tensor type: F32