QomSSLab/Anonymizer-v3

This repository hosts a trained XLM-RoBERTa model with a token-classification head.

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "QomSSLab/Anonymizer-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "مثال از یک ورودی فارسی"  # Persian for "an example of a Persian input"
for entity in tagger(text):
    print(entity)
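
Since the repository is named Anonymizer, one plausible follow-up to the loop above is to replace each tagged span with a placeholder. The sketch below is an assumption about how the output might be used, not a method documented by this card. It relies only on the `entity_group`, `start`, and `end` keys that the `transformers` token-classification pipeline returns when `aggregation_strategy` is set. The helper name `mask_entities` and the sample data are hypothetical, and the entities are hand-written so the snippet runs without downloading the model.

```python
def mask_entities(text, entities):
    """Replace each tagged character span with a [LABEL] placeholder."""
    masked = text
    # Process spans right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"]:]
    return masked


# Hand-written entities in the shape the pipeline returns (not real model output).
sample_text = "Meet Sara on 2024-01-05."
sample_entities = [
    {"entity_group": "PERSON", "start": 5, "end": 9},
    {"entity_group": "DATE", "start": 13, "end": 23},
]

print(mask_entities(sample_text, sample_entities))  # Meet [PERSON] on [DATE].
```

With the real model, `mask_entities(text, tagger(text))` would apply the same replacement to the pipeline output, assuming the returned spans do not overlap.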

Labels

  • ACOUNT
  • ADDRESS
  • AMOUNT
  • DATE
  • DOCUMENT_ID
  • ID
  • JOB
  • O
  • ORG
  • ORG_BRANCH
  • PERSON
  • PHONEـNUMBER
  • PLATEـNUMBER
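
Two of the labels above (the phone-number and plate-number labels) appear to be written with the Arabic tatweel character (U+0640) rather than an ASCII underscore, so comparing them against an underscore-spelled string would fail. A minimal sketch of a normalization step for downstream code, assuming that reading of the label names is correct; the helper name `normalize_label` is hypothetical and not part of the model or of `transformers`:

```python
TATWEEL = "\u0640"  # Arabic tatweel (kashida), visually similar to an underscore


def normalize_label(label):
    """Return the label with any tatweel replaced by an ASCII underscore."""
    return label.replace(TATWEEL, "_")


print(normalize_label("PHONE\u0640NUMBER"))  # PHONE_NUMBER
print(normalize_label("ORG_BRANCH"))         # ORG_BRANCH (unchanged)
```

Normalizing only on the consumer side leaves the label names in the model's own configuration untouched.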

Metrics

Validation Metrics

  • Precision: 0.9337
  • Recall: 0.9488
  • F1: 0.9412
  • Accuracy: 0.9919
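
The card does not say how these figures were computed. Assuming F1 is the usual harmonic mean of precision and recall, the reported value can be checked against the two numbers above with a few lines (the function name `f1_score` here is a local helper, not a library call):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall = f1_score(0.9337, 0.9488)
print(round(overall, 4))  # 0.9412
```

The result matches the reported F1 to four decimal places.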

Per-label Breakdown

Label         Precision  Recall  F1      Support
ACOUNT        1.0000     1.0000  1.0000  0
ADDRESS       0.9683     0.9934  0.9807  1970
AMOUNT        1.0000     0.9947  0.9973  188
DATE          1.0000     0.9734  0.9865  338
DOCUMENT_ID   0.9888     0.9902  0.9895  711
ID            0.9718     1.0000  0.9857  69
JOB           1.0000     0.4364  0.6076  55
O             0.9953     0.9952  0.9952  18610
ORG           0.8037     0.8866  0.8431  97
ORG_BRANCH    0.9833     0.9584  0.9707  553
PERSON        0.9978     0.9928  0.9953  1385
PHONEـNUMBER  1.0000     0.9944  0.9972  178
PLATEـNUMBER  0.0000     1.0000  0.0000  0

Model size: 0.6B params (Safetensors, tensor type F32)