lzwjava/zz
Branch: main
Path: zz/scripts/extract (14.7 kB)
3 contributors
History: 3 commits
Latest commit: lzwjava, "Add deepseek run_lite and fineweb extract/tokenize scripts" (108bc6b), 11 days ago
File                      Scan  Size       Last commit                                                 Updated
extract_fineweb.py        Safe  2.67 kB    refactor: reorganize project structure                      2 months ago
extract_fineweb_gpt3.py   Safe  3.15 kB    Add deepseek run_lite and fineweb extract/tokenize scripts  11 days ago
extract_parquet.py        Safe  1.6 kB     chore: add ruff pre-commit hooks and apply formatting       26 days ago
extract_wiki.py           Safe  308 Bytes  refactor: reorganize project structure                      2 months ago
extract_wiki_corpus.py    Safe  454 Bytes  refactor: reorganize project structure                      2 months ago
rename_fineweb.py         Safe  1.2 kB     refactor: reorganize project structure                      2 months ago
tokenize_fineweb_gpt3.py  Safe  5.29 kB    Add deepseek run_lite and fineweb extract/tokenize scripts  11 days ago