Joseph Robert Turcotte PRO
Fishtiks
AI & ML interests
Roleplaying, lorabration, abliteration, smol models, extensive filtering, unusual datasets, home usage, HPCs for AI, distributed training/federated learning, and sentience.
AI should find and label AI hallucinations with GANs so we can give them context and use.
Recent Activity
liked a model 1 day ago
sensenova/SenseNova-U1-8B-MoT-Infographic
liked a Space 1 day ago
nvidia/nemotron-speech-streaming-en-0.6b
reacted to FlameF0X's post with 🔥 1 day ago
reacted to anakin87's post with ❤️ about 2 months ago
Post
4183
Let LLMs wander - Engineering RL Environments
Reinforcement Learning Environments are little worlds
where models can act, get rewards, and learn.
I've been exploring how to design them, figuring out what works and what doesn't.
If you want to learn how to build them, I recorded a practical intro video.
You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master.
Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q
---
LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course
Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
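To make the "little worlds where models can act, get rewards, and learn" idea concrete, here is a minimal sketch of a Tic-tac-toe environment. It is not taken from the linked course or video: the class name `TicTacToeEnv`, the text observation format, and the reward values (+1 win, 0 draw, -1 illegal move) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# All eight winning lines on a 3x3 board whose cells are indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


@dataclass
class TicTacToeEnv:
    """Hypothetical toy environment; an action is a cell index 0..8."""
    board: List[str] = field(default_factory=lambda: [" "] * 9)

    def reset(self) -> str:
        self.board = [" "] * 9
        return self.observation()

    def observation(self) -> str:
        # Text observation, since an LLM would read the board as a prompt.
        rows = ["|".join(self.board[i:i + 3]) for i in (0, 3, 6)]
        return "\n".join(rows)

    def winner(self) -> Optional[str]:
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, action: int, player: str = "X") -> Tuple[str, float, bool]:
        """Apply a move for `player` and return (observation, reward, done)."""
        if not 0 <= action < 9 or self.board[action] != " ":
            # Illegal move: penalize and end the episode (a design assumption).
            return self.observation(), -1.0, True
        self.board[action] = player
        if self.winner() == player:
            return self.observation(), 1.0, True
        if " " not in self.board:
            return self.observation(), 0.0, True  # draw
        return self.observation(), 0.0, False


env = TicTacToeEnv()
env.reset()
for move, player in [(0, "X"), (3, "O"), (1, "X"), (4, "O"), (2, "X")]:
    obs, reward, done = env.step(move, player)
print(reward, done)  # 1.0 True (X completes the top row)
```

Ending the episode on an illegal move is only one possible choice; a real environment might instead ask the model to retry, which changes what the reward signal teaches.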
reacted to JonnaMat's post about 2 months ago
Post
1255
⚡ FlashHead: Fast LM Head Inference - Now a Simple vLLM Plugin
flash-head replaces the dense LM head with a two-stage retrieval pipeline - up to 2x inference speedup, training-free. Previously required custom Docker images; now it's just:
pip install flash-head
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16
✨ The plugin activates automatically via vLLM's vllm.general_plugins entry point. No source patches, no custom imports.
🧩 Supported models (full collection):
Qwen Qwen3,
meta-llama Llama3,
google Gemma3,
nvidia Cosmos-Reason2 - BF16 and W4A16 variants.
https://huggingface.co/collections/embedl/flashhead
embedl/Edge-Inference-Benchmarks
Benchmark it yourself:
vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
# Baseline comparison
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
FlashHead shines at low batch sizes; the typical real-time / on-device use case.
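The FlashHead post does not spell out what its "two-stage retrieval pipeline" does internally, so the following is only a generic coarse-to-fine sketch of the idea, not FlashHead's actual algorithm. It assumes the vocabulary has been grouped into clusters ahead of time: stage 1 scores the cluster centroids, stage 2 computes exact logits only for tokens in the shortlisted clusters. Every name here (`two_stage_argmax`, `centroids`, `clusters`) is hypothetical.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_argmax(hidden: Sequence[float],
                 vocab_weights: List[Sequence[float]]) -> int:
    """Baseline: score every vocabulary row, as a dense LM head would."""
    return max(range(len(vocab_weights)),
               key=lambda tok: dot(hidden, vocab_weights[tok]))


def two_stage_argmax(hidden: Sequence[float],
                     centroids: List[Sequence[float]],
                     clusters: Dict[int, List[int]],
                     vocab_weights: List[Sequence[float]],
                     top_clusters: int = 1) -> int:
    """Return the best-scoring token id using coarse-then-fine scoring."""
    # Stage 1: score only the centroids (far fewer rows than the vocabulary).
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(hidden, centroids[c]),
                    reverse=True)[:top_clusters]
    # Stage 2: exact logits only for tokens in the shortlisted clusters.
    candidates = [tok for c in ranked for tok in clusters[c]]
    return max(candidates, key=lambda tok: dot(hidden, vocab_weights[tok]))


# Toy 4-token vocabulary split into two clusters of two tokens each.
vocab_weights = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = {0: [0, 1], 1: [2, 3]}
centroids = [[0.95, 0.05], [0.05, 0.95]]
hidden = [0.2, 1.0]

print(dense_argmax(hidden, vocab_weights))                          # 2
print(two_stage_argmax(hidden, centroids, clusters, vocab_weights)) # 2
```

In this toy case both paths agree. In general a shortlist can miss the true argmax, so the number of clusters kept trades speed against agreement with the dense head.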
posted an update 3 months ago
Post
525
When I was a child, I had a lot of stuffed animals. I say child, but I played with stuffed animals up until I was 15, and only stopped because others said it was weird. I made personalities for them. I could have made "fan art" or something of that nature, but it existed in my imagination, and sometimes I'd sketch it. I also played with ALICE, which came naturally to me then.
Well, it turns out that this is all highly autistic stuff, including playing with toys and stories far later than other children. It's also fascinating to me that these are the qualities which, in my opinion, make deeply autistic individuals great clickworkers/trainers in AI. They realize they're curating a personality, partially as an escape from real people and their cruelty, and are okay with that. A lot of autistic people will end up needing AI, and that's okay, because it's better to have something and need it than to need it and not have it available. I hope that as AI improves accessibility features, its benefits are considered alongside costs, to provide more functional AI wherever possible, if cheap and energy-efficient enough.
I hope people don't lose their desire to develop their own skills because of AI. I'm not that good a drawer, and never will be, but I'd hate to see someone just never try because AI is so good. But at the same time, being a ghostwriter, I believe everyone deserves that sort of creative power, and am proud to be involved in bringing it to them. I'm proud to be involved in replacing myself, because I want AI to write better than I do, so one day, you can describe your perfect show, and simply watch it. Some people say that world is horrific. I see it more like when we finally got to stream a large selection of movies rather than just a few cable or satellite selections that were super expensive.
reacted to kostakoff's post 3 months ago
Post
2216
Mining GPU Nvidia CMP 170HX - let's run some models!
To satisfy my curiosity, I investigated different GPUs and found this: a mining version of the A100, the CMP 170HX.
It is a very interesting GPU. Based on public documentation, it has hardware similar to the datacenter A100. If you open it up and look at the board, you will see that it's very similar to an A100 board; it even has NVLink connectors.
Online, I found almost no information about how to run it, whether it works with LLMs, or if it's supported by default Nvidia drivers and CUDA. So, I decided to test it myself.
I installed it in my lab (see previous post https://huggingface.co/posts/kostakoff/584269728210158) and found that the default nvidia-driver-570 works with it out of the box. After that, I checked if CUDA was available, and it worked too.
The next step was to try running some models:
- Stable Diffusion XL with BNB4 quantization: It took around two minutes to generate an image, but it works!
- Compiled llama.cpp for CUDA (https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#compilation): I ran Mistral 7B Q4_K_M, and this actually worked even better. It was able to generate 33 tokens per second and read 400 tokens per second.
There are some limitations related to power utilization:
- When running PyTorch, it doesn't utilize more than 80 watts.
- When running llama.cpp, utilization is a bit better but still limited to 113 watts.
I found this GitHub thread about the Nvidia CMP https://github.com/dartraiden/NVIDIA-patcher/issues/73, and it looks like this mining GPU has an internal rate limiter based on FMA compute calls. I haven't found a solution to bypass it yet.
llmlaba
Love it. I've been looking at these, and am glad there are some trials going.
reacted to MikeDoes's post with 🔥 3 months ago
Post
4592
At Ai4Privacy, our goal is to empower researchers to build a safer AI ecosystem. Today, we're highlighting crucial research that does just that by exposing a new vulnerability.
The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.
We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:
Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.
Enabled Impactful Benchmarking: It allowed them to measure the actual effectiveness of their data extraction attack, proving it could compromise specific, high-value information.
This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.
Read the full paper: https://arxiv.org/html/2408.17354v1
Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy #Worldslargestopensourceprivacymaskingdataset
reacted to marksverdhei's post with 🔥 3 months ago
Post
1763
# The most underrated feature of Qwen3-TTS: Voice embeddings!
https://huggingface.co/collections/marksverdhei/qwen3-voice-embedding
Did you know that Qwen3 TTS actually utilizes voice embedding?
Your voice is turned into a vector of 1024 (or 2048) dimensions,
and based on this vector alone you can get your custom voice.
But the coolest part is that this means that you can use math to modify voices and average voices. You can swap gender, pitch, mix and match voices, and even create an emotion space! This also enables semantic voice search!
The voice embedding model is actually just a tiny encoder with just a few million parameters. I've ripped it out of the voice embedding model so you can use the embedding model standalone. Check out my collection! :D
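As a rough illustration of the "use math to modify voices" point, the sketch below treats each voice as a plain list of floats and shows averaging, shifting along a direction between two voices, and cosine-similarity search. It does not call Qwen3-TTS or the linked collection; the 3-dimensional toy vectors and the helper names (`average`, `shift`, `nearest`) are assumptions standing in for real 1024- or 2048-dimensional embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def average(vectors: List[Vector]) -> List[float]:
    """Blend several voices by taking the element-wise mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def shift(voice: Vector, src: Vector, dst: Vector, strength: float = 1.0) -> List[float]:
    """Move a voice along the direction (dst - src), e.g. an attribute axis."""
    return [v + strength * (d - s) for v, s, d in zip(voice, src, dst)]


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def nearest(query: Vector, library: Dict[str, Vector]) -> str:
    """Voice search: return the name whose embedding is most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))


# Toy 3-dimensional "voices" (hypothetical values).
voices = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.9, 0.1, 0.0],
}

blend = average([voices["alice"], voices["bob"]])
print(blend)                             # [0.5, 0.5, 0.0]
print(nearest([0.8, 0.2, 0.0], voices))  # carol
```

Whether a given direction in the real embedding space corresponds to a clean attribute such as pitch or emotion is something to check empirically; the arithmetic itself is the same.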
reacted to Tonic's post with 🔥 3 months ago
Post
3765
Who would win?
- a fully subsidized ai lab
OR
- 3 random students named kurakurai?
demo: Tonic/fr-on-device
if you like it give the demo a little star and send a shoutout to: @MaxLSB @jddqd and @GAD-cell for absolutely obliterating the pareto frontier of the french language understanding.
reacted to SeaWolf-AI's post with 🔥 3 months ago
Post
4326
FINAL Bench Released: The Real Bottleneck to AGI Is Self-Correction
We release FINAL Bench, the first benchmark for measuring functional metacognition in LLMs: the ability to detect and correct one's own reasoning errors. Every existing benchmark measures final-answer accuracy. None measures whether AI knows it is wrong.
Dataset: FINAL-Bench/Metacognitive | 100 Tasks | 15 Domains | 8 TICOS Types | Apache 2.0
Leaderboard: FINAL-Bench/Leaderboard
Article: https://huggingface.co/blog/FINAL-Bench/metacognitive
Core Innovation
Our 5-axis rubric separates what no prior benchmark could: MA (Metacognitive Accuracy), the ability to say "I might be wrong", and ER (Error Recovery), the ability to actually fix it. This maps directly to the monitoring-control model of Nelson & Narens (1990) in cognitive psychology.
Three Findings Across 9 SOTA Models
We evaluated GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, DeepSeek-V3.2, Kimi K2.5, and others across 100 expert-level tasks:
1. ER Dominance. 94.8% of MetaCog gain comes from Error Recovery alone. The bottleneck to AGI is not knowledge or reasoning; it is self-correction.
2. Declarative-Procedural Gap. All 9 models can verbalize uncertainty (MA = 0.694) but cannot act on it (ER = 0.302). They sound humble but fail to self-correct, the most dangerous AI safety profile.
3. Difficulty Effect. Harder tasks benefit dramatically more from metacognition (Pearson r = -0.777, p < 0.001).
from datasets import load_dataset
dataset = load_dataset("FINAL-Bench/Metacognitive", split="train")
Paper: FINAL Bench: Measuring Functional Metacognitive Reasoning in LLMs
FINAL Bench is the first tool to tell apart what AI truly knows from what it merely pretends to know.
reacted to MikeDoes's post 4 months ago
Post
5442
Can you teach a giant like Google's Gemini to protect user privacy? A new step-by-step guide shows that the answer is a resounding "yes."
While powerful, large language models aren't specialized for privacy tasks. This tutorial by Analytics Vidhya walks through how to fine-tune Gemini into a dedicated tool for PII anonymization.
To teach the model this critical skill, the author needed a robust dataset with thousands of clear 'before' and 'after' examples.
We're thrilled they chose the Ai4Privacy pii-masking-200k dataset for this task. Our data provided the high-quality, paired examples of masked and unmasked text necessary to effectively train Gemini to identify and hide sensitive information accurately.
This is a perfect example of how the community can use open-source data to add a crucial layer of safety to the world's most powerful models. Great work!
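To show what a "before" and "after" pair of unmasked and masked text can look like, here is a small sketch. It is not how the Ai4Privacy dataset was built and not the tutorial's code: the regular expressions, the placeholder labels, and the `unmasked` / `masked` keys are all hypothetical and cover only emails and phone numbers.

```python
import re
from typing import Dict

# Hypothetical patterns and placeholder labels, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_pii(text: str) -> str:
    """Replace each matched span with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def make_pair(text: str) -> Dict[str, str]:
    """Build one before/after training example (key names are assumptions)."""
    return {"unmasked": text, "masked": mask_pii(text)}


pair = make_pair("Contact jane.doe@example.com or +1 555 010 9999 today")
print(pair["masked"])  # Contact [EMAIL] or [PHONE] today
```

A fine-tuned model is meant to generalize well beyond what fixed patterns like these can catch, such as names and addresses in free text, which is why paired training data matters.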
Check out the full tutorial here: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/
Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#DataPrivacy #AI #LLM #FineTuning #Anonymization #GoogleGemini #Ai4Privacy #World's largest open privacy masking dataset
reacted to mitkox's post 4 months ago
Post
4848
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.
With local AI, I don't have a /fast CC switch, but I have /absurdlyfast:
- 100,499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 5+ year old GPUs 4xA6K gen1; It's not the car. It's the driver.
Qwen3 Coder Next AWQ with cache at BF16. Scores 82.1% in C# on 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
reacted to MikeDoes's post with 🔥 4 months ago
Post
3726
You don't need a massive research lab to build a privacy-preserving AI tool thanks to open datasets. With the right ingredients, anyone can.
A fantastic new guide shows how the democratization of AI is helping to advance safety. It walks through how to use Google's new fine-tuning API to turn Gemini into a powerful tool for PII anonymization.
This project was powered by two key components:
An accessible platform from Google.
High-quality, open-source training data.
We are honored that the author chose the Ai4Privacy pii-masking-200k dataset to provide the crucial data foundation. Our dataset delivered the volume and structure needed to successfully teach a state-of-the-art model how to perform a critical privacy function.
This is the future we're working towards: powerful platforms combined with open, safety-focused data to create tools that benefit everyone. Kudos to the author for showcasing what's possible!
Read the full step-by-step guide: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/
Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#AIforGood #DemocratizeAI #DataPrivacy #Anonymization #OpenSource #LLM #Ai4Privacy
reacted to melvindave's post with 🔥 4 months ago
Post
2425
I made my own avatar banner maker
https://avatar.donvitocodes.com/
Using Claude Code and Opus 4.6 in a day
I use it in my HF profile too
reacted to marksverdhei's post 4 months ago
Post
2704
Dear Hugging Face team, can we please have a way to archive HF repositories / Spaces? I have a bunch of Spaces that used to work but don't any more due to the HF Space implementations changing, and I think it would be good if I could archive those like in GitHub.
React to this post if you want to see this feature!