gemma-4-E4B-it β€” text-finetuned (QLoRA) β€” GGUF

GGUF export of a QLoRA fine-tune of gemma-4-E4B-it, quantized to Q4_K_M for llama.cpp. The fine-tune targets text generation (German medical domain); the multimodal audio (and vision) capabilities are inherited unchanged from the base model.

Files

File Size Description
gemma-4-E4B-it-ckpt500-Q4_K_M.gguf 5.0 GB Language model. Base gemma-4-E4B-it with the text-generation QLoRA adapter (checkpoint-500) merged in, quantized to Q4_K_M.
mmproj-gemma-4-E4B-it-BF16.gguf 946 MB Multimodal projector (BF16) β€” audio + vision encoders. Required for audio/image input. Not fine-tuned.

⚠️ Chat template / system prompt (important)

The base gemma-4-E4B-it is a reasoning model: its original chat template auto-enables a <channel>thought reasoning trace whenever the first message has role system or developer. For a JSON-only extraction task this looks like "gibberish" (the model emits a thinking trace instead of the JSON). This GGUF ships a corrected chat template that folds any system/developer message into the first user turn and never enables thinking β€” so the model answers directly. Just deploy normally with --jinja:

# Text + audio
llama-server -m gemma-4-E4B-it-ckpt500-Q4_K_M.gguf \
             --mmproj mmproj-gemma-4-E4B-it-BF16.gguf --jinja

# Text only
llama-cli -m gemma-4-E4B-it-ckpt500-Q4_K_M.gguf --jinja -st -sysf system.txt -f user.txt

system + user chat-completions requests now return the trained JSON directly. (If you ever use the base model's template instead, send the instructions in the user message β€” do not use a system role β€” to avoid the reasoning trace.)

What is and isn't fine-tuned

  • βœ… Text generation β€” QLoRA adapter (rank 8, checkpoint-500) merged into the language model.
  • βž– Audio / vision β€” base mmproj encoders, unchanged.

Provenance

  • Base model: unsloth/gemma-4-E4B-it (16-bit).
  • Adapter: LoRA r=8, Ξ±=8, checkpoint-500.
  • Export: adapter β†’ GGUF (convert_lora_to_gguf.py) β†’ merged into base text GGUF (llama-export-lora) β†’ quantized Q4_K_M (llama-quantize) β†’ chat template corrected (gguf_new_metadata.py).
Downloads last month
56
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Mediform/gemma4

Quantized
(13)
this model