Part of the LAPVQA collection: chest X-ray models, including pre-trained encoders and task heads for VQA, DiffVQA, RRG, detection, and grounding on MIMIC-CXR.
VQA task heads trained with end-to-end fine-tuning (encoder and head updated jointly).
They provide a baseline for comparison with the frozen-encoder variant, lapvqa-vqa.
Each .pt file is a plain state dict of VQAHead.
| File | Encoder | vis_dim |
|---|---|---|
| clip-vit-l14_best.pt | CLIP ViT-L/14 (fine-tuned) | 1024 |
| siglip_best.pt | SigLIP (fine-tuned) | 1152 |
| florence2_best.pt | Florence-2 (fine-tuned) | 1024 |
| coca_best.pt | CoCa (fine-tuned) | 768 |
| mae-vit-l16_best.pt | MAE ViT-L/16 (fine-tuned) | 1024 |
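Because `vis_dim` has to match the encoder a checkpoint was trained with, it can help to derive it from the file name instead of passing it by hand. The following is a minimal sketch that assumes the `<encoder>_best.pt` naming shown in the table; the helper `vis_dim_for_checkpoint` is hypothetical and not part of the `lapvqa` package.

```python
from pathlib import Path

# Visual feature width per encoder, copied from the table above.
VIS_DIMS = {
    "clip-vit-l14": 1024,
    "siglip": 1152,
    "florence2": 1024,
    "coca": 768,
    "mae-vit-l16": 1024,
}

# Assumed naming convention: '<encoder>_best.pt'.
SUFFIX = "_best.pt"


def vis_dim_for_checkpoint(path):
    """Return (encoder_key, vis_dim) for a file named '<encoder>_best.pt'.

    Hypothetical helper for illustration; not part of lapvqa.
    """
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected checkpoint name: {name!r}")
    encoder = name[: -len(SUFFIX)]
    if encoder not in VIS_DIMS:
        raise KeyError(f"unknown encoder: {encoder!r}")
    return encoder, VIS_DIMS[encoder]
```

Failing loudly on an unknown file name is deliberate here: a mismatched `vis_dim` would otherwise only surface later as a shape error when the weights are loaded.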
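```python
import torch
from lapvqa.vqa.model import VQAHead

# Visual feature dimension expected by the head for each encoder.
VIS_DIMS = {
    "clip-vit-l14": 1024, "siglip": 1152,
    "florence2": 1024, "coca": 768, "mae-vit-l16": 1024,
}

encoder = "siglip"

# Each checkpoint is a plain state dict, so it loads directly into the head.
ckpt = torch.load(f"{encoder}_best.pt", map_location="cpu")
head = VQAHead(vis_dim=VIS_DIMS[encoder])
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode
```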
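Since each file is described as a plain state dict, a quick sanity check on the loaded object before calling `load_state_dict` can give a clearer error than a key mismatch. The sketch below is an illustration, not part of `lapvqa`: the function `looks_like_plain_state_dict` is hypothetical, and it treats any value with a `.shape` attribute as tensor-like so that it does not depend on PyTorch being importable.

```python
from collections.abc import Mapping


def looks_like_plain_state_dict(obj):
    """Return True if obj maps string names directly to tensor-like values.

    Hypothetical helper: "tensor-like" here only means "has a .shape attribute".
    A wrapped checkpoint such as {"state_dict": {...}} returns False because
    its value is a nested dict rather than a tensor.
    """
    if not isinstance(obj, Mapping) or not obj:
        return False
    return all(
        isinstance(key, str) and hasattr(value, "shape")
        for key, value in obj.items()
    )
```

In the loading snippet above, this could be used as `assert looks_like_plain_state_dict(ckpt)` between `torch.load` and `head.load_state_dict`.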