Part of the LAPVQA collection of chest X-ray models: pre-trained encoders and task heads for VQA, DiffVQA, RRG, detection, and grounding on MIMIC-CXR.
DiffVQA models trained end-to-end, with the encoder and head optimized jointly. Each `.pt` file
is a plain state dict for `DiffVQAHead`. MAE ViT-L/16 is the primary encoder studied.
| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

| File | Encoder | vis_dim |
|---|---|---|
| clip-vit-l14_best.pt | CLIP ViT-L/14 | 1024 |
| coca_best.pt | CoCa | 768 |
| florence2_best.pt | Florence-2 | 1024 |
| mae-vit-l16_best.pt | MAE ViT-L/16 | 1024 |
| siglip_best.pt | SigLIP | 1152 |
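```python
import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Each checkpoint is a plain state dict, so it loads straight into the head.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu")
# vis_dim must match the encoder that produced the checkpoint (1024 for MAE ViT-L/16).
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # inference mode: disables dropout and similar training-only behavior
```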
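
Since `vis_dim` differs across encoders, it can help to look the value up from the checkpoint file name rather than hard-coding it. The following is a minimal sketch, not part of the `lapvqa` package: the `VIS_DIM_BY_FILE` mapping and the `vis_dim_for` helper are hypothetical names introduced here, with values copied from the file table above.

```python
from pathlib import Path

# vis_dim per released checkpoint, copied from the file table in this card.
# The mapping and helper below are illustrative, not part of lapvqa.
VIS_DIM_BY_FILE = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path: str) -> int:
    """Return the vis_dim for a checkpoint, matched by its file name."""
    name = Path(checkpoint_path).name
    try:
        return VIS_DIM_BY_FILE[name]
    except KeyError:
        # Fail loudly instead of guessing a width for an unknown file.
        raise ValueError(f"unknown checkpoint file: {name!r}") from None
```

Assuming `DiffVQAHead` takes `vis_dim` as in the loading snippet, `DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt"))` would then build a head sized for the SigLIP checkpoint.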