LAPVQA — VQA (Native / End-to-end)

Description

VQA task heads trained with end-to-end fine-tuning, in which the encoder and the head are updated jointly. They serve as a baseline for comparison with the frozen-encoder variant, lapvqa-vqa. Each .pt file is a plain state dict of VQAHead.

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
| `siglip_best.pt` | SigLIP (fine-tuned) | 1152 |
| `florence2_best.pt` | Florence-2 (fine-tuned) | 1024 |
| `coca_best.pt` | CoCa (fine-tuned) | 768 |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |
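
The file names above follow an `<encoder>_best.pt` pattern, so the encoder key and its vis_dim can be recovered from a checkpoint path. The following is a minimal sketch of that lookup; the helper name `encoder_from_checkpoint` is hypothetical and is not part of the lapvqa package.

```python
from pathlib import Path

# vis_dim per encoder key, copied from the table above.
VIS_DIMS = {
    "clip-vit-l14": 1024,
    "siglip": 1152,
    "florence2": 1024,
    "coca": 768,
    "mae-vit-l16": 1024,
}

SUFFIX = "_best.pt"


def encoder_from_checkpoint(path):
    """Return (encoder_key, vis_dim) for a checkpoint path.

    Hypothetical helper: it relies only on the `<encoder>_best.pt`
    naming pattern shown in the table.
    """
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected checkpoint name: {name!r}")
    encoder = name[: -len(SUFFIX)]
    if encoder not in VIS_DIMS:
        raise KeyError(f"unknown encoder: {encoder!r}")
    return encoder, VIS_DIMS[encoder]
```

For example, `encoder_from_checkpoint("checkpoints/siglip_best.pt")` yields the key `"siglip"` together with its vis_dim, which avoids hard-coding the dimension separately from the file name.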

Loading

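import torch
from lapvqa.vqa.model import VQAHead

# Visual feature dimension expected by the head for each encoder.
VIS_DIMS = {
    "clip-vit-l14": 1024, "siglip": 1152,
    "florence2": 1024, "coca": 768, "mae-vit-l16": 1024,
}
encoder = "siglip"
# Each checkpoint is a plain state dict, so weights_only=True is sufficient.
ckpt = torch.load(f"{encoder}_best.pt", map_location="cpu", weights_only=True)
head = VQAHead(vis_dim=VIS_DIMS[encoder])
head.load_state_dict(ckpt)
head.eval()  # switch the module to evaluation mode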
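
If `load_state_dict` reports a size mismatch, one way to investigate is to list which checkpoint entries actually carry the expected vis_dim. The sketch below makes no assumption about the parameter names inside VQAHead; the helper `tensors_with_dim` and the entry names in the demo are hypothetical, and the demo uses stand-in objects rather than a real checkpoint. It only needs each value to expose a `.shape`, which holds for tensors in a loaded state dict.

```python
from types import SimpleNamespace


def tensors_with_dim(state_dict, dim):
    """Return names of entries whose shape contains `dim`.

    Hypothetical helper: works on any mapping whose values expose
    a `.shape` sequence, such as tensors in a loaded state dict.
    """
    return [
        name
        for name, tensor in state_dict.items()
        if dim in tuple(tensor.shape)
    ]


# Stand-in entries for illustration only; real VQAHead parameter
# names and shapes are not documented here and may differ.
fake_ckpt = {
    "proj.weight": SimpleNamespace(shape=(512, 1152)),
    "proj.bias": SimpleNamespace(shape=(512,)),
}
matches = tensors_with_dim(fake_ckpt, 1152)
print(matches)
```

With a real checkpoint, passing the loaded `ckpt` and `VIS_DIMS[encoder]` should return at least one entry; an empty list suggests the file and the chosen encoder key do not match.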


Collection

This model (dmusingu/lapvqa-vqa-native) is part of the LAPVQA collection: chest X-ray models, with pre-trained encoders and task heads for VQA, DiffVQA, RRG, detection, and grounding on MIMIC-CXR.