LAPVQA — Differential VQA (Native / End-to-end)

Description

Differential VQA (DiffVQA) models trained end-to-end, with the encoder and head optimized jointly. Each .pt file is a plain state dict of DiffVQAHead, one file per encoder. MAE-ViT-L/16 is the primary encoder studied.

Results (test set, MAE-ViT-L/16)

BLEU-4	ROUGE-2	RadGraph-s	BERTScore F1
0.472	0.573	0.288	0.938

Checkpoints

File	Encoder	vis_dim
`clip-vit-l14_best.pt`	CLIP ViT-L/14	1024
`coca_best.pt`	CoCa	768
`florence2_best.pt`	Florence-2	1024
`mae-vit-l16_best.pt`	MAE ViT-L/16	1024
`siglip_best.pt`	SigLIP	1152
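
When checkpoints are loaded programmatically, the table above can be encoded as a lookup so that vis_dim always matches the chosen file. This is a minimal sketch: the names `VIS_DIMS` and `vis_dim_for` are illustrative and are not part of the lapvqa package.

```python
from pathlib import Path

# File name -> vis_dim, copied from the table above.
VIS_DIMS = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path):
    """Return the vis_dim listed for a checkpoint, looked up by file name.

    Hypothetical helper: only the file name is used, so any directory
    prefix on the path is ignored.
    """
    name = Path(checkpoint_path).name
    try:
        return VIS_DIMS[name]
    except KeyError:
        raise ValueError(f"unknown checkpoint: {name!r}") from None
```

The result can then be passed straight to the constructor, for example `DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt"))`.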

Loading

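import torch
from lapvqa.diffvqa.model import DiffVQAHead

# The checkpoint is a plain state dict, so weights_only=True is sufficient.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu", weights_only=True)
# vis_dim must match the checkpoint's row in the table above.
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to evaluation mode for inference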
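
Because each file is a plain state dict, its contents can be inspected before the head is constructed. The sketch below is not part of lapvqa, and `describe_state_dict` is a hypothetical helper. It assumes only that the values expose a `.shape` attribute, as torch tensors do, and makes no assumption about DiffVQAHead's parameter names.

```python
def describe_state_dict(state_dict):
    """Map each entry name to its shape as a plain tuple.

    Entries without a `.shape` attribute are reported with an empty tuple.
    """
    return {
        name: tuple(getattr(value, "shape", ()))
        for name, value in state_dict.items()
    }


def entries_with_dim(state_dict, dim):
    """List the entries that have at least one axis of size `dim`."""
    return [
        name
        for name, shape in describe_state_dict(state_dict).items()
        if dim in shape
    ]
```

As a rough check, if `entries_with_dim(ckpt, 1024)` returns an empty list, the checkpoint was probably saved with a different vis_dim. This is only a heuristic; `load_state_dict` itself reports any actual shape mismatch.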

