LAPVQA - Differential VQA (Native / End-to-end)

Part of the LAPVQA collection.

Description

DiffVQA models trained end-to-end (encoder + head jointly). Each .pt file is a plain state dict of DiffVQAHead. MAE-ViT-L/16 is the primary encoder studied.

Results (test set, MAE-ViT-L/16)

BLEU-4   ROUGE-2   RadGraph-s   BERTScore F1
0.472    0.573     0.288        0.938

File                   Encoder         vis_dim
clip-vit-l14_best.pt   CLIP ViT-L/14   1024
coca_best.pt           CoCa            768
florence2_best.pt      Florence-2      1024
mae-vit-l16_best.pt    MAE ViT-L/16    1024
siglip_best.pt         SigLIP          1152

Loading

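import torch
from lapvqa.diffvqa.model import DiffVQAHead

# Each checkpoint is a plain state dict, so weights_only=True is sufficient.
ckpt = torch.load("mae-vit-l16_best.pt", map_location="cpu", weights_only=True)
# vis_dim must match the value listed for the chosen checkpoint file.
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to evaluation mode before inference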
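
The loading snippet hard-codes vis_dim=1024, which is the value for the MAE ViT-L/16 checkpoint only. To load one of the other files, vis_dim has to be changed to the value listed for that file. A minimal sketch of a lookup that keeps the two in sync is shown below. CHECKPOINT_VIS_DIM and vis_dim_for are hypothetical names introduced here for illustration and are not part of the lapvqa package; the values are copied from the file listing above.

```python
import os

# Checkpoint file name -> vis_dim, copied from the file listing in this card.
# (Hypothetical helper table, not part of the lapvqa package.)
CHECKPOINT_VIS_DIM = {
    "clip-vit-l14_best.pt": 1024,
    "coca_best.pt": 768,
    "florence2_best.pt": 1024,
    "mae-vit-l16_best.pt": 1024,
    "siglip_best.pt": 1152,
}


def vis_dim_for(checkpoint_path: str) -> int:
    """Return the vis_dim for a checkpoint, looked up by its file name."""
    name = os.path.basename(checkpoint_path)
    try:
        return CHECKPOINT_VIS_DIM[name]
    except KeyError:
        # Fail loudly rather than guessing a dimension for an unlisted file.
        raise ValueError(f"unknown checkpoint: {name!r}") from None
```

With this in place, the head for another encoder could be built as DiffVQAHead(vis_dim=vis_dim_for("siglip_best.pt")) before calling load_state_dict, assuming the constructor takes vis_dim as in the snippet above.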