LAPVQA: Differential VQA (Captioning-Pretrained Encoder)

Part of the LAPVQA collection.

Description

A DiffVQA head trained on top of the frozen LAPVQA captioning-pretrained encoder (lapvqa-pretrain-captioning). The checkpoint is a plain DiffVQAHead state dict (vis_dim=1024), so the encoder weights must be obtained separately from lapvqa-pretrain-captioning.

Results (test set)

BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1
0.468  | 0.562   | 0.303      | 0.938

Loading

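import torch
from lapvqa.diffvqa.model import DiffVQAHead

# The checkpoint is a plain state dict, so it loads directly into the head.
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")
# vis_dim must match the value the checkpoint was trained with (1024).
head = DiffVQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch layers such as dropout to inference behavior
# Pair with encoder_final.pt from lapvqa-pretrain-captioning.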
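
Because the checkpoint is described as a plain state dict, it can be useful to confirm that before calling load_state_dict. The helper below is a minimal sketch, not part of the lapvqa package: the name summarize_state_dict is hypothetical, and it only assumes a flat mapping from parameter names to objects with a .shape attribute, which is what torch.load returns for a plain state dict.

```python
from math import prod
from typing import Any, Dict, Mapping


def summarize_state_dict(state: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize a flat name -> tensor mapping (hypothetical helper).

    Raises if the object is not a mapping, or if any value is itself a
    mapping, which would suggest a wrapped checkpoint rather than a
    plain state dict.
    """
    if not isinstance(state, Mapping):
        raise TypeError(f"expected a mapping, got {type(state).__name__}")

    nested = [name for name, value in state.items() if isinstance(value, Mapping)]
    if nested:
        raise ValueError(f"nested entries found, not a plain state dict: {nested}")

    # Record each tensor's shape and count the total number of elements.
    shapes = {name: tuple(tensor.shape) for name, tensor in state.items()}
    num_params = sum(prod(shape) for shape in shapes.values())
    return {"num_tensors": len(shapes), "num_params": num_params, "shapes": shapes}
```

With the checkpoint from this page, the call would look like summarize_state_dict(torch.load("pretrain-captioning_best.pt", map_location="cpu")). Printing the returned shapes is a quick way to see which layers the head contains before pairing it with the encoder.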