A DynamicVLA policy trained on the DOM dataset (hzxie/DOM) for dynamic-object manipulation.
⚠️ Mid-training checkpoint (epoch 9, loss ≈ 0.002). Self-contained and eval-ready (it includes the normalization buffers), but optimizer/scheduler state is not included, so optimizer momentum cannot be resumed from this file.
- SmolLM2-360M VLM backbone (16 layers) + FastViT vision encoder.
- freeze_* = False: all 430M parameters are trainable (the stock config freezes the backbone and trains only ~99M; this run trains everything).
- Camera inputs: opst_cam + wrist_cam.

Use the DynamicVLA code (https://github.com/hzxie/DynamicVLA):
from policies.dynamicvla.modeling_dynamicvla import DynamicVLAPolicy
policy = DynamicVLAPolicy.from_pretrained("mickeykang/dynamic-vla-DOM")
policy.eval().cuda()
from_pretrained restores the normalization buffers from model.safetensors, so no dataset is
needed to load or run inference. For the DOM benchmark, serve it with scripts/inference.py -p <dir>
against the Isaac Lab simulations/evaluate.py eval server.
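
To illustrate why no dataset is needed at load time, here is a minimal PyTorch sketch of how normalization statistics can travel inside a checkpoint as buffers. The `Normalize` class and its values are hypothetical stand-ins for illustration, not DynamicVLA's actual module, and a plain `state_dict` stands in for model.safetensors.

```python
import torch
from torch import nn


class Normalize(nn.Module):
    """Toy input normalizer (hypothetical; not the DynamicVLA implementation)."""

    def __init__(self, dim: int):
        super().__init__()
        # Buffers are saved in state_dict() but are not trainable parameters.
        self.register_buffer("mean", torch.zeros(dim))
        self.register_buffer("std", torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std


# Training side: statistics would come from the dataset (made-up values here).
trained = Normalize(3)
trained.mean.copy_(torch.tensor([1.0, 2.0, 3.0]))
trained.std.copy_(torch.tensor([2.0, 2.0, 2.0]))
checkpoint = trained.state_dict()

# Inference side: a fresh module recovers the statistics from the checkpoint alone.
restored = Normalize(3)
restored.load_state_dict(checkpoint)
normalized = restored(torch.tensor([3.0, 4.0, 5.0]))
```

Because the statistics are buffers rather than values recomputed from data, loading the weights is enough to reproduce the same input scaling used in training.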
A change to utils/datasets.py (substitute a valid sample on any decode error) is needed to train
on the full set, but it is not needed to load or evaluate this checkpoint.
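
The "substitute a valid sample on any decode error" idea can be sketched as a small dataset wrapper. This is an illustration under stated assumptions, not the actual change to utils/datasets.py: the class name `DecodeTolerantDataset`, the `decode` callable, and the choice to fall back to the next index (rather than, say, a random one) are all hypothetical.

```python
from typing import Any, Callable, Sequence


class DecodeTolerantDataset:
    """Wraps raw samples; if decoding one fails, returns a neighbouring valid sample."""

    def __init__(self, samples: Sequence[Any], decode: Callable[[Any], Any]):
        self.samples = samples
        self.decode = decode  # hypothetical decoder, e.g. a video or image loader

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Any:
        n = len(self.samples)
        # Try the requested index first, then walk forward (wrapping around).
        for offset in range(n):
            candidate = (index + offset) % n
            try:
                return self.decode(self.samples[candidate])
            except Exception:
                # Broad on purpose ("any decode error"); real code may narrow this.
                continue
        raise RuntimeError("no sample in the dataset could be decoded")


# Toy data: int() plays the role of the decoder and "corrupt" fails to decode.
dataset = DecodeTolerantDataset(["1", "2", "corrupt", "4"], decode=int)
decoded = [dataset[i] for i in range(len(dataset))]
print(decoded)  # [1, 2, 4, 4]
```

Substitution keeps batch shapes and epoch length stable at the cost of slightly over-sampling the neighbours of corrupt entries, which is usually acceptable when only a few samples fail to decode.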