rskill-3d-diffuser-actor-rlbench

3D Diffuser Actor: a diffusion policy over end-effector keyposes for RLBench, running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062).

What this skill does

Predicts the next end-effector keypose (position + orientation + gripper) from multi-view RGB-D, conditioned on a language instruction. Used to benchmark 3D/keyframe manipulation on the RLBench PerAct 18-task suite. Ships the three live-verified starter tasks: open_drawer, meat_off_grill, close_jar.

| Field | Value |
| --- | --- |
| Actions | open, close, pick, place (generalist keyframe policy) |
| Objects | drawer, grill/meat, jar (PerAct task objects) |
| Scenes | tabletop (RLBench / CoppeliaSim) |
| Embodiment | franka_panda |

How it works

3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud scene token field, attends over it with a relative-position transformer, and runs a DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose trajectory. Each predicted keypose is executed in RLBench by its sampling-based motion planner (EndEffectorPoseViaPlanning), then the policy re-observes and predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an out-of-process py3.10 sidecar (ZMQ + msgpack); the openral adapter (openral_sim.policies.rlbench_3dda) forks it transparently.
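
The denoising step described above can be illustrated with a minimal DDPM ancestral sampler over an 8-dimensional keypose vector. This is a sketch under stated assumptions, not the upstream implementation: the linear beta schedule, the `eps_model(x, t, alpha_bar_t)` signature, and the oracle denoiser (which already knows the target keypose) are all hypothetical stand-ins. The real model predicts noise from 3D scene tokens and the language instruction, and its noise schedule and rotation parameterization may differ.

```python
import math
import random


def linear_beta_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the upstream schedule may differ."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def ddpm_sample(eps_model, dim, num_steps=100, seed=0):
    """Run DDPM ancestral sampling from pure noise down to t = 0."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)

    # alpha_bar_t is the cumulative product of (1 - beta_s) for s <= t.
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = [(xi - coef * ei) * inv_sqrt_alpha for xi, ei in zip(x, eps)]
        if t > 0:
            # Noise is re-injected at every step except the last one.
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


# Hypothetical target keypose: [x y z qx qy qz qw gripper_open].
TARGET = [0.3, -0.1, 0.9, 0.0, 0.0, 0.0, 1.0, 1.0]


def oracle_eps(x_t, t, alpha_bar_t):
    """Toy denoiser that returns the exact noise separating x_t from TARGET."""
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    return [(xi - a * x0) / s for xi, x0 in zip(x_t, TARGET)]


keypose = ddpm_sample(oracle_eps, dim=8)
```

With an exact noise predictor the final step recovers the target keypose up to floating-point error, which is a convenient sanity check for the update rule. In the real pipeline each sampled keypose would then be handed to the motion planner before the policy re-observes the scene.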

Observation → action contract

| dir | key | shape | notes |
| --- | --- | --- | --- |
| in | observation.images.{left_shoulder,right_shoulder,wrist,front} | (H, W, 3) uint8 | RLBench PerAct cameras, 256×256 |
| in | observation.point_clouds.{…} | (H, W, 3) float32 | per-camera world-frame point clouds |
| in | observation.gripper_pose | (7,) float32 | [x y z qx qy qz qw] |
| out | keyframe action | (8,) float32 | [x y z qx qy qz qw gripper_open] (world frame) |
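
A consumer of the 8-dimensional keyframe action may want to sanity-check it before passing it to a planner. The helper below is a minimal sketch under assumptions: the function name, the quaternion re-normalization, and the 0.5 threshold on `gripper_open` are hypothetical and not part of the published contract. Only the layout `[x y z qx qy qz qw gripper_open]` comes from the table above.

```python
import math


def normalize_keyframe_action(action):
    """Validate an 8-dim keyframe action and return a cleaned copy.

    Layout (from the contract): [x, y, z, qx, qy, qz, qw, gripper_open].
    """
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, gripper_open = (float(v) for v in action)

    # Assumption: re-normalize the quaternion so downstream code sees a unit rotation.
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm < 1e-8:
        raise ValueError("degenerate quaternion (near-zero norm)")
    quat = [q / norm for q in (qx, qy, qz, qw)]

    # Assumption: treat gripper_open as a binary state with a 0.5 threshold.
    gripper = 1.0 if gripper_open >= 0.5 else 0.0
    return [x, y, z, *quat, gripper]


cleaned = normalize_keyframe_action([0.3, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.9])
```

Here the unnormalized quaternion `(0, 0, 0, 2)` is rescaled to the identity rotation and the gripper value is snapped to open.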

Upstream model / training

Weights are the authors' published RLBench PerAct multi-task checkpoint (diffuser_actor_peract.pth); loaded verbatim, not retrained. Trained by the authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose supervision).

| Field | Value |
| --- | --- |
| Source repo | nickgkan/3d_diffuser_actor |
| Weights | katefgroup/3d_diffuser_actor, file diffuser_actor_peract.pth (168 MB) |
| Paper | arxiv:2402.10885, "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations" |
| License | mit (code + checkpoints), commercially permissive |
| Parameters | ~55 M |
| Training data | RLBench PerAct 18-task demonstrations |

Supported robots

| Robot | Scene | Status | Notes |
| --- | --- | --- | --- |
| franka_panda | RLBench (CoppeliaSim) | ✓ validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) |

Sensors required

| key | modality | resolution | dtype |
| --- | --- | --- | --- |
| observation.images.left_shoulder | RGB | 256 × 256 | uint8 |
| observation.images.right_shoulder | RGB | 256 × 256 | uint8 |
| observation.images.wrist | RGB | 256 × 256 | uint8 |
| observation.images.front | RGB | 256 × 256 | uint8 |

Manifest summary

| Field | Value |
| --- | --- |
| name | OpenRAL/rskill-3d-diffuser-actor-rlbench |
| version | 0.1.0 |
| license | mit |
| role | s1 |
| model_family | diffuser_actor |
| embodiment_tags | franka_panda |
| runtime | pytorch |
| weights_uri | hf://katefgroup/3d_diffuser_actor |
| action_contract.dim | 8 |
| latency_budget.per_chunk_ms | 3000.0 |

Reproduction

# One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
# in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md).
openral benchmark scene \
  --config scenes/benchmark/rlbench_open_drawer.yaml \
  --rskill rskills/3d-diffuser-actor-rlbench

Inference VRAM peaks at ~0.43 GB, so the policy runs comfortably on an 8 GB GPU. CoppeliaSim is proprietary (free EDU license) and is never vendored; it is an externally provisioned dependency (CLAUDE.md §1.9 / ADR-0062).

Evaluation

eval/rlbench.json is the full official protocol result (reproduced_locally: true), produced by the canonical openral benchmark run (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20), with 25 episodes per task, seeds 0–24, and max 25 macro-keyposes:

| Task | Success rate |
| --- | --- |
| open_drawer | 22/25 = 0.88 |
| meat_off_grill | 24/25 = 0.96 |
| close_jar | 19/25 = 0.76 |
| Average | 0.867 |
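
The per-task rates and the average in the table can be recomputed from the raw episode counts. The snippet below is a small worked check, using exact fractions to avoid rounding drift; the `results` dictionary is simply the table transcribed by hand, not a parse of eval/rlbench.json (whose schema is not shown here).

```python
from fractions import Fraction

# (successes, episodes) per task, transcribed from the table above.
results = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}

rates = {task: Fraction(ok, total) for task, (ok, total) in results.items()}
average = sum(rates.values()) / len(rates)  # unweighted mean over tasks

for task, rate in rates.items():
    print(f"{task}: {float(rate):.2f}")
print(f"average: {float(average):.3f}")
```

Because every task uses the same number of episodes, the unweighted task mean equals the pooled rate of 65 successes in 75 episodes.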

(~946 ms mean step latency; in line with the 3D Diffuser Actor paper's ~0.81 RLBench PerAct average.) Reproduce with:

openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench

Note on variance. RLBench's sampling-based EndEffectorPoseViaPlanning mover is non-deterministic, so per-task rates vary run-to-run; 3 of the 75 episodes hit a planner path-failure and are counted as failed episodes (the sidecar handles them gracefully rather than aborting the run; see ADR-0062). Per-task paper baselines (Ke et al., 2402.10885, Table 1) are intentionally not transcribed into the artifact to avoid mis-citation.
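
To get a feel for how much a 25-episode success rate can move between runs, one option is a Wilson score interval on each task's count. This is an illustrative sketch, not part of the benchmark protocol: the choice of the Wilson interval and the 95% level (z = 1.96) are assumptions, and it treats episodes as independent Bernoulli trials.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return center - half, center + half


# Counts from the evaluation table (25 episodes per task).
for task, ok in [("open_drawer", 22), ("meat_off_grill", 24), ("close_jar", 19)]:
    lo, hi = wilson_interval(ok, 25)
    print(f"{task}: {ok}/25, interval [{lo:.2f}, {hi:.2f}]")
```

For open_drawer (22/25) this gives an interval of roughly 0.70 to 0.96, which is why a few percentage points of difference between runs should not be over-interpreted.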

License

OpenRAL wrapper files in this repository follow the project Apache-2.0 license. The wrapped upstream 3D Diffuser Actor code and released diffuser_actor_peract.pth checkpoint are MIT-licensed; the manifest therefore uses license: mit for the consumer-visible weight/runtime posture.

See also

  • scenes/benchmark/rlbench_open_drawer.yaml
  • scenes/benchmark/rlbench_meat_off_grill.yaml
  • scenes/benchmark/rlbench_close_jar.yaml
  • benchmarks/rlbench.yaml
  • docs/adr/0062-rlbench-benchmark-backend.md