rskill-3d-diffuser-actor-rlbench

3D Diffuser Actor — a diffusion policy over end-effector keyposes for RLBench, running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062).

What this skill does

Predicts the next end-effector keypose (position + orientation + gripper) from multi-view RGB-D, conditioned on a language instruction. Used to benchmark 3D/keyframe manipulation on the RLBench PerAct 18-task suite. Ships the three live-verified starter tasks: open_drawer, meat_off_grill, close_jar.

Field	Value
Actions	open, close, pick, place (generalist keyframe policy)
Objects	drawer, grill/meat, jar (PerAct task objects)
Scenes	tabletop (RLBench / CoppeliaSim)
Embodiment	franka_panda

How it works

3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud scene token field, attends over it with a relative-position transformer, and runs a DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose trajectory. Each predicted keypose is executed in RLBench by its sampling-based motion planner (EndEffectorPoseViaPlanning), then the policy re-observes and predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an out-of-process py3.10 sidecar (ZMQ + msgpack); the openral adapter (openral_sim.policies.rlbench_3dda) forks it transparently.
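
The denoising step described above can be illustrated with a toy DDPM reverse process. This is a minimal sketch, not the released model. The linear beta schedule, the 3-D position target, and `oracle_eps` are all assumptions made for illustration; `oracle_eps` stands in for the learned noise predictor by computing the exact noise for a known target. Only the 100-step count comes from the description.

```python
import math
import random


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed; the released model may use a different one)."""
    betas = [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a  # cumulative product of alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_eps(x_t, alpha_bar, target):
    """Hypothetical stand-in for the learned noise predictor.

    Returns the exact noise that maps `target` to `x_t` at this noise level.
    """
    scale = math.sqrt(alpha_bar)
    denom = math.sqrt(1.0 - alpha_bar)
    return [(x - scale * x0) / denom for x, x0 in zip(x_t, target)]


def ddpm_sample(target, num_steps=100, seed=0):
    """Run the DDPM ancestral sampler from pure noise down to step 0."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        eps = oracle_eps(x, alpha_bars[t], target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # fixed-variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


toy_position = [0.3, -0.1, 0.9]  # illustrative end-effector position in metres
print(ddpm_sample(toy_position))
```

With a perfect noise predictor the final step recovers the target exactly (up to floating-point error), which makes the sampler easy to sanity-check before a learned network is substituted.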

Observation → action contract

dir	key	shape	notes
in	`observation.images.{left_shoulder,right_shoulder,wrist,front}`	`(H, W, 3) uint8`	RLBench PerAct cameras, 256×256
in	`observation.point_clouds.{…}`	`(H, W, 3) float32`	per-camera world-frame point clouds
in	`observation.gripper_pose`	`(7,) float32`	`[x y z qx qy qz qw]`
out	keyframe action	`(8,) float32`	`[x y z qx qy qz qw gripper_open]` (world frame)
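
The 8-dimensional action layout in the table above can be unpacked as follows. This is a minimal sketch: the `Keypose` dataclass, the `parse_keyframe_action` helper, and the 0.5 threshold on `gripper_open` are hypothetical and not part of the openral API. Only the field order `[x y z qx qy qz qw gripper_open]` comes from the contract.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Keypose:
    """Hypothetical container for one predicted keyframe action."""
    position: Tuple[float, float, float]
    quaternion_xyzw: Tuple[float, float, float, float]
    gripper_open: bool


def parse_keyframe_action(action: Sequence[float], open_threshold: float = 0.5) -> Keypose:
    """Split an (8,) action into position, unit quaternion, and gripper state."""
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, grip = (float(v) for v in action)
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm < 1e-8:
        raise ValueError("quaternion has near-zero norm")
    # Renormalise so downstream consumers receive a valid rotation.
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)
    # Thresholding at 0.5 is an assumption about how gripper_open is encoded.
    return Keypose((x, y, z), quat, grip > open_threshold)


pose = parse_keyframe_action([0.1, 0.2, 0.3, 0.0, 0.0, 0.0, 2.0, 1.0])
print(pose)
```

Renormalising the quaternion is a defensive choice: a sampled orientation is not guaranteed to have unit norm.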

Upstream model / training

The weights are the authors' published RLBench PerAct multi-task checkpoint (diffuser_actor_peract.pth), loaded verbatim and not retrained. The authors trained it on the PerAct 18-task RLBench demonstrations (multi-view RGB-D with keypose supervision).

Field	Value
Source repo	`nickgkan/3d_diffuser_actor`
Weights	`katefgroup/3d_diffuser_actor` — `diffuser_actor_peract.pth` (168 MB)
Paper	arxiv:2402.10885 — 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
License	MIT (code + checkpoints), commercially permissive
Parameters	~55 M
Training data	RLBench PerAct 18-task demonstrations

Supported robots

Robot	Scene	Status	Notes
franka_panda	RLBench (CoppeliaSim)	✓ validated	open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19)

Sensors required

key	modality	resolution	dtype
`observation.images.left_shoulder`	RGB	256 × 256	`uint8`
`observation.images.right_shoulder`	RGB	256 × 256	`uint8`
`observation.images.wrist`	RGB	256 × 256	`uint8`
`observation.images.front`	RGB	256 × 256	`uint8`
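
A pre-flight check for the four RGB streams listed above might look like the sketch below. It assumes the observation is a flat dict keyed by the dotted names in the table and that each frame exposes `.shape` and `.dtype` (as NumPy arrays do). The `check_rgb_observation` helper is hypothetical, and the stub frames exist only to keep the example dependency-free.

```python
from types import SimpleNamespace

CAMERAS = ("left_shoulder", "right_shoulder", "wrist", "front")
EXPECTED_SHAPE = (256, 256, 3)
EXPECTED_DTYPE = "uint8"


def check_rgb_observation(obs):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for cam in CAMERAS:
        key = f"observation.images.{cam}"
        frame = obs.get(key)
        if frame is None:
            problems.append(f"{key}: missing")
            continue
        shape = tuple(getattr(frame, "shape", ()))
        if shape != EXPECTED_SHAPE:
            problems.append(f"{key}: shape {shape}, expected {EXPECTED_SHAPE}")
        dtype = str(getattr(frame, "dtype", ""))
        if dtype != EXPECTED_DTYPE:
            problems.append(f"{key}: dtype {dtype!r}, expected {EXPECTED_DTYPE!r}")
    return problems


# Stub frames standing in for real camera arrays.
stub = SimpleNamespace(shape=(256, 256, 3), dtype="uint8")
observation = {f"observation.images.{cam}": stub for cam in CAMERAS}
print(check_rgb_observation(observation))
```

Collecting every problem instead of raising on the first one makes a misconfigured camera rig easier to diagnose in a single pass.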

Manifest summary

Field	Value
`name`	`OpenRAL/rskill-3d-diffuser-actor-rlbench`
`version`	`0.1.0`
`license`	`mit`
`role`	`s1`
`model_family`	`diffuser_actor`
`embodiment_tags`	`franka_panda`
`runtime`	`pytorch`
`weights_uri`	`hf://katefgroup/3d_diffuser_actor`
`action_contract.dim`	`8`
`latency_budget.per_chunk_ms`	`3000.0`

Reproduction

# One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
# in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md).
openral benchmark scene \
  --config scenes/benchmark/rlbench_open_drawer.yaml \
  --rskill rskills/3d-diffuser-actor-rlbench

Inference VRAM peaks ~0.43 GB; runs comfortably on an 8 GB GPU. CoppeliaSim is proprietary (free EDU license) and is never vendored — it is an externally-provisioned dependency (CLAUDE.md §1.9 / ADR-0062).

Evaluation

eval/rlbench.json holds the result of the full official protocol (reproduced_locally: true), produced by the canonical openral benchmark run (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20): 25 episodes per task, seeds 0–24, at most 25 macro-keyposes per episode.

Task	Success rate
`open_drawer`	22/25 = 0.88
`meat_off_grill`	24/25 = 0.96
`close_jar`	19/25 = 0.76
Average	0.867

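Mean step latency was ~946 ms. The 0.867 average covers only these three tasks, so it is broadly in line with, but not directly comparable to, the 3D Diffuser Actor paper's ~0.81 average over the RLBench PerAct suite. Reproduce with: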

openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench
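
The per-task rates and the average reported in the table above can be recomputed from the raw counts. This is a small arithmetic sketch; the dictionary layout and function names are illustrative and do not reflect the schema of eval/rlbench.json.

```python
from fractions import Fraction

# (successes, episodes) per task, copied from the evaluation table.
RESULTS = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}


def success_rates(results):
    """Exact per-task success rate as a fraction."""
    return {task: Fraction(wins, total) for task, (wins, total) in results.items()}


def macro_average(results):
    """Unweighted mean of the per-task rates."""
    rates = success_rates(results)
    return sum(rates.values()) / len(rates)


def pooled_rate(results):
    """Total successes over total episodes."""
    wins = sum(w for w, _ in results.values())
    total = sum(t for _, t in results.values())
    return Fraction(wins, total)


for task, rate in success_rates(RESULTS).items():
    print(f"{task}: {float(rate):.2f}")
print(f"average: {float(macro_average(RESULTS)):.3f}")  # average: 0.867
```

Because every task uses 25 episodes, the macro average and the pooled rate coincide at 65/75.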

Note on variance. RLBench's sampling-based EndEffectorPoseViaPlanning mover is non-deterministic, so per-task rates vary run-to-run; 3 of the 75 episodes hit a planner path-failure and are counted as failed episodes (the sidecar handles them gracefully rather than aborting the run — ADR-0062). Per-task paper baselines (Ke et al., 2402.10885, Table 1) are intentionally not transcribed into the artifact to avoid mis-citation.
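
The accounting described above, where a planner path failure ends the episode as a failure without aborting the run, can be sketched as follows. `PlannerPathError`, `run_episode`, `success_count`, and the toy stubs are hypothetical names used for illustration; they are not the sidecar's actual implementation or RLBench's exception types.

```python
class PlannerPathError(RuntimeError):
    """Hypothetical stand-in for the error raised when no path is found."""


def run_episode(predict, execute, max_keyposes=25):
    """Predict and execute keyposes until success, planner failure, or budget exhaustion."""
    for _ in range(max_keyposes):
        action = predict()
        try:
            task_done = execute(action)
        except PlannerPathError:
            return False  # counted as a failed episode; the run continues
        if task_done:
            return True
    return False  # keypose budget exhausted


def success_count(make_episode, seeds):
    """Run one episode per seed and return (successes, episodes)."""
    seeds = list(seeds)
    wins = 0
    for seed in seeds:
        predict, execute = make_episode(seed)
        wins += run_episode(predict, execute)
    return wins, len(seeds)


def toy_episode(seed):
    """Toy stub: succeeds on the third keypose, except seed 2 where planning fails."""
    calls = {"n": 0}

    def predict():
        return [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

    def execute(action):
        calls["n"] += 1
        if seed == 2:
            raise PlannerPathError("no path found")
        return calls["n"] >= 3

    return predict, execute


print(success_count(toy_episode, range(5)))  # (4, 5)
```

Catching the failure inside the episode loop keeps a single unlucky plan from discarding the remaining seeds.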

License

OpenRAL wrapper files in this repository follow the project Apache-2.0 license. The wrapped upstream 3D Diffuser Actor code and released diffuser_actor_peract.pth checkpoint are MIT-licensed; the manifest therefore uses license: mit for the consumer-visible weight/runtime posture.

