rskill-3d-diffuser-actor-rlbench
3D Diffuser Actor: a diffusion policy over end-effector keyposes for RLBench, running on the CoppeliaSim/PyRep RLBench benchmark backend (ADR-0062).
What this skill does
Predicts the next end-effector keypose (position + orientation + gripper) from
multi-view RGB-D, conditioned on a language instruction. Used to benchmark
3D/keyframe manipulation on the RLBench PerAct 18-task suite. Ships the three
live-verified starter tasks: open_drawer, meat_off_grill, close_jar.
| Field | Value |
|---|---|
| Actions | open, close, pick, place (generalist keyframe policy) |
| Objects | drawer, grill/meat, jar (PerAct task objects) |
| Scenes | tabletop (RLBench / CoppeliaSim) |
| Embodiment | franka_panda |
How it works
3D Diffuser Actor lifts the four RLBench camera RGB-D streams into a 3D point-cloud
scene token field, attends over it with a relative-position transformer, and runs a
DDPM diffusion head (100 denoising steps) to denoise an end-effector keypose
trajectory. Each predicted keypose is executed in RLBench by its sampling-based
motion planner (EndEffectorPoseViaPlanning), then the policy re-observes and
predicts the next keypose. The policy and the CoppeliaSim/PyRep scene run in an
out-of-process py3.10 sidecar (ZMQ + msgpack); the openral adapter
(openral_sim.policies.rlbench_3dda) forks it transparently.
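The predict, execute, re-observe cycle described above can be sketched as a small closed loop. This is an illustrative sketch only: `run_keypose_episode`, `StepResult`, `ToyScene`, and `toy_predict` are hypothetical names, not part of openral, RLBench, or the upstream repository. The cap of 25 keyposes and the treatment of planner failures as failed episodes mirror the evaluation notes in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# [x, y, z, qx, qy, qz, qw, gripper_open], matching the action contract.
Keypose = List[float]


@dataclass
class StepResult:
    """Outcome of executing one keypose (hypothetical container)."""
    observation: Dict
    success: bool
    planner_failed: bool = False


def run_keypose_episode(
    predict: Callable[[Dict], Keypose],
    step: Callable[[Keypose], StepResult],
    first_observation: Dict,
    max_keyposes: int = 25,
) -> Tuple[bool, int]:
    """Run predict -> execute -> re-observe until success, failure, or the cap.

    Returns (solved, keyposes_used).
    """
    obs = first_observation
    for used in range(1, max_keyposes + 1):
        keypose = predict(obs)      # policy proposes the next keypose
        result = step(keypose)      # motion planner moves the arm there
        if result.planner_failed:   # no path found: episode counts as failed
            return False, used
        if result.success:
            return True, used
        obs = result.observation    # re-observe before the next prediction
    return False, max_keyposes


# Toy stand-ins so the loop can be exercised without a simulator.
class ToyScene:
    def __init__(self, keyposes_to_success: int) -> None:
        self.needed = keyposes_to_success
        self.t = 0

    def step(self, keypose: Keypose) -> StepResult:
        self.t += 1
        return StepResult({"t": self.t}, success=self.t >= self.needed)


def toy_predict(obs: Dict) -> Keypose:
    # Identity orientation, gripper open; only x changes with time.
    return [0.1 * obs["t"], 0.0, 0.5, 0.0, 0.0, 0.0, 1.0, 1.0]


scene = ToyScene(keyposes_to_success=3)
solved, used = run_keypose_episode(toy_predict, scene.step, {"t": 0})
```

In the real skill, `predict` would wrap the diffusion policy and `step` would wrap the sidecar call that executes the keypose in CoppeliaSim.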
Observation → action contract
| dir | key | shape | notes |
|---|---|---|---|
| in | observation.images.{left_shoulder,right_shoulder,wrist,front} | (H, W, 3) uint8 | RLBench PerAct cameras, 256×256 |
| in | observation.point_clouds.{…} | (H, W, 3) float32 | per-camera world-frame point clouds |
| in | observation.gripper_pose | (7,) float32 | [x y z qx qy qz qw] |
| out | keyframe action | (8,) float32 | [x y z qx qy qz qw gripper_open] (world frame) |
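As a minimal sketch of how a consumer might unpack the 8-dimensional keyframe action, the helper below splits it into position, quaternion, and gripper state. The names `KeyframeAction` and `parse_keyframe_action` are hypothetical. Two behaviors are assumptions rather than part of the contract: the quaternion is re-normalized defensively, and the gripper value is thresholded at 0.5.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class KeyframeAction(NamedTuple):
    position: Tuple[float, float, float]                 # world frame
    quaternion_xyzw: Tuple[float, float, float, float]   # unit quaternion
    gripper_open: bool


def parse_keyframe_action(
    action: Sequence[float], open_threshold: float = 0.5
) -> KeyframeAction:
    """Split [x y z qx qy qz qw gripper_open] into typed fields.

    open_threshold is an assumed cut-off, not a documented value.
    """
    if len(action) != 8:
        raise ValueError(f"expected 8 values, got {len(action)}")
    x, y, z, qx, qy, qz, qw, gripper = (float(v) for v in action)

    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm < 1e-8:
        raise ValueError("quaternion has (near) zero norm")
    quat = (qx / norm, qy / norm, qz / norm, qw / norm)

    return KeyframeAction((x, y, z), quat, gripper > open_threshold)


parsed = parse_keyframe_action([0.3, -0.1, 0.9, 0.0, 0.0, 0.0, 2.0, 1.0])
```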
Upstream model / training
Weights are the authors' published RLBench PerAct multi-task checkpoint
(diffuser_actor_peract.pth); loaded verbatim, not retrained. Trained by the
authors on the PerAct 18-task RLBench demonstrations (multi-view RGB-D + keypose
supervision).
| Field | Value |
|---|---|
| Source repo | nickgkan/3d_diffuser_actor |
| Weights | katefgroup/3d_diffuser_actor: diffuser_actor_peract.pth (168 MB) |
| Paper | arXiv:2402.10885, "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations" |
| License | MIT (code + checkpoints); commercially permissive |
| Parameters | ~55 M |
| Training data | RLBench PerAct 18-task demonstrations |
Supported robots
| Robot | Scene | Status | Notes |
|---|---|---|---|
| franka_panda | RLBench (CoppeliaSim) | validated | open_drawer 4/4, meat_off_grill 3/3, close_jar solved (8 GB Ada host, 2026-06-19) |
Sensors required
| key | modality | resolution | dtype |
|---|---|---|---|
| observation.images.left_shoulder | RGB | 256 × 256 | uint8 |
| observation.images.right_shoulder | RGB | 256 × 256 | uint8 |
| observation.images.wrist | RGB | 256 × 256 | uint8 |
| observation.images.front | RGB | 256 × 256 | uint8 |
Manifest summary
| Field | Value |
|---|---|
| name | OpenRAL/rskill-3d-diffuser-actor-rlbench |
| version | 0.1.0 |
| license | mit |
| role | s1 |
| model_family | diffuser_actor |
| embodiment_tags | franka_panda |
| runtime | pytorch |
| weights_uri | hf://katefgroup/3d_diffuser_actor |
| action_contract.dim | 8 |
| latency_budget.per_chunk_ms | 3000.0 |
Reproduction

    # One-time: provision CoppeliaSim 4.1.0 + PyRep + RLBench@peract + the checkpoint
    # in the py3.10 sidecar venv (see docs/adr/0062-rlbench-benchmark-backend.md).
    openral benchmark scene \
      --config scenes/benchmark/rlbench_open_drawer.yaml \
      --rskill rskills/3d-diffuser-actor-rlbench

Inference VRAM peaks at ~0.43 GB, so the skill runs comfortably on an 8 GB GPU. CoppeliaSim is proprietary (free EDU license) and is never vendored; it is an externally provisioned dependency (CLAUDE.md §1.9 / ADR-0062).
Evaluation
eval/rlbench.json is the full official protocol result (reproduced_locally: true), produced by the canonical
openral benchmark run (ADR-0009 PR D) on an 8 GB Ada host (2026-06-20), with
25 episodes per task, seeds 0–24, max 25 macro-keyposes:
| Task | Success rate |
|---|---|
| open_drawer | 22/25 = 0.88 |
| meat_off_grill | 24/25 = 0.96 |
| close_jar | 19/25 = 0.76 |
| Average | 0.867 |
(~946 ms mean step latency; in line with the 3D Diffuser Actor paper's ~0.81 RLBench PerAct average.) Reproduce with:

    openral benchmark run --suite rlbench --rskill rskills/3d-diffuser-actor-rlbench

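The arithmetic behind the table can be checked with a few lines. This sketch hard-codes the counts reported above; it does not read eval/rlbench.json, whose schema is not described here. It computes both the mean of per-task rates and the pooled rate over all episodes. Which of the two the artifact reports is not stated, but they coincide here because every task has the same number of episodes.

```python
# (successes, episodes) per task, copied from the results table.
results = {
    "open_drawer": (22, 25),
    "meat_off_grill": (24, 25),
    "close_jar": (19, 25),
}

# Per-task success rate.
per_task = {task: wins / total for task, (wins, total) in results.items()}

# Mean of the per-task rates (each task weighted equally).
macro_average = sum(per_task.values()) / len(per_task)

# Pooled rate over all episodes (each episode weighted equally).
total_wins = sum(wins for wins, _ in results.values())
total_episodes = sum(total for _, total in results.values())
pooled_average = total_wins / total_episodes

for task, rate in per_task.items():
    print(f"{task}: {rate:.2f}")
print(f"average: {macro_average:.3f} (pooled: {pooled_average:.3f})")
```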
Note on variance. RLBench's sampling-based EndEffectorPoseViaPlanning mover is non-deterministic, so per-task rates vary from run to run; 3 of the 75 episodes hit a planner path failure and are counted as failed episodes (the sidecar handles them gracefully rather than aborting the run; see ADR-0062). Per-task paper baselines (Ke et al., arXiv:2402.10885, Table 1) are intentionally not transcribed into the artifact to avoid mis-citation.
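To get a rough sense of how much a 25-episode success rate can move between runs, one option is a Wilson score interval on each task's count. This is an illustrative add-on, not part of the benchmark protocol or the artifact; `wilson_interval` is a hypothetical helper, and the 95% level (z = 1.96) is an assumed choice.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximately 95% interval (assumed here).
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Counts copied from the results table above.
for task, wins in {"open_drawer": 22, "meat_off_grill": 24, "close_jar": 19}.items():
    low, high = wilson_interval(wins, 25)
    print(f"{task}: {wins}/25, interval [{low:.2f}, {high:.2f}]")
```

With only 25 episodes per task the intervals are wide, which is consistent with the run-to-run variation noted above.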
License
OpenRAL wrapper files in this repository follow the project Apache-2.0 license.
The wrapped upstream 3D Diffuser Actor code and released
diffuser_actor_peract.pth checkpoint are MIT-licensed; the manifest therefore
uses license: mit for the consumer-visible weight/runtime posture.
See also
- scenes/benchmark/rlbench_open_drawer.yaml
- scenes/benchmark/rlbench_meat_off_grill.yaml
- scenes/benchmark/rlbench_close_jar.yaml
- benchmarks/rlbench.yaml
- docs/adr/0062-rlbench-benchmark-backend.md