AURORA: Active Uncertainty-Driven Re-Orientation for In-Hand Reconstruction

Teaser figure

AURORA actively reorients an object using uncertainty-driven next-best-view planning to expose unobserved regions, enabling incremental observation and 3D reconstruction from a fixed RGB-D camera.

Abstract

Recovering complete 3D geometry of robot-held objects is fundamentally limited by observability. The manipulator occludes the object, and only a small set of viewpoints is accessible during grasping, leaving large surface regions unobserved. Although in-hand manipulation can reveal these regions, most existing methods rely on fixed manipulation sequences and lack closed-loop reasoning about missing views. To bridge this gap, we present AURORA, an active in-hand 3D reconstruction framework that integrates perception, uncertainty-aware planning, and in-hand manipulation. Its core component, Ray-GPIS, estimates reconstruction uncertainty over viewing directions and selects next-best-view object reorientations that maximize information gain. These reorientations are executed using an axis-conditioned in-hand rotation policy. The observed manipulation sequence is processed through object segmentation, online 6D pose tracking, and incremental RGB-D fusion to produce efficient and high-fidelity reconstructions. Extensive real-world experiments on objects with diverse geometries show that AURORA consistently outperforms open-loop baselines, achieving higher completeness, faster convergence, and robust performance across varied shapes.

Pipeline

Overview of the proposed technical pipeline. The system integrates four modules: (a) in-hand object reorientation with the Leap Hand; (b) 6D pose tracking via BundleTrack; (c-d) reconstruction; (e) uncertainty-driven next-best-view planning.
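To illustrate the planning step in (e), the sketch below shows a greedy next-best-view selection over a discretized sphere of candidate viewing directions. This is a minimal stand-in under our own assumptions, not the authors' implementation: the real system scores directions with Ray-GPIS uncertainty, whereas here the per-direction uncertainty is just an input array.

```python
import numpy as np

def fibonacci_sphere(n):
    """Roughly uniform candidate viewing directions on the unit sphere."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle spacing
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def next_best_view(uncertainty, directions):
    """Greedy NBV: return the candidate direction with maximum uncertainty."""
    return directions[np.argmax(uncertainty)]
```

The selected direction would then be mapped to a target reorientation for the axis-conditioned in-hand rotation policy in (a).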

Pipeline overview

Experiment I — Reconstruction Quality

We evaluate reconstruction performance on six graspable real-world objects under a fixed manipulation budget of 30 s. For each object, we show qualitative results (RGB, online point cloud, offline refined mesh), followed by quantitative F-scores.

Budget: 30 s
Online: PCD–PCD
Offline: Mesh–Mesh
Metric: F@τ (harmonic mean of precision & recall)
Real-Robot Demo. In-hand rotation on Cube (30 s). We show the real-world in-hand rotation used in Experiment I for data capture.
Qualitative results: Object · Point Cloud · Mesh (normal colormap)
Table. Quantitative results after a fixed 30 s budget. We report F-scores F@τ (higher is better), where F@τ is the harmonic mean of precision and recall within tolerance τ.
Obj.             PCD–PCD (Online)            Mesh–Mesh (Offline)
                 F@2 ↑   F@5 ↑   F@10 ↑      F@2 ↑   F@5 ↑   F@10 ↑
Cube             0.2895  0.9337  0.9977      0.6481  0.9557  0.9957
Corner Block     0.2353  0.7354  0.9303      0.5298  0.8559  0.9488
L-shaped Block   0.2306  0.8366  0.9886      0.4674  0.8367  0.9450
Pepper           0.1770  0.8321  0.9773      0.5480  0.8890  0.9913
Cylinder         0.4428  0.8400  0.9425      0.4035  0.8320  0.9372
Cross Block      0.1454  0.7718  0.9664      0.4601  0.8050  0.9850
Mean             0.2534  0.8249  0.9671      0.5095  0.8624  0.9672
Std.             0.1054  0.0679  0.0263      0.0855  0.0535  0.0262

Summary. Under a fixed 30 s manipulation budget, our framework delivers strong online reconstructions with an average F@10 = 0.9671 ± 0.0263 and F@5 = 0.8249 ± 0.0679. The offline refinement stage further improves geometric consistency and reduces residual artifacts, achieving F@10 = 0.9672 ± 0.0262 and F@5 = 0.8624 ± 0.0535. Remaining errors are primarily caused by sensing noise and small pose misalignments due to tracking inaccuracies.
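For concreteness, the F@τ metric used above can be computed as follows. This is a minimal sketch of the standard point-cloud completeness metric (harmonic mean of precision and recall within tolerance τ), not the paper's exact evaluation code, and it assumes SciPy is available for nearest-neighbor queries.

```python
import numpy as np
from scipy.spatial import cKDTree  # assumes SciPy is available

def f_score(pred, gt, tau):
    """F@tau for point sets.
    pred: (N,3) predicted points; gt: (M,3) ground-truth points;
    tau: distance tolerance in the same units as the points (e.g. mm)."""
    d_pred = cKDTree(gt).query(pred)[0]   # pred -> gt nearest distances
    d_gt = cKDTree(pred).query(gt)[0]     # gt -> pred nearest distances
    precision = float((d_pred <= tau).mean())
    recall = float((d_gt <= tau).mean())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A perfect reconstruction scores 1.0 at any τ; tightening τ (F@2 vs. F@10) penalizes small geometric errors more heavily, which is why F@2 is the hardest column in the table.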

Experiment II — Efficiency vs Non-active Baselines

We compare five rotation strategies using the temporal evolution of the uncertainty tail \(q_{95}\): ours, a fixed schedule \(z\!\rightarrow\!x\!\rightarrow\!y\!\rightarrow\!z\!\rightarrow\!x\), and single-axis baselines (\(x\)-only/\(y\)-only/\(z\)-only). Below, we show the offline refined meshes for all five strategies (normal colormap).
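The tail statistic \(q_{95}\) is the 95th percentile of the per-direction uncertainty; a strategy has converged when even the worst-observed viewing directions have low uncertainty. A minimal sketch (our own naming, not the authors' code):

```python
import numpy as np

def uncertainty_tail(per_direction_uncertainty, q=95):
    """q_95 tail statistic: the q-th percentile of per-direction uncertainty.
    Tracking this over time shows how quickly a rotation strategy covers
    the hardest-to-observe viewing directions."""
    return float(np.percentile(per_direction_uncertainty, q))
```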

q95 uncertainty curves
\(q_{95}\) uncertainty over time (avg. over 6 objects) under five rotation strategies: ours, fixed \(z\!\rightarrow\!x\!\rightarrow\!y\!\rightarrow\!z\!\rightarrow\!x\), and \(x\)-only, \(y\)-only, \(z\)-only.
Offline refined meshes (normal colormap) for each strategy, alongside a captured RGB view of Pepper: Ours (closed-loop), Fixed (z→x→y→z→x), x-only, y-only, z-only.


Discussion — Comparison with Neural Feels

We compare AURORA with Neural Feels under a matched 30 s interaction budget on Pepper. Our method achieves higher F@5mm and produces more complete geometry via active replanning and stable in-palm reorientation.

F@5mm comparison: ours vs Neural Feels
F@5mm over time (Pepper, 30 s). Ours reaches 0.8890, vs. 0.5907 reported by Neural Feels.
Offline meshes on Pepper: Ours vs. Neural Feels, alongside a captured RGB view.

Efficiency. Neural Feels reports 417.13 s wall-clock time for reconstruction from a 30 s window, while ours completes the same stage in 61.82 s (~6.7× faster), enabling substantially lower end-to-end latency.
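The quoted speedup follows directly from the two reported wall-clock times:

```python
# Reported reconstruction times for the same 30 s interaction window.
neuralfeels_s = 417.13
ours_s = 61.82
speedup = neuralfeels_s / ours_s
print(f"~{speedup:.1f}x faster")  # ~6.7x faster
```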