Abstract
Dexterous grasping with multi-finger robotic hands is essential for general-purpose robotic manipulation, but it remains challenging because of high-dimensional hand configurations, multimodal grasp distributions, and contact-rich execution dynamics. Existing methods often decouple grasp target generation from execution policy learning, which limits the consistency between generated grasp goals and downstream control. To address this problem, we propose DexGraspDiffuser, a target-coupled grasp and action diffusion framework for dexterous grasping. The first stage, GraspDiffusion, generates diverse and physically plausible target grasps from object point clouds using a compact representation of hand root translation, continuous rotation, and finger joint configuration. The second stage, a Goal-Conditioned Diffusion Policy, predicts temporally coherent action sequences conditioned on the selected target grasp and the current observation. During inference, receding-horizon execution runs only a prefix of each predicted action sequence and then replans online, which improves robustness. Experiments show that DexGraspDiffuser achieves success rates of 0.76, 0.72, and 0.68 on training objects, unseen objects from seen categories, and objects from unseen categories, respectively. These results correspond to a three-split average success rate of 0.72 and a generalization gap of 0.08 between training objects and objects from unseen categories. Compared with the reproduced UniDexGrasp-T baseline under the same object split and evaluation protocol, DexGraspDiffuser improves the three-split average success rate by 3.3 percentage points and reduces the average mean position error by 0.53 cm. These results indicate that target-coupled grasp and action diffusion contributes to improved grasp quality, execution accuracy, and closed-loop stability.
Keywords