Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand

Shanghai Jiao Tong University, PKU-PsiBot Joint Lab, Peking University

Retrieval Dexterity is a system that learns efficient object retrieval in simulation and demonstrates zero-shot real-world deployment.

Abstract

Retrieving objects buried beneath multiple objects is not only challenging but also time-consuming. Performing manipulation in such environments presents significant difficulty due to complex contact relationships. Existing methods typically address this task by sequentially grasping and removing each occluding object, resulting in lengthy execution times and requiring impractical grasping capabilities for every occluding object. In this paper, we present a dexterous arm-hand system for efficient object retrieval in multi-object stacked environments. Our approach leverages large-scale parallel reinforcement learning within diverse and carefully designed cluttered environments to train policies. These policies demonstrate emergent manipulation skills (e.g., pushing, stirring, and poking) that efficiently clear occluding objects to expose sufficient surface area of the target object. We conduct extensive evaluations across a set of over 10 household objects in diverse clutter configurations, demonstrating superior retrieval performance and efficiency for both trained and unseen objects. Furthermore, we successfully transfer the learned policies to a real-world dexterous multi-fingered robot system, validating their practical applicability in real-world scenarios.

Method

Method overview
The pipeline has three stages: (1) construct diverse cluttered scenes using a drop-from-above strategy; (2) train policies with large-scale parallel RL and well-designed rewards; (3) generate trajectories from the RL expert policy, select useful ones based on our principle, and train a distilled policy for deployment on a real robot.
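As a rough illustration of the drop-from-above idea in stage (1), the sketch below samples spawn poses for a pile of objects above a box. It is not the authors' implementation: the function name `sample_drop_poses`, the `SpawnPose` type, the box dimensions, the height spacing, and the choice to drop the target first so that it tends to end up buried are all assumptions made for this example. A physics simulator would then let the objects fall and settle.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SpawnPose:
    name: str
    x: float    # metres, box-centred frame (assumed convention)
    y: float
    z: float    # spawn height above the box floor
    yaw: float  # rotation about the vertical axis, in radians


def sample_drop_poses(target, occluders, box_half_extent=0.15,
                      base_height=0.30, height_step=0.10,
                      margin=0.03, seed=None):
    """Sample spawn poses for a drop-from-above clutter scene.

    Assumption: the target is spawned lowest so it lands first and
    the occluders pile on top of it.
    """
    rng = random.Random(seed)
    limit = box_half_extent - margin  # keep spawns away from the walls

    order = list(occluders)
    rng.shuffle(order)           # randomise the stacking order
    order.insert(0, target)      # target falls first

    poses = []
    for i, name in enumerate(order):
        poses.append(SpawnPose(
            name=name,
            x=rng.uniform(-limit, limit),
            y=rng.uniform(-limit, limit),
            z=base_height + i * height_step,  # staggered heights avoid overlap at spawn
            yaw=rng.uniform(-math.pi, math.pi),
        ))
    return poses
```

Calling `sample_drop_poses("mug", ["box", "can", "bottle"], seed=0)` returns four poses with strictly increasing spawn heights; changing the seed yields a different clutter configuration, which is one simple way to obtain scene diversity across parallel environments.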

RL Policy in Simulation

The yellow box marks a target object from the training set, while the blue box marks a target object from the test set.

Target Object Position Generalization

The target objects are placed in distinct regions of the box: top-left, bottom-left, top-right, and bottom-right.
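A minimal sketch of how such a region-based placement could be set up is shown below. The coordinate convention (top means positive y, left means negative x, origin at the box centre), the box size, and the names `REGIONS` and `sample_target_xy` are assumptions for illustration and are not taken from the project.

```python
import random

# Sign of (x, y) for each region, in a box-centred frame (assumed convention).
REGIONS = {
    "top-left": (-1, +1),
    "bottom-left": (-1, -1),
    "top-right": (+1, +1),
    "bottom-right": (+1, -1),
}


def sample_target_xy(region, box_half_extent=0.15, margin=0.03, rng=None):
    """Sample an (x, y) target position inside one quadrant of the box."""
    rng = rng or random.Random()
    sign_x, sign_y = REGIONS[region]
    # Stay `margin` away from both the box walls and the centre lines,
    # so the four regions do not overlap.
    low, high = margin, box_half_extent - margin
    return sign_x * rng.uniform(low, high), sign_y * rng.uniform(low, high)
```

Evaluating a policy once per key in `REGIONS` then gives one trial per quadrant.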

Retrieval Efficiency

Conclusion

In this work, we have presented a novel approach to efficient object retrieval in cluttered environments using dexterous multi-finger hands. Our system demonstrates the ability to manipulate occluding objects strategically, exposing target objects for retrieval—a capability that significantly improves upon traditional sequential removal methods. Through careful design of our simulation environment and reinforcement learning framework, we have addressed key challenges including time efficiency, object diversity, and the complexity of high-dimensional control in contact-rich environments. Our experimental results, both in simulation and real-world settings, validate the effectiveness of our approach. The system successfully generalizes across diverse objects and achieves zero-shot transfer to real-world robots, demonstrating robust performance without additional training. This work represents a step toward more efficient and capable robotic manipulation in cluttered environments, though opportunities remain for future exploration, particularly in achieving fully autonomous operation through integration with advanced perception systems.

BibTeX

@inproceedings{bai2025retrdex,
    author    = {Fengshuo Bai and Yu Li and Jie Chu and Tawei Chou and Runchuan Zhu and Ying Wen and Yaodong Yang and Yuanpei Chen},
    title     = {Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand},
    booktitle = {arXiv preprint arXiv:2502.18423},
    year      = {2025}
}