Learning to Design and Use Tools for Robotic Manipulation

Stanford University

A robot may need different tools to fetch an out-of-reach book (blue) or push it into the bookshelf (pink). It should therefore be able to rapidly prototype tools for the task at hand.

Abstract

When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment to accomplish otherwise impossible tasks. Robots might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning are effective at designing locomotion agents. But while outputting a single morphology makes sense for locomotion, manipulation involves a variety of strategies depending on the task goals at hand. A manipulation agent must be capable of rapidly prototyping specialized tools for different goals. Therefore, we propose learning a designer policy, rather than a single design. A designer policy is conditioned on task information and outputs a tool design that helps solve the task. A design-conditioned controller policy can then perform manipulation using these tools. In this work, we take a step towards this goal by introducing a reinforcement learning framework for jointly learning these policies. Through simulated manipulation tasks, we show that this framework is more sample efficient than prior methods in multi-goal or multi-variant settings, can perform zero-shot interpolation or fine-tuning to tackle previously unseen goals, and allows tradeoffs between the complexity of design and control policies under practical constraints. Finally, we deploy our learned policies onto a real robot.

Tool Design and Control Tasks


Task suite (one panel per task): Push, Catch Balls, Scoop, Fetch Cube, Lift Cup, 3D Scoop.

Framework


Solving a task using learned designer and controller policies. During the design phase, the designer policy outputs the parameters for a tool that will help solve the given task. In the control phase, the controller policy outputs motor commands given the tool structure, task specification, and environment observation.
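The two-phase rollout can be sketched in code. The following is a minimal illustration under assumed interfaces; the policy objects, their sample methods, and the environment API are hypothetical stand-ins rather than the actual implementation.

def rollout(designer_policy, controller_policy, env, task_spec, horizon=100):
    """One episode: design a tool for the task, then manipulate with it.

    Hypothetical interfaces for illustration; not the authors' actual API.
    """
    # Design phase: the designer policy maps the task specification
    # (e.g., a goal position) to tool parameters such as link lengths and angles.
    tool_params = designer_policy.sample(task_spec)

    # The environment instantiates the designed tool before the control phase.
    obs = env.reset(task_spec=task_spec, tool_params=tool_params)

    # Control phase: the controller policy outputs motor commands conditioned
    # on the tool structure, the task specification, and the observation.
    episode_return = 0.0
    for _ in range(horizon):
        action = controller_policy.sample(obs, task_spec, tool_params)
        obs, reward, done, info = env.step(action)
        episode_return += reward
        if done:
            break
    return tool_params, episode_return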

Sample Efficiency


Panels (one per task): Push, Catch Balls, Scoop, Fetch Cube, Lift Cup, 3D Scoop.

Learning curves for our framework, prior methods, and baselines. Across all tasks, our framework achieves improved performance and sample efficiency. Shaded areas indicate standard error across 6 random seeds for all methods, except the 3D Scoop task, where we use 3 seeds due to computational constraints.

Design-Control Tradeoff


Ratio of control cost to design cost for different values of α.

Example tools for α = 0.0, 0.3, 0.7, and 1.0.

Qualitative examples of tools generated by setting our tradeoff parameter α to different values. As α increases, the designer policy produces tools with shorter links on the left and right sides to reduce material usage. At low α values, the larger tools spare the controller from having to move the tool far.
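One simple way to realize this tradeoff, sketched below, is to blend a design (material) cost and a control (motion) cost using the weight α. This particular formulation and the function names are illustrative assumptions, not necessarily the exact objective used in the paper.

def tradeoff_penalty(design_cost, control_cost, alpha):
    """Blend design and control costs with a tradeoff weight alpha in [0, 1].

    Illustrative formulation (an assumption, not the paper's exact objective):
    alpha = 1.0 penalizes only design cost, favoring small tools with short
    links; alpha = 0.0 penalizes only control cost, favoring large tools that
    require little tool motion.
    """
    return alpha * design_cost + (1.0 - alpha) * control_cost

# A large tool costs more material but lets the controller move it less.
print(tradeoff_penalty(design_cost=2.0, control_cost=0.5, alpha=0.0))  # 0.5
print(tradeoff_penalty(design_cost=2.0, control_cost=0.5, alpha=1.0))  # 2.0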


Generalization


(a) Initialization ranges and zero-shot performance when cutting out 60% of the area of the entire possible training region.

(b) Returns for policies trained with varying relative cutout region area.

(c) Fine-tuning performance compared to learning from scratch across 4 target goals.

Interpolation results on the pushing task. In (a), we plot the success (light blue) and failure (dark blue) goal regions. Areas within the dotted yellow borders denote unseen cutout regions (interpolation). The area within the teal border (but outside the cutout regions) is the training region, and the area outside the teal border is unseen during training (extrapolation). (b) and (c) show return curves averaged over 3 runs; shaded regions denote standard error. In (b), we observe that policies trained with small cutouts perform nearly as well as a policy trained on all goals. In (c), we show that even for poses far from the initial training region, our policies learn to solve the task within a handful of gradient steps, which is far more effective than learning from scratch.
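Fine-tuning here amounts to continuing gradient updates on the pre-trained designer and controller weights using rollouts on the new goals only. The sketch below assumes a PyTorch-style optimizer and hypothetical helpers collect_episodes and policy_gradient_loss; it illustrates the procedure rather than reproducing the exact training code.

def fine_tune(designer_policy, controller_policy, env, new_goals,
              optimizer, num_steps=50):
    """Adapt pre-trained, goal-conditioned policies to previously unseen goals.

    Starting from jointly trained weights, a handful of gradient steps can
    suffice, in contrast to training from scratch. collect_episodes and
    policy_gradient_loss are hypothetical helpers.
    """
    for _ in range(num_steps):
        # Collect rollouts only on the new, previously unseen goals.
        batch = collect_episodes(designer_policy, controller_policy, env,
                                 goals=new_goals)
        loss = policy_gradient_loss(designer_policy, controller_policy, batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return designer_policy, controller_policy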

Goal-Conditioned Design and Control


Visualizations of tool designs output by a single learned designer policy for the Push, Fetch Cube, Lift Cup, and 3D Scoop tasks as the initial/goal position varies.

Supplementary Video