Generalizable Long-Horizon Manipulations with Large Language Models

Haoyu Zhou1, Mingyu Ding2, Weikun Peng1, Masayoshi Tomizuka2, Lin Shao1, Chuang Gan3,4
1National University of Singapore; 2UC Berkeley, USA; 3University of Massachusetts Amherst, USA; 4MIT-IBM Watson AI Lab, USA

Abstract

This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks. These task conditions serve as guides for the generation and adjustment of Dynamic Movement Primitive (DMP) trajectories for long-horizon task execution. We further create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our framework on both familiar tasks involving new objects and novel but related tasks, highlighting the potential of LLMs in enhancing robotic system versatility and adaptability.

Framework Structure

We leverage LLMs to generate and generalize primitive task conditions for both familiar tasks with novel objects and novel but related tasks. Subsequently, the high-level task conditions guide the generation and adjustment of low-level trajectories, originally learned from demonstrations, for long-horizon task execution.
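To make the low-level side concrete, below is a minimal one-dimensional discrete DMP in Python (numpy only). This is an illustrative sketch of the standard DMP formulation, not the paper's implementation: it fits a forcing term to a single demonstration and can replay the motion toward a new goal, which is the mechanism the task conditions use to adjust trajectories.

```python
import numpy as np

class DMP1D:
    """Minimal 1-D discrete Dynamic Movement Primitive (illustrative sketch)."""

    def __init__(self, n_basis=25, alpha_y=25.0, beta_y=6.25, alpha_x=3.0):
        self.n_basis, self.alpha_y, self.beta_y, self.alpha_x = n_basis, alpha_y, beta_y, alpha_x
        # Basis centers spaced along the canonical phase x; widths from spacing.
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))
        self.h = 1.0 / np.gradient(self.c) ** 2
        self.w = np.zeros(n_basis)

    def fit(self, y_demo, dt):
        """Learn forcing-term weights from one demonstrated trajectory."""
        self.y0, self.g, self.tau = y_demo[0], y_demo[-1], len(y_demo) * dt
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y_demo)) * dt / self.tau)
        # Forcing values that would reproduce the demonstration exactly.
        f_target = self.tau ** 2 * ydd - self.alpha_y * (self.beta_y * (self.g - y_demo) - self.tau * yd)
        psi = np.exp(-self.h * (x[:, None] - self.c) ** 2)
        scale = x * (self.g - self.y0)
        for i in range(self.n_basis):  # weighted least squares per basis
            self.w[i] = np.sum(scale * psi[:, i] * f_target) / (np.sum(scale ** 2 * psi[:, i]) + 1e-10)

    def rollout(self, goal=None, dt=0.01):
        """Integrate the DMP forward, optionally toward a new goal."""
        g = self.g if goal is None else goal
        y, yd, x, traj = self.y0, 0.0, 1.0, []
        for _ in range(int(self.tau / dt)):
            psi = np.exp(-self.h * (x - self.c) ** 2)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - self.y0)
            ydd = (self.alpha_y * (self.beta_y * (g - y) - self.tau * yd) + f) / self.tau ** 2
            y += yd * dt
            yd += ydd * dt
            x -= self.alpha_x * x / self.tau * dt
            traj.append(y)
        return np.array(traj)

# Usage: learn a reach from a demonstration, then retarget it to a new goal.
t = np.linspace(0, 1, 200)
dmp = DMP1D()
dmp.fit(np.sin(t * np.pi / 2), dt=t[1] - t[0])   # demonstrated 0 -> 1 reach
adjusted = dmp.rollout(goal=1.4)                 # same motion shape, new goal
```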

Task condition generation and generalization experiment

We evaluate the ability of our framework to generate and generalize task conditions on all 10 primitive tasks. The LLM (GPT-3.5) is provided with condition examples, and we compare against task conditions generated directly from the environment (method detailed in the paper). A successfully generated task condition must contain accurate and sufficient information to guide the execution of the primitive task.
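As a hedged sketch, few-shot condition generation with GPT-3.5 could look like the snippet below (legacy OpenAI Python API from the GPT-3.5 era; the prompt wording and `CONDITION_EXAMPLE` are stand-ins, not the paper's actual prompt):

```python
import openai  # legacy 0.x API; reads OPENAI_API_KEY from the environment

# Stand-in few-shot example in the condition format shown on this page.
CONDITION_EXAMPLE = """\
Task name: grasp bottle
Relevant objects: bottle, gripper
Pre-conditions: gripper not grasping
Post-conditions: gripper grasping bottle
Collisions: gripper and bottle"""

def generate_task_condition(task_name: str) -> str:
    """Ask GPT-3.5 for a task condition in the example's exact format."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.0,  # deterministic, reproducible conditions
        messages=[
            {"role": "system",
             "content": "You write task conditions for robot primitive tasks, "
                        "matching the format of the given example exactly."},
            {"role": "user",
             "content": f"Example:\n{CONDITION_EXAMPLE}\n\n"
                        f"Write the task condition for: {task_name}"},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(generate_task_condition("move bottle into microwave"))
```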

Simulation experiment

To evaluate our framework's ability to execute long-horizon tasks using DMP trajectories generated and adjusted by task conditions, we design a challenging Robotic Manipulation Task Suite in Pybullet.

The environment consists of two 7-DoF robots, a Franka and a Kinova, in a kitchen scene with various interactive objects. It contains 10 diverse primitive tasks (37 when counting object variations) and 4 long-horizon tasks in simulation.
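For orientation, a minimal Pybullet setup for this kind of scene looks as follows. The Franka URDF ships with pybullet_data; the Kinova and kitchen asset paths are placeholders for the suite's own assets:

```python
import pybullet as p
import pybullet_data

p.connect(p.GUI)  # use p.DIRECT for headless evaluation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

p.loadURDF("plane.urdf")
# Franka Panda ships with pybullet_data: fixed base, 7 revolute arm joints.
franka = p.loadURDF("franka_panda/panda.urdf", [0, 0, 0], useFixedBase=True)
# Placeholder paths -- the task suite provides its own Kinova and kitchen assets:
# kinova = p.loadURDF("assets/kinova/gen3.urdf", [0, 1.0, 0], useFixedBase=True)
# kitchen = p.loadURDF("assets/kitchen/kitchen.urdf", useFixedBase=True)

for _ in range(240):  # simulate one second at the default 240 Hz
    p.stepSimulation()
```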



Demonstration: Put the bowl into the bottom drawer

Generation and execution: Put the bowl into the upper drawer




Demonstration: Put the bottle into the box

Generation and execution: Put the bowl into the box




Move the bottle into the box w/o task condition

The move-into primitive task fails because the initial goal position obtained from the environment point cloud deviates from the true goal, causing a collision between the bottle and the box.

Move the bottle into the box w/ task condition adjusting trajectory

Since the collision between the bottle and the box violates the collisions allowed by the task condition, the manipulator backs up a few steps and adjusts the goal position to generate a new DMP trajectory, eventually completing the task.
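In pseudocode, this recovery behaviour amounts to the loop below; the task condition it consults is the one listed after this sketch. All `env.*` helpers are hypothetical stand-ins for the framework's own components:

```python
def execute_with_condition(dmp, goal, condition, env, max_retries=3):
    """Roll out a DMP, replanning whenever a disallowed collision occurs.

    condition.collisions lists the contact pairs the task condition allows
    (e.g. gripper and bottle); any other contact triggers back-off and
    replanning. The env.* helpers are hypothetical stand-ins.
    """
    for _ in range(max_retries):
        for waypoint in dmp.rollout(goal=goal):
            env.move_to(waypoint)
            contacts = env.unexpected_contacts(allowed=condition.collisions)
            if contacts:                     # e.g. the bottle hit the box rim
                env.retreat(n_steps=5)       # back up a few steps
                goal = env.refine_goal(goal, contacts)  # nudge the goal away
                break                        # regenerate the DMP trajectory
        else:
            return env.post_conditions_hold(condition)  # clean rollout
    return False
```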

Task condition by LLM

Task name: move bottle into box
Relevant objects:
bottle, box, gripper
Pre-conditions:
gripper grasping bottle
box open
Post-conditions:
gripper grasping bottle
bottle inside box
box open
Collisions:
gripper and bottle
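This condition format maps naturally onto a small data structure; a sketch (the field names are ours, not necessarily the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class TaskCondition:
    """Structured form of an LLM-emitted task condition (our field names)."""
    task_name: str
    relevant_objects: list[str]
    pre_conditions: list[str]
    post_conditions: list[str]
    collisions: list[str] = field(default_factory=list)  # allowed contact pairs

move_bottle_into_box = TaskCondition(
    task_name="move bottle into box",
    relevant_objects=["bottle", "box", "gripper"],
    pre_conditions=["gripper grasping bottle", "box open"],
    post_conditions=["gripper grasping bottle", "bottle inside box", "box open"],
    collisions=["gripper and bottle"],
)
```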



Grasp the bottle w/o task condition


Grasp the bottle w/ task condition generating trajectory


Task condition by LLM

Task name: grasp bottle
Relevant objects:
bottle, gripper
Pre-conditions:
gripper not grasping
Post-conditions:
gripper grasping bottle
Collisions:
gripper and bottle

Real-world experiment

To demonstrate the practicality of our framework in performing long-horizon tasks in real scenarios, we set up a real-world long-horizon task featuring a dual-arm MOVO robot with two 7-DoF manipulators and an overhead Kinect RGB-D camera. We use Segment Anything to obtain segmented point clouds of the surroundings and objects, and perform position control with RangedIK solving the inverse kinematics.
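A sketch of the perception step, using the public segment_anything API for masks and a standard pinhole back-projection for point clouds (the checkpoint path and the Kinect intrinsics fx, fy, cx, cy are placeholders):

```python
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Released ViT-H checkpoint; the local path is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def segmented_point_clouds(rgb, depth, fx, fy, cx, cy):
    """Return one point cloud per SAM mask, back-projected from the depth map."""
    clouds = []
    for m in mask_generator.generate(rgb):    # rgb: HxWx3 uint8 image
        v, u = np.nonzero(m["segmentation"])  # pixel rows/cols inside the mask
        z = depth[v, u]                       # metric depth per masked pixel
        x = (u - cx) * z / fx                 # pinhole back-projection
        y = (v - cy) * z / fy
        clouds.append(np.stack([x, y, z], axis=1))
    return clouds
```

The long-horizon task then decomposes into the primitive sequence below, chained as sketched after the list.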

■  grasp the bottle;
■  open the microwave;
■  move the bottle in front of the microwave;
■  move the bottle into the microwave;
■  release the bottle;
■  move out of the microwave;
■  close the microwave.
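Chaining these primitives reduces to gating each step on its task condition: check the pre-conditions before execution and the post-conditions after. A sketch, with hypothetical `env.check` and `env.execute_primitive` helpers:

```python
PRIMITIVE_SEQUENCE = [
    "grasp bottle", "open microwave", "move bottle in front of microwave",
    "move bottle into microwave", "release bottle",
    "move out of microwave", "close microwave",
]

def run_long_horizon(env, conditions):
    """Execute the primitive chain, gating each step on its task condition.

    conditions maps task names to TaskCondition objects; env.check and
    env.execute_primitive are hypothetical stand-ins for the framework.
    """
    for name in PRIMITIVE_SEQUENCE:
        cond = conditions[name]
        if not all(env.check(c) for c in cond.pre_conditions):
            return False, name             # a pre-condition does not hold
        env.execute_primitive(name, cond)  # DMP rollout with adjustment
        if not all(env.check(c) for c in cond.post_conditions):
            return False, name             # the primitive did not succeed
    return True, None
```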




Move the bottle into the microwave w/o task condition

Move the bottle into the microwave w/ task condition adjusting trajectory

Task condition by LLM

Task name: move bottle into microwave
Relevant objects:
bottle, microwave, gripper
Pre-conditions:
gripper grasping bottle
microwave open
Post-conditions:
gripper grasping bottle
bottle inside microwave
microwave open
Collisions:
gripper and bottle



Grasp the bottle w/o task condition

Grasp the bottle w/ task condition generating trajectory

Task condition by LLM

Task name: right grasp bottle
Relevant objects:
bottle, right gripper
Pre-conditions:
right gripper not grasping
Post-conditions:
right gripper grasping bottle
Collisions:
right gripper and bottle