Touch in the Wild

Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Anonymous

Our visuo-tactile policy performs fine-grained manipulation, such as handling transparent test tubes.

(Under human disturbances and visual occlusions, the robot relies on tactile feedback to guide its decisions.)

In-the-Wild Data Collection

We collected over 2,700 demonstrations covering 43 manipulation tasks across 12 indoor and outdoor environments. This yields more than 2.6 million visuo-tactile pairs for visuo-tactile pretraining and downstream imitation learning.

Visuo-Tactile Pretraining & Downstream Imitation Learning

We pretrain on a large corpus of image-tactile pairs using a cross-attention mechanism. The model learns to reconstruct tactile images conditioned on masked tactile inputs and associated camera images. This pretraining yields a joint visuo-tactile representation, which is then combined with robot proprioceptive states and used as input for downstream manipulation tasks.
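The pretraining objective and the downstream policy input described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that tactile tokens act as queries over camera-image tokens, that masked tokens are zeroed, that the loss is a mean squared error over masked tokens only, and that the fused tokens are mean-pooled before being concatenated with the proprioceptive state. Learned projections and the decoder are omitted, and all function names, the mask ratio, and the token sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (learned projections omitted)."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def mask_tokens(tokens, mask_ratio, rng):
    """Zero out a random subset of tokens; return the masked copy and the masked indices."""
    n_masked = round(mask_ratio * len(tokens))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    hidden = set(masked_idx)
    masked = [[0.0] * len(t) if i in hidden else list(t) for i, t in enumerate(tokens)]
    return masked, masked_idx


def masked_mse(pred, target, masked_idx):
    """Mean squared error over the masked tokens only (an assumption of this sketch)."""
    errors = [(p - t) ** 2 for i in masked_idx for p, t in zip(pred[i], target[i])]
    return sum(errors) / len(errors)


def policy_input(fused_tokens, proprio):
    """Mean-pool the fused tokens and append the proprioceptive state."""
    dim = len(fused_tokens[0])
    pooled = [sum(t[j] for t in fused_tokens) / len(fused_tokens) for j in range(dim)]
    return pooled + list(proprio)


# Toy data: 6 tactile tokens and 8 image tokens, each of dimension 4 (hypothetical sizes).
rng = random.Random(0)
dim = 4
tactile_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
image_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]

masked_tactile, masked_idx = mask_tokens(tactile_tokens, mask_ratio=0.5, rng=rng)
# Tactile tokens act as queries; camera-image tokens supply keys and values.
fused = cross_attention(masked_tactile, image_tokens, image_tokens)
# The fused tokens stand in for the reconstruction here; a real model would decode them.
loss = masked_mse(fused, tactile_tokens, masked_idx)
features = policy_input(fused, proprio=[0.1, -0.2, 0.3])
print(len(fused), len(fused[0]), len(features))
```

In a trained model the queries, keys, and values would pass through learned projections and the loss would be minimized by gradient descent; the sketch only shows how the pieces connect.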

Pretraining Illustration

Task Demonstrations

Tasks Requiring In-Hand State Information

(1) Transparent Tube Collection. The robot must pick up a test tube from a box, reorient it in-hand using the test tube rack, and then precisely insert it into the rack.
(2) Pencil Insertion. The robot needs to insert a pencil into a sharpener. Since the pencil is initially grasped upright, the robot must first carefully reorient it before performing a precise insertion.

Tasks Requiring Fine-Grained Force Information

(3) Fluid Transfer. The robot uses a pipette to transfer water between containers. It must grasp the pipette firmly and apply just enough pressure to extract liquid without dropping it. The robot then needs to move above the other container and gently squeeze to release the water into it.
(4) Whiteboard Erasing. The robot uses a soft eraser to remove two strokes of text from the whiteboard. It must apply the right amount of pressure to erase the marker ink without exceeding force limits that could damage the system. The task requires consistent and controlled force application throughout.

Policy Robustness: Comparison with Baselines

Performance Comparison: Pretrained Policy vs. Vision-Only Policies

Ours: Successful Reorientation and Insertion

Vision-Only Baseline: Repeated Reorientation

Ours: Successful Fluid Expulsion

Vision-Only Baseline: Skips Fluid Expulsion

Ours: Clean Erase

Vision-Only Baseline: Unclean Erase

Ours: Reliable Reorientation with Precise In-hand Information

Vision-Only Baseline: Missed Reorientation

Pretraining Ablations