The full project paper and accompanying work are available here.
This repository contains two modular deep neural networks (DNNs) designed for robotic task learning from human demonstrations using spherical representations. The models work in tandem as shown in the figure below:
Model Part I predicts likelihood maps representing the probability distribution of grasp positions. It takes as input a 2D image generated via a hemispherical transformation of a 3D object mesh and outputs a per-pixel estimate of where a grasp is most likely to occur.
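As a rough illustration of this interface, the sketch below shows a minimal encoder–decoder network that maps a single-channel spherical projection to a normalized likelihood map. The class name `GraspLikelihoodNet`, the layer sizes, and the softmax normalization are assumptions for illustration, not the actual architecture used in this repository.

```python
import torch
import torch.nn as nn

class GraspLikelihoodNet(nn.Module):
    """Hypothetical sketch of Model Part I: spherical mesh image -> grasp-likelihood map."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one-channel likelihood logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, H, W) hemispherical projection of the object mesh
        logits = self.decoder(self.encoder(x))
        # Normalize to a probability distribution over grasp positions
        b, _, h, w = logits.shape
        return torch.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)
```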
Predictions from Model Part I are depicted below:
Model Part II is a meta-learned model trained with First-Order MAML (FOMAML). It refines the likelihood maps produced by Model Part I using human demonstration data and additionally outputs the maximum-likelihood grasp angles: azimuth, zenith, and a rotational angle (γ). Model Part II takes as input both the spherically transformed mesh image and the likelihood prior from Model Part I.
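The sketch below illustrates this two-input, two-output structure: the spherical image and the Part I prior are stacked as channels, and separate heads produce a refined likelihood map and the three grasp angles. The name `GraspRefinementNet` and all layer choices are assumptions made for the example, not the network defined in this repository.

```python
import torch
import torch.nn as nn

class GraspRefinementNet(nn.Module):
    """Hypothetical sketch of Model Part II: (image, prior) -> (refined map, angles)."""

    def __init__(self):
        super().__init__()
        # Input: mesh-image channel + prior-likelihood channel, stacked
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.map_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # refined likelihood logits
        )
        self.angle_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3),  # azimuth, zenith, gamma
        )

    def forward(self, image: torch.Tensor, prior: torch.Tensor):
        x = torch.cat([image, prior], dim=1)   # (B, 2, H, W)
        features = self.backbone(x)
        return self.map_head(features), self.angle_head(features)
```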
While training Model Part II with FOMAML, we explored different task augmentation strategies (a minimal training-loop sketch follows the list):
- Effective Augmentation: Adding discrete noise to angular data improved adaptability without degrading performance.
- Ineffective Augmentation: Modifying labeled likelihood maps negatively impacted the model’s flexibility.
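The following sketch shows how discrete angular noise can be combined with a first-order MAML update. The bin size of 15 degrees, the function names, and the abstract `loss_fn(model, batch)` interface are assumptions for illustration; they are not the exact hyperparameters or training code used here.

```python
import copy
import random
import torch

def add_discrete_angle_noise(angles: torch.Tensor, bin_deg: float = 15.0) -> torch.Tensor:
    """Task augmentation (the effective variant): shift the angular labels
    (azimuth, zenith, gamma) of a task by a random multiple of bin_deg degrees.
    The labeled likelihood maps are left untouched."""
    offset = bin_deg * random.choice([-2, -1, 1, 2])
    return angles + offset

def fomaml_step(model, tasks, loss_fn, inner_lr=1e-2, outer_lr=1e-3):
    """One FOMAML meta-update: adapt a copy of the model on each task's support
    set, then apply the query-set gradients of the adapted copy (first-order
    approximation, no second derivatives) to the meta-parameters."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_batch, query_batch in tasks:
        # Inner loop: adapt a cloned model on the support set.
        learner = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        inner_opt.zero_grad()
        loss_fn(learner, support_batch).backward()
        inner_opt.step()
        # Outer loss on the query set, gradients w.r.t. the adapted weights.
        grads = torch.autograd.grad(loss_fn(learner, query_batch),
                                    learner.parameters())
        for acc, g in zip(meta_grads, grads):
            acc += g
    # Apply the averaged query gradients directly to the meta-parameters.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g / len(tasks)
```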
The dataset used for training these models was generated using a custom pipeline. Details can be found in the following repository: Spherical Data Generation for 3D Meshes.
This repository provides an approach to robotic grasp learning through human demonstrations, leveraging spherical representations and meta-learning techniques. Contributions, issues, and discussions are welcome!