OSSID: Online Self-Supervised Instance Detection
by (and for) Pose Estimation
Qiao Gu
Brian Okorn
David Held
[Paper]
[Video]
[Code]
We propose a self-supervised learning pipeline for object instance detection by pose estimation. The results of a zero-shot pose estimation network are used to finetune a zero-shot detector online. The detections in turn provide object bounding boxes that reduce the search space for pose estimation. Without requiring any manual annotation, both the detector and the pose estimator become more accurate and faster.

Abstract

Real-time object pose estimation is necessary for many robot manipulation algorithms. However, state-of-the-art methods for object pose estimation are trained for a specific set of objects; these methods thus need to be retrained to estimate the pose of each new object, often requiring tens of GPU-days of training for optimal performance. In this paper, we propose the OSSID framework, leveraging a slow zero-shot pose estimator to self-supervise the training of a fast detection algorithm. This fast detector can then be used to filter the input to the pose estimator, drastically improving its inference speed. We show that this self-supervised training exceeds the performance of existing zero-shot detection methods on two widely used object pose estimation and detection datasets, without requiring any human annotations. Further, we show that the resulting method for pose estimation has a significantly faster inference speed, due to the ability to filter out large parts of the image. Thus, our method for self-supervised online learning of a detector (trained using pseudo-labels from a slow pose estimator) leads to accurate pose estimation at real-time speeds, without requiring human annotations.
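
The loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: `PoseResult`, `OnlineLoop`, `accept_threshold`, and the three callables are hypothetical names introduced here, and the confidence-based acceptance rule stands in for whatever pseudo-label filtering the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class PoseResult:
    box: Box            # bounding box implied by the estimated pose
    confidence: float   # hypothetical score used to accept pseudo-labels


@dataclass
class OnlineLoop:
    # All three callables are placeholders for real models (assumptions).
    slow_pose_estimator: Callable[[object, Optional[Box]], PoseResult]
    fast_detector: Callable[[object], Optional[Box]]
    finetune_detector: Callable[[object, Box], None]
    accept_threshold: float = 0.8  # hypothetical acceptance rule
    pseudo_labels: List[Box] = field(default_factory=list)

    def step(self, image: object) -> PoseResult:
        # 1. The fast detector proposes a region; None means "search the whole image".
        region = self.fast_detector(image)
        # 2. The slow pose estimator only searches inside that region.
        result = self.slow_pose_estimator(image, region)
        # 3. Confident pose results become pseudo-labels that finetune the detector,
        #    so no human annotation enters the loop.
        if result.confidence >= self.accept_threshold:
            self.pseudo_labels.append(result.box)
            self.finetune_detector(image, result.box)
        return result
```

Calling `step` once per incoming frame alternates the two roles: the pose estimator supervises the detector, and the detector narrows the region the pose estimator has to search on later frames.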


Paper and Supplementary Material

Qiao Gu, Brian Okorn, David Held.
OSSID: Online Self-Supervised Instance Detection
by (and for) Pose Estimation

In RA-L and ICRA 2022.
(hosted on arXiv)



