I am a final-year Ph.D. student at the University of Toronto, advised by Prof. Florian Shkurti. My research focuses on 3D computer vision, generative models, and robot learning. I am particularly interested in building structured 3D world models for perception, reconstruction, and generation.
I am currently a research intern at NVIDIA, where I work on incremental 3D scene reconstruction and video generation.
Opportunities: I will be graduating and seeking full-time research positions starting in Fall 2026. Please feel free to reach out if you see a potential fit.
During my Ph.D., I completed two research internships with the Surreal team at Meta Reality Labs. In the first internship, I worked on 3D reconstruction and perception for egocentric videos, mentored by Zhaoyang Lv and Chris Sweeney. In the second, I worked on egocentric video generation models with 3D spatial memory and human motion control, mentored by Julian Straub.
I earned my Master’s degree in Robotics from CMU, where I was advised by Prof. David Held and Prof. Martial Hebert. Before that, I received my Bachelor’s degree from HKUST, where I worked with Prof. Chi-Keung Tang and Prof. Yu-Wing Tai. I also completed a research internship at Tencent Youtu Lab.
Selected Publications
* indicates equal contribution.
| MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning Yuanchen Ju*, Yongyuan Liang*, Yen-Jen Wang*, Gireesh Nandiraju, Yuanliang Ju, Seungjae Lee, Qiao Gu, Elvis Hsieh, Furong Huang, Koushil Sreenath International Conference on Learning Representations (ICLR), 2026 [project page] [paper] [code] [benchmark] |
| SAFE: Scalable Failure Estimation for Vision-Language-Action Models Qiao Gu, Yuanliang Ju, Shengxiang Sun, Igor Gilitschenski, Haruki Nishimura, Masha Itkina, Florian Shkurti Neural Information Processing Systems (NeurIPS), 2025 [paper] [project page] [code] |
| SICNav-Diffusion: Safe and Interactive Crowd Navigation with Diffusion Trajectory Predictions Sepehr Samavi, Anthony Lem, Fumiaki Sato, Sirui Chen, Qiao Gu, Keijiro Yano, Angela P. Schoellig, Florian Shkurti Robotics and Automation Letters (RA-L), 2025 [paper] [video] |
| EgoLifter: Open-world 3D Segmentation for Egocentric Perception Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney European Conference on Computer Vision (ECCV), 2024 [paper] [video] [project page] |
| ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning Qiao Gu*, Alihusein Kuwajerwala*, Sacha Morin*, Krishna Murthy Jatavallabhula*, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull International Conference on Robotics and Automation (ICRA), 2024 [paper] [video] [project page] [code] |
| ConceptFusion: Open-set Multimodal 3D Mapping Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala*, Qiao Gu*, Mohd Omama*, Tao Chen, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba Robotics: Science and Systems (RSS), 2023 [paper] [video] [project page] [code] |
| Preserving Linear Separability in Continual Learning by Backward Feature Projection Qiao Gu, Dongsub Shim, Florian Shkurti Conference on Computer Vision and Pattern Recognition (CVPR), 2023 [paper] [video] [code] |
| OSSID: Online Self-supervised Instance Detection by (and for) Pose Estimation Qiao Gu, Brian Okorn, David Held Robotics and Automation Letters (RA-L), 2022 [paper] [video] [code] [project page] |
| ZePHyR: Zero-shot Pose Hypothesis Rating Brian Okorn*, Qiao Gu*, Martial Hebert, David Held International Conference on Robotics and Automation (ICRA), 2021 [paper] [video] [code] [project page] |
| Deep Video Matting via Spatio-Temporal Alignment and Aggregation Yanan Sun, Guanzhi Wang*, Qiao Gu*, Chi-Keung Tang, Yu-Wing Tai Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [paper] [dataset] |
| LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup Qiao Gu*, Guanzhi Wang*, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang International Conference on Computer Vision (ICCV), 2019 [paper] [project page] [code] [dataset] |
Teaching
- Teaching Assistant for CSC384 - Intro to Artificial Intelligence, Fall 2022 at UofT
- Teaching Assistant for CSC384 - Intro to Artificial Intelligence, Winter 2022 at UofT
- Teaching Assistant for CSC110 - Foundations of Computer Science I, Fall 2021 at UofT
Service
- Reviewer for T-RO, TPAMI, IJRR, TMLR, RA-L, NeurIPS, CVPR, ICLR, ICCV, ICRA, IROS, ECCV, ICML, WACV
Selected Awards
- Qualcomm Innovation Fellowship Finalist (2025)
- Mary H. Beatty Fellowship (2025-2026)
- Mary H. Beatty Fellowship (2024-2025)
- NeurIPS Top Reviewer (2024)
- Ontario Graduate Scholarship (2022-2023)
- HKUST Academic Achievement Medal (2019)
- HKSAR Government Scholarships (2017-2019)
- Mr. Armin and Mrs. Lillian Kitchell Undergraduate Research Award (2018)
- Dean’s List (2015-2019)