Is Mapping Necessary for Realistic PointGoal Navigation?

CVPR 2022


Ruslan Partsey1,    Erik Wijmans2,3,    Naoki Yokoyama2,    Oles Dobosevych1,    Dhruv Batra2,3,    Oleksandr Maksymets3


1Ukrainian Catholic University    2Georgia Institute of Technology    3Meta AI

GitHub

Can an autonomous agent navigate in a new environment without ever building an explicit map? For the task of PointGoal navigation (`Go to $\Delta x$, $\Delta y$`) under idealized settings (no RGB-D or actuation noise, perfect GPS+Compass), the answer is a clear `yes`: map-less neural models composed of task-agnostic components (CNNs and RNNs) trained with large-scale reinforcement learning achieve 100% Success on a standard dataset (Gibson [1]). However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question, one we tackle in this paper. The strongest published result for this task is 71.7% Success [2]1. First, we identify the main (perhaps only) cause of the drop in performance: the absence of GPS+Compass. An agent with perfect GPS+Compass that faces RGB-D sensing and actuation noise achieves 99.8% Success (Gibson-v2 val). This suggests that (to paraphrase a meme) robust visual odometry is all we need for realistic PointNav; if we can achieve that, we can ignore the sensing and actuation noise. With that as our operating hypothesis, we develop human-annotation-free data-augmentation techniques to train neural models for visual odometry. Taken together with our other proposed methods, we advance the state of the art on the Habitat Realistic PointNav Challenge: SPL improves by 40% relative (53 to 74) and Success by 31% relative (71 to 94). While our approach does not saturate or `solve` this dataset, this strong improvement provides evidence consistent with the hypothesis that explicit mapping may not be necessary for navigation, even in a realistic setting.

1According to the Habitat Challenge 2020 PointNav benchmark, held annually. A concurrent, as-yet-unpublished result has reported 91% Success on the 2021 benchmark, but we are unable to comment on the details because an associated report is not available.
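
To make the operating hypothesis concrete: with GPS+Compass removed, the agent can maintain its own estimate of the goal by integrating per-step egomotion predicted by a visual odometry model (dead reckoning). The sketch below illustrates this update in 2D; the function name, the frame convention, and the assumption that the VO model outputs (dx, dz, dyaw) per step are illustrative simplifications, not the paper's exact implementation.

```python
# Minimal dead-reckoning sketch: keep the PointGoal expressed in the agent's
# egocentric frame up to date using per-step egomotion estimates (dx, dz, dyaw)
# from a visual-odometry model. Names and frame conventions are illustrative.
import numpy as np

def update_goal_estimate(goal_xz: np.ndarray, dx: float, dz: float, dyaw: float) -> np.ndarray:
    """Re-express the goal (x, z) in the agent's new egocentric frame after the
    agent translated by (dx, dz) and rotated by dyaw radians (VO prediction)."""
    shifted = goal_xz - np.array([dx, dz])      # undo the estimated translation
    c, s = np.cos(-dyaw), np.sin(-dyaw)         # undo the estimated rotation
    return np.array([[c, -s], [s, c]]) @ shifted

# Example: goal at (x=0 m, z=3 m) in the current frame; the agent moves
# 0.25 m along +z and turns 30 degrees, as estimated by the VO model.
goal = np.array([0.0, 3.0])
goal = update_goal_estimate(goal, dx=0.0, dz=0.25, dyaw=np.deg2rad(30.0))
print(goal)  # goal re-expressed in the new egocentric frame
```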

Online Leaderboard


Results


Gibson 4+ (val)

Matterport3D (val)

The agent is asked to navigate from the blue square to the green square. The color of the trajectory changes from dark to light over time (cv2.COLORMAP_WINTER for the agent's trajectory, cv2.COLORMAP_AUTUMN for the agent's estimate of its trajectory); see the sketch after these notes.

Navigation videos and top-down maps were generated during two different agent runs, which means different actuation and sensing noise was applied, so the trajectories in the video and in the image may differ slightly.

To see navigation metrics for the episodes in the playlists above, please open the video in a separate tab and check the video description.
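
For reference, time-varying trajectory colors like those described above can be produced with OpenCV colormaps. The snippet below is a minimal sketch of that idea, assuming the trajectory points are already projected to integer pixel coordinates on the top-down map; the function and variable names are illustrative and this is not the actual visualization code used to render the videos.

```python
# Draw a trajectory whose color fades from dark to light over time by
# sampling an OpenCV colormap along the sequence of points.
import cv2
import numpy as np

def draw_trajectory(top_down_map, points, colormap=cv2.COLORMAP_WINTER):
    """Draw `points` (list of (x, y) integer pixel coords) on `top_down_map`,
    coloring each segment by its timestep using `colormap`."""
    n = len(points)
    # Sample the 256-entry colormap uniformly along the trajectory.
    ramp = np.linspace(0, 255, n, dtype=np.uint8).reshape(-1, 1)
    colors = cv2.applyColorMap(ramp, colormap).reshape(-1, 3)
    for i in range(1, n):
        pt1 = tuple(map(int, points[i - 1]))
        pt2 = tuple(map(int, points[i]))
        color = tuple(int(c) for c in colors[i])
        cv2.line(top_down_map, pt1, pt2, color, thickness=2)
    return top_down_map

# Agent trajectory in WINTER, its estimated trajectory in AUTUMN:
# canvas = np.full((512, 512, 3), 255, dtype=np.uint8)
# draw_trajectory(canvas, gt_points, cv2.COLORMAP_WINTER)
# draw_trajectory(canvas, est_points, cv2.COLORMAP_AUTUMN)
```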

References


  1. Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alexander Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, and Dhruv Batra. Habitat-Matterport 3D Dataset (HM3D): 1000 large-scale 3D environments for embodied AI. arXiv preprint arXiv:2109.08238, 2021.
  2. Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, and Alexander G. Schwing. The surprising effectiveness of visual odometry techniques for embodied PointGoal navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16127–16136, 2021.