BEVFusion: Multi-task multi-sensor fusion with a unified bird's-eye view representation

news7g, 05/28/2022

LiDAR camera mounted on top of a vehicle. Image credit: Oregon Department of Transportation via Flickr, CC BY 2.0

Self-driving cars are equipped with various sensors that provide complementary information: cameras capture semantic information, radars provide velocity estimates, and LiDARs provide spatial information. Accurate perception therefore requires a unified representation suitable for multi-task, multi-modal feature fusion.

A recent paper on arXiv.org proposes BEVFusion, which fuses multimodal features in a shared bird's-eye view (BEV) representation space for task-agnostic learning. The method preserves both geometric structure and semantic density and naturally supports most 3D perception tasks.
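To make the idea of a shared BEV space concrete, here is a minimal Python sketch, not the paper's implementation. It assumes the camera and LiDAR branches have already produced feature maps on the same H x W bird's-eye view grid, stored as nested lists indexed [channel][row][col]. Because the two grids are aligned, fusion can be as simple as stacking channels and mixing them cell by cell. The names `concat_channels` and `pointwise_fuse` and the fixed weights are hypothetical; BEVFusion learns its fusion layers, and the 1x1 channel mix below only stands in for them.

```python
from typing import List

# A BEV feature map stored as grid[channel][row][col] on an H x W grid.
BEVMap = List[List[List[float]]]


def concat_channels(a: BEVMap, b: BEVMap) -> BEVMap:
    """Stack two BEV maps that share the same H x W grid along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("BEV maps must share the same spatial grid")
    return a + b  # new list of channels: a's channels first, then b's


def pointwise_fuse(x: BEVMap, weights: List[List[float]]) -> BEVMap:
    """Mix channels independently in every cell (a bias-free 1x1 convolution).

    weights[out_c][in_c] is a hypothetical fixed weight; the real model learns its fuser.
    """
    h, w = len(x[0]), len(x[0][0])
    out: BEVMap = []
    for row_w in weights:
        if len(row_w) != len(x):
            raise ValueError("each weight row needs one entry per input channel")
        out.append([[sum(row_w[c] * x[c][i][j] for c in range(len(x)))
                     for j in range(w)]
                    for i in range(h)])
    return out


# Usage: a 1-channel camera map and a 1-channel LiDAR map on a 2 x 2 grid.
camera_bev = [[[1.0, 2.0], [3.0, 4.0]]]
lidar_bev = [[[10.0, 0.0], [0.0, 10.0]]]
fused = pointwise_fuse(concat_channels(camera_bev, lidar_bev), [[0.5, 0.5]])
print(fused)  # [[[5.5, 1.0], [1.5, 7.0]]]
```

The point of the sketch is that once both modalities live on one grid, no per-point projection is needed: every camera feature cell survives into the fused map, which is how a shared BEV space can keep semantic density while the LiDAR channels supply geometry.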

The approach achieves new state-of-the-art performance. In 3D object detection, it ranks 1st on the nuScenes benchmark leaderboard among all solutions that do not use test-time augmentation or model ensembles. It also shows significant improvements in BEV map segmentation.

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multimodal features in the shared bird's-eye view (BEV) representation space, which preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes a new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost.
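The abstract credits much of the speedup to optimized BEV pooling. As we read the paper, the optimization has two parts: the mapping from camera-frustum points to BEV cells is precomputed and sorted once (the camera parameters are fixed), and the features falling into each cell are then reduced over a contiguous interval, so every cell can be processed independently. The plain-Python sketch below illustrates that idea under stated assumptions and is not the authors' GPU kernel: the per-point cell indices `cell_ids` are taken as given, summation is assumed as the reduction, and `precompute` and `bev_pool` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int, int]  # (cell_id, start, end) over the sorted order


def precompute(cell_ids: Sequence[int]) -> Tuple[List[int], List[Interval]]:
    """Sort points by BEV cell once and record each cell's contiguous interval.

    cell_ids[i] is the (assumed, precomputed) BEV cell of camera-frustum point i.
    """
    order = sorted(range(len(cell_ids)), key=lambda i: cell_ids[i])
    intervals: List[Interval] = []
    start = 0
    for k in range(1, len(order) + 1):
        # Close the current interval at the end or when the cell id changes.
        if k == len(order) or cell_ids[order[k]] != cell_ids[order[start]]:
            intervals.append((cell_ids[order[start]], start, k))
            start = k
    return order, intervals


def bev_pool(features: Sequence[Sequence[float]], order: Sequence[int],
             intervals: Sequence[Interval], num_cells: int) -> List[List[float]]:
    """Sum point features per BEV cell, reducing each interval independently.

    Empty cells stay zero. On a GPU each interval could go to its own thread.
    """
    channels = len(features[0]) if features else 0
    bev = [[0.0] * channels for _ in range(num_cells)]
    for cell, start, end in intervals:
        acc = [0.0] * channels
        for k in range(start, end):
            point = features[order[k]]
            for c in range(channels):
                acc[c] += point[c]
        bev[cell] = acc
    return bev


# Usage: five 1-channel points falling into BEV cells 2, 0, 2, 1, 0.
order, intervals = precompute([2, 0, 2, 1, 0])  # cache while camera params are fixed
print(bev_pool([[1.0], [2.0], [3.0], [4.0], [5.0]], order, intervals, num_cells=4))
# [[7.0], [4.0], [4.0], [0.0]]
```

Sorting is the expensive step, which is why caching `order` and `intervals` pays off: with fixed camera parameters, the per-frame work shrinks to the interval sums, and because no two intervals overlap, they can be reduced in parallel without write conflicts.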

Research article: Liu, Z., et al., "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation", 2022. Link: https://arxiv.org/abs/2205.13542
Project page: https://bevfusion.mit.edu/
