BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation

Self-driving cars are equipped with various sensors that provide complementary information: cameras capture semantic information, radars provide velocity estimates, and LiDAR provides spatial information. For accurate perception, it is necessary to find a unified representation suitable for multi-task, multi-modal feature fusion.

The LiDAR camera is attached to the front of the vehicle. Image credit: Oregon Department of Transportation via Flickr, CC BY 2.0.

A recent paper proposes BEVFusion, which unifies multimodal features in a shared bird’s-eye view (BEV) representation space for task-agnostic learning. This approach preserves both geometric structure and semantic density, and it naturally supports most 3D perception tasks.
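To make the idea of a shared BEV space concrete, here is a minimal NumPy sketch. It assumes that camera and LiDAR features have already been mapped onto the same BEV grid, and it fuses them by channel concatenation followed by a per-cell linear projection (a 1x1 convolution). The function name `fuse_bev`, the channel counts, and the choice of fusion operator are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fuse_bev(camera_bev, lidar_bev, weight):
    """Fuse two BEV feature maps that share the same (H, W) grid.

    camera_bev: (C_cam, H, W) camera features in BEV space
    lidar_bev:  (C_lidar, H, W) LiDAR features in BEV space
    weight:     (C_out, C_cam + C_lidar) kernel of a 1x1 convolution
    """
    if camera_bev.shape[1:] != lidar_bev.shape[1:]:
        raise ValueError("both modalities must use the same BEV grid")
    # Stack the modalities along the channel axis.
    stacked = np.concatenate([camera_bev, lidar_bev], axis=0)
    # A 1x1 convolution is a linear map over channels, applied per grid cell.
    return np.einsum("oc,chw->ohw", weight, stacked)

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
camera_bev = rng.normal(size=(8, 4, 4))
lidar_bev = rng.normal(size=(16, 4, 4))
weight = rng.normal(size=(32, 24))

fused = fuse_bev(camera_bev, lidar_bev, weight)
print(fused.shape)  # (32, 4, 4)
```

Because both inputs live on the same grid, every cell of the fused map sees camera and LiDAR features for the same patch of ground, and any task head (detection, segmentation) can consume the result.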

The approach sets a new state of the art. In 3D object detection, it ranks first on the nuScenes benchmark leaderboard among all solutions that do not use test-time augmentation or model ensembling. It also shows significant improvements in BEV map segmentation.

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multimodal features in the shared bird’s-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost.
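The BEV aggregation (pooling) step mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration that assumes each feature already has a ground-plane position in meters; it sums all features that fall into the same BEV cell. It shows only the basic operation, not the optimizations responsible for the reported 40x latency reduction. The function name `bev_pool`, the grid ranges, and the cell size are hypothetical.

```python
import numpy as np

def bev_pool(points_xy, features, x_range, y_range, cell_size):
    """Sum per-point features into a bird's-eye view grid.

    points_xy: (N, 2) ground-plane coordinates in meters
    features:  (N, C) one feature vector per point
    returns:   (C, H, W) BEV feature map
    """
    width = int(round((x_range[1] - x_range[0]) / cell_size))
    height = int(round((y_range[1] - y_range[0]) / cell_size))

    # Map metric coordinates to integer cell indices.
    cols = np.floor((points_xy[:, 0] - x_range[0]) / cell_size).astype(int)
    rows = np.floor((points_xy[:, 1] - y_range[0]) / cell_size).astype(int)

    # Discard points that fall outside the grid.
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)

    grid = np.zeros((height, width, features.shape[1]), dtype=features.dtype)
    # np.add.at accumulates correctly when several points share one cell.
    np.add.at(grid, (rows[inside], cols[inside]), features[inside])
    return grid.transpose(2, 0, 1)

# Four points with 2-channel features; the last point lies outside the grid.
points_xy = np.array([[0.2, 0.3], [0.4, 0.1], [1.5, 0.5], [9.0, 9.0]])
features = np.array([[1.0, 10.0], [2.0, 20.0], [5.0, 50.0], [4.0, 40.0]])

bev = bev_pool(points_xy, features, x_range=(0.0, 2.0), y_range=(0.0, 1.0), cell_size=1.0)
print(bev.shape)     # (2, 1, 2)
print(bev[:, 0, 0])  # [ 3. 30.]
print(bev[:, 0, 1])  # [ 5. 50.]
```

The first two points share a cell, so their features are summed; this many-to-one mapping is why dense camera features can be kept in BEV space instead of being sampled only at sparse LiDAR points.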

Research article: Liu, Z., “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation”, 2022. Link:
Project page:
