Whilst the availability of 3D LiDAR point cloud data has grown significantly in recent years, annotation remains expensive and time-consuming, driving demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational cost. In addition, many rely on uniform sampling to reduce the amount of ground truth data required for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To make full use of the limited annotated data samples available, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work, in terms of mIoU and using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, with a 2.3× reduction in model parameters and 641× fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
@inproceedings{li2023less,
  title     = {Less Is {{More}}: {{Reducing Task}} and {{Model Complexity}} for {{3D Point Cloud Semantic Segmentation}}},
  author    = {Li, Li and Shum, Hubert P. H. and Breckon, Toby P.},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
  month     = jun,
  publisher = {IEEE},
  keywords  = {point cloud, semantic segmentation, sparse convolution, depthwise separable convolution, autonomous driving},
  video     = {https://www.bilibili.com/video/BV1ih4y1b7gp/?vd_source=cc0410bc3f69236950fa663b082e6754},
}
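As a rough illustration of the depthwise separable factorisation behind the Sparse Depthwise Separable Convolution module described in the abstract above, the PyTorch sketch below uses dense `nn.Conv3d` layers as a stand-in for the paper's sparse-tensor implementation; the class name, channel sizes and the `__main__` comparison are illustrative assumptions, not the published module.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv3d(nn.Module):
    """Dense stand-in for a sparse depthwise separable 3D convolution.

    A k*k*k convolution mapping C_in -> C_out channels is factorised into:
      1) a depthwise k*k*k convolution (groups=C_in), one filter per channel;
      2) a pointwise 1*1*1 convolution mixing channels up to C_out.
    This cuts parameters from C_in*C_out*k^3 down to C_in*k^3 + C_in*C_out.
    """

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv3d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv3d(in_channels, out_channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


if __name__ == "__main__":
    # Compare parameter counts against a standard 3x3x3 convolution (illustrative sizes).
    dense = nn.Conv3d(64, 128, 3, padding=1, bias=False)
    separable = DepthwiseSeparableConv3d(64, 128, 3)
    n_dense = sum(p.numel() for p in dense.parameters())    # 128*64*27 = 221,184
    n_sep = sum(p.numel() for p in separable.parameters())  # 64*27 + 128*64 = 9,920
    print(f"dense: {n_dense:,}  separable: {n_sep:,}  ({n_dense / n_sep:.1f}x fewer)")
```

In the paper the same factorisation is applied to sparse voxel tensors via a sparse convolution backend, which is where the reported reductions in parameters and multiply-add operations come from; the dense version above only demonstrates the parameter arithmetic.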
We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark using depth estimation for autonomous driving applications. Our driving platform is equipped with a high-resolution 128-channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made available along with the LiDAR point clouds to facilitate multi-modal use of concurrent ambient and reflectivity scene information. Leveraging DurLAR, with a resolution exceeding that of prior benchmarks, we consider the task of monocular depth estimation and use this increased availability of higher resolution, yet sparse, ground truth scene depth information to propose a novel joint supervised/self-supervised loss formulation. We compare performance across our new DurLAR dataset, the established KITTI benchmark and the Cityscapes dataset. Our evaluation shows that the joint use of supervised and self-supervised loss terms, enabled via the superior ground truth resolution and availability within DurLAR, improves the quantitative and qualitative performance of leading contemporary monocular depth estimation approaches (RMSE = 3.639, SqRel = 0.936).
@inproceedings{li21durlar,
  title     = {DurLAR: A High-Fidelity 128-Channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-Modal Autonomous Driving Applications},
  author    = {Li, Li and Ismail, K. N. and Shum, Hubert P. H. and Breckon, Toby P.},
  booktitle = {International Conference on 3D Vision (3DV)},
  year      = {2021},
  month     = dec,
  publisher = {IEEE},
  keywords  = {autonomous driving, dataset, high resolution LiDAR, flash LiDAR, ground truth depth, dense depth, monocular depth estimation, stereo vision, 3D},
  category  = {automotive 3Dvision},
  video     = {https://www.youtube.com/watch?v=1IAC9RbNYjY},
}
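To illustrate the general shape of a joint supervised/self-supervised depth loss of the kind described in the abstract above, the sketch below combines a masked L1 term on sparse projected LiDAR ground truth with an existing self-supervised (e.g. photometric) term. The function names, the simple L1 choice and the `lambda_sup`/`lambda_self` weights are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def masked_l1_depth_loss(pred: torch.Tensor, lidar_depth: torch.Tensor) -> torch.Tensor:
    """Supervised term: L1 error on pixels where projected LiDAR depth is valid.

    pred, lidar_depth: (B, 1, H, W); lidar_depth == 0 marks pixels with no LiDAR return.
    """
    valid = lidar_depth > 0
    if valid.sum() == 0:  # no LiDAR returns in this batch
        return pred.new_zeros(())
    return (pred[valid] - lidar_depth[valid]).abs().mean()


def joint_depth_loss(pred: torch.Tensor,
                     lidar_depth: torch.Tensor,
                     self_supervised_term: torch.Tensor,
                     lambda_sup: float = 1.0,
                     lambda_self: float = 1.0) -> torch.Tensor:
    """Weighted sum of the sparse-supervised term and a self-supervised term.

    `self_supervised_term` is assumed to come from an existing monocular depth
    pipeline (e.g. a photometric reprojection loss); the weights are illustrative.
    """
    return (lambda_sup * masked_l1_depth_loss(pred, lidar_depth)
            + lambda_self * self_supervised_term)
```

The denser 128-channel ground truth in DurLAR simply makes the supervised term apply at many more pixels than on lower-resolution benchmarks, which is the motivation for combining it with, rather than replacing, the self-supervised objective.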