Simulation visualization

Links to simulated point clouds and the 3D modeling of the boreal forest (Picea mariana and Abies balsamea):

Sub 167 Plot 6853.

PhD Subject

Context

Forests cover 38% of Canada's territory, which represents 8% of the world's forest cover. However, this resource can only be considered renewable if it is managed precisely. It is therefore vital to give forest actors (ecologists, managers, and developers) the means to monitor its state.

The sheer extent of Canadian wilderness areas rules out ground-based acquisition tools, whose coverage is too limited. Airborne LiDAR technology is particularly well suited to this context: it provides detailed geometric information on the ground and the vegetation cover.

Unfortunately, the 3D point clouds generated during airborne acquisitions are raw sensor products that are unusable in their current form. They must be preprocessed to become intelligible and useful data for forest actors. However, the enormous amount of data generated during acquisitions makes any manual preprocessing impossible. It is in this context that artificial intelligence, and specifically deep learning, its most powerful branch, can improve this conversion. The latest algorithms have enough semantic capability to detect plant species directly in the point cloud.

The main drawback of these algorithms is the amount of training data they require to reach the desired performance. Creating this training data is extremely time-consuming and expensive, especially in forest environments, where structural complexity is greater than in urban areas. That is why solving this data scarcity problem is a crucial objective for the forestry world.

Objectives

This project aims to eliminate the need for manually produced training data for deep learning algorithms in the context of airborne LiDAR acquisitions in a forest environment. The hypothesis is that computer simulation of a LiDAR acquisition over a virtual forest environment can generate this missing data.

The literature review we conducted shows that (1) computer simulation is one of the two research directions favored for addressing this problem, and (2) previous studies have validated our research hypothesis in other contexts, such as optical flow or scene flow detection, point cloud classification, and learning for autonomous driving.

Therefore, the main objective of this thesis project is to demonstrate the relevance of simulation in automating the generation of training data in the context of airborne LiDAR acquisitions in a forest environment.
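The idea behind this hypothesis can be illustrated with a deliberately simplified sketch. It is not HELIOS++ and not the processing chain of this project: trees are modeled as plain cones, pulses are purely vertical, and all names and values (make_stand, simulate_nadir_scan, tree dimensions, point spacing) are assumptions chosen for illustration. What it shows is that, because the virtual scene knows which object each pulse hits, every simulated point comes with its class label at no annotation cost.

```python
import math
import random

# Illustrative class ids (assumption): 0 = ground, 1 = Picea mariana, 2 = Abies balsamea

def make_stand(n_trees, extent, rng):
    """Place cone-shaped trees at random positions; each carries a species label."""
    trees = []
    for _ in range(n_trees):
        trees.append({
            "x": rng.uniform(0, extent),
            "y": rng.uniform(0, extent),
            "height": rng.uniform(8.0, 18.0),   # metres, arbitrary range
            "radius": rng.uniform(1.0, 2.5),    # crown radius at ground level
            "label": rng.choice([1, 2]),
        })
    return trees

def simulate_nadir_scan(trees, extent, spacing):
    """Fire vertical pulses on a regular grid; return (x, y, z, label) first returns."""
    points = []
    n = int(extent / spacing)
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * spacing, (j + 0.5) * spacing
            z, label = 0.0, 0  # default: the pulse reaches the ground
            for t in trees:
                d = math.hypot(x - t["x"], y - t["y"])
                if d < t["radius"]:
                    # Height of the cone surface at horizontal distance d from the trunk
                    surface = t["height"] * (1.0 - d / t["radius"])
                    if surface > z:
                        z, label = surface, t["label"]
            points.append((x, y, z, label))
    return points

rng = random.Random(0)
stand = make_stand(n_trees=20, extent=30.0, rng=rng)
cloud = simulate_nadir_scan(stand, extent=30.0, spacing=0.5)
```

Each tuple in cloud is a labeled training sample. A realistic simulator additionally models beam divergence, multiple returns, scan geometry, and noise, which is precisely what the simulation software is used for.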

Methodology

The general methodology of the thesis has been divided into three axes:

Hypothesis validation: The first axis aims to set up all the specific objectives (simulation, training, and validation) for a simple forest composition. It will allow us to validate our research hypothesis.
Understanding: This axis should allow us to understand the mechanisms that affect the performance of learning from simulated data. We will study the importance of simulation parameters (acquisition density, overall accuracy, beam resolution, simulated LiDAR, etc.) and modeling parameters (type of tree model simplification, forest structure, spectral values of the models, etc.). This will be conducted as an ablation study, in which each parameter is tested independently against a reference simulation.
Generalization: The last axis will focus on increasing the complexity of the forest compositions studied, and will validate our hypothesis for several other types of Quebec forests:

Closed boreal forest: Abies balsamea, Picea mariana, Betula papyrifera, Populus tremuloides
Mixed forest: Betula alleghaniensis, Abies balsamea, Acer saccharum, Acer rubrum, Fagus grandifolia
Deciduous forest: Acer saccharum, Acer rubrum, Fagus grandifolia, Betula alleghaniensis
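The one-parameter-at-a-time design described for the Understanding axis can be sketched as follows. This is a minimal illustration of how such a set of runs might be enumerated; the parameter names, units, and values are hypothetical and are not the ones used in this project.

```python
# Hypothetical reference configuration; names and values are illustrative only.
REFERENCE = {
    "pulse_density_pts_m2": 10,
    "beam_divergence_mrad": 0.3,
    "ranging_accuracy_m": 0.02,
    "tree_model": "full_geometry",
}

# Alternative values to test for each parameter (also illustrative).
VARIANTS = {
    "pulse_density_pts_m2": [2, 5, 20],
    "beam_divergence_mrad": [0.1, 0.5],
    "ranging_accuracy_m": [0.05, 0.10],
    "tree_model": ["convex_hull", "voxelized"],
}

def ablation_runs(reference, variants):
    """Yield (run_name, config) pairs that differ from the reference in one parameter."""
    yield "reference", dict(reference)
    for name, values in variants.items():
        for value in values:
            config = dict(reference)   # copy, so the reference stays untouched
            config[name] = value       # change exactly one parameter
            yield f"{name}={value}", config

runs = list(ablation_runs(REFERENCE, VARIANTS))
for run_name, config in runs:
    print(run_name, config)
```

Because every run differs from the reference in a single parameter, any change in learning performance can be attributed to that parameter alone, at the cost of not capturing interactions between parameters.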

In parallel, using the data from the second axis, we will publish a benchmark for semantic segmentation of airborne LiDAR point clouds in forest environments.

Our current work has focused on the first axis. We have identified the black spruce-moss forest as a simple forest composition. Because it is composed of only two main species, Picea mariana and Abies balsamea, it simplifies the implementation of our processing chain. To model it, we use computer models of these two species from the scientific work of Jean-Francois Cote. We use the HELIOS++ simulation software since it has been specifically designed for forest environments. Regarding deep learning algorithms, the literature review has identified the best-performing approaches; for comparison purposes, classical approaches will also be tested. Specifically, we have selected "KPConv" by Hugues Thomas and "PointNet++" by Charles Qi as the state-of-the-art and classical approaches, respectively.

All computer work for simulation and deep learning training is parallelized on Calculs-Canada servers.

Expected results

The literature review has shown that no work has yet used LiDAR simulation to create training data for deep learning algorithms in a forest environment. This work will be the first to propose such a use. It will make it possible to test the influence of LiDAR acquisition parameters on the results of deep learning algorithms according to the detection objectives. It will thus be possible to formalize a set of LiDAR parameter recommendations to improve the detection of forest objects.
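Comparing acquisition parameters requires a score for each trained model. The text does not name a metric, so the sketch below assumes per-class intersection over union (IoU), a common choice for semantic segmentation; the function names and the toy labels are illustrative only.

```python
def per_class_iou(y_true, y_pred, n_classes):
    """Return the IoU of each class; None where the class is absent from both inputs."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        ious.append(inter / union if union else None)
    return ious

def mean_iou(ious):
    """Average the IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else None

# Toy point labels (assumption): 0 = ground, 1 = Picea mariana, 2 = Abies balsamea
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

ious = per_class_iou(y_true, y_pred, n_classes=3)
print(ious, mean_iou(ious))
```

Reporting the per-class values alongside the mean keeps a weak result on one species from being hidden by a strong result on the ground class.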

The benchmark resulting from the first and second axes will be made public and accessible to the entire scientific community. A dedicated website will be set up to distribute it.

We have planned to publish four scientific articles, resulting from our three research axes:

Article 1: Creation of virtual LiDAR ALS training data for DL in the boreal forest environment with HELIOS++.
Article 2: Study of the impact of virtual LiDAR ALS data simulation parameters on the quality of semantic segmentation in a forest environment.
Article 3: Panoptic segmentation and object detection by DL in the boreal forest from simulated LiDAR ALS data.
Article 4: Virtual ALS benchmark for semantic, instance, and panoptic segmentation in the boreal forest environment.

Expected scientific benefits

In the field of computer vision, the most important and impactful works are those that facilitate access to training data, such as the manual labeling efforts behind benchmark datasets. These benchmarks provide a framework for researchers to compare their methods. No such benchmark currently exists for forest environments. Making one available for classification or semantic segmentation will therefore allow researchers to test and compare the performance of their algorithms on this environment. This will promote algorithmic methods specific to forest environments and ultimately improve machine learning performance in this domain.

Moreover, the reputation of universities is greatly affected by their ability to provide benchmark datasets. The publication of our forest-specific benchmark dataset will thus increase the visibility of the Department of Geomatics and of the faculty in the scientific community, and will highlight the quality of the work produced by university teams.

Finally, in the industrial and operational domains, our work will make it possible to identify the minimal LiDAR acquisition settings that still ensure effective semantic segmentation by deep learning algorithms. Organizations such as the MFFP, as well as companies working on forest monitoring, will then be able to update their acquisition methodologies according to the recommendations issued by our work.