Master's Thesis (Research Project) on Autonomous Vehicle Self-Awareness using SLAM algorithms.
This research project was carried out as part of the Master's Thesis for the completion of my MSc in Computational and Software Techniques at Cranfield University.
It aims to provide an analysis of the extent to which Visual Odometry can achieve accurate position and orientation estimation of a vehicle in complex environments.
© Copyright 2023, All rights reserved to Hans Haller, CSTE-CIDA Student at Cranfield Uni. SATM, Cranfield, UK.
In recent years, SLAM (Simultaneous Localisation and Mapping) algorithms have revolutionized the realm of autonomous navigation, extending their influence from simple household robotic vacuum cleaners to urban self-driving cars and all the way to interplanetary exploration rovers. At the heart of SLAM lie the intertwined tasks of localisation and mapping.
This research delves specifically into the localisation aspect via Visual Odometry, unveiling a sophisticated monocular approach enhanced by state-of-the-art computational techniques. A notable enhancement is the seamless integration and comparison of advanced feature detection algorithms, ensuring more accurate and robust scene interpretation. Further refinement comes from the implementation of post-matching outlier filtering techniques, which drastically reduces inaccuracies arising from false matches.
Additionally, by harnessing the power of semantic segmentation, the system aims to efficiently categorize and differentiate distinct entities and features within the captured environment, enabling intricate scene masking for superior discernment. Depth perception, crucial for richer motion comprehension, is inferred by the computation of disparity maps through the use of stereo cameras.
Finally, the study introduces a frame-tiling optimisation method that seeks to elevate feature detection by distributing the process more uniformly across frames. This approach not only enhances motion estimation by ensuring an even distribution of features but also emphasizes the significance of dynamically adapting the technique. Furthermore, a comprehensive test is conducted to determine the optimal tile dimension combinations for each sequence, underscoring the adaptability imperative of the optimisation technique.
Utilising the KITTI dataset, which encompasses 11 labelled video sequences, precision, accuracy and consistency are assessed using the Absolute Trajectory Error (ATE) metric. Through rigorous experimentation and analytical methods, the study seeks to push the boundaries of camera-based localisation and shape the future trajectory of SLAM algorithms, laying the foundation for the next generation of autonomous navigation systems.
The application was initially developed with Python 3.7 (3.7.17 to be exact); in the late stages of development, a proper virtual environment was created under Python 3.9. In both cases, make sure you use the `requirements37.txt` or `requirements39.txt` file to install the dependencies, depending on the version of Python you wish to use.
If Python 3.9 is used, you won't be able to use `SURF` as a feature detector: the algorithm was patented by its creators and is therefore not available in OpenCV 4.5.4, the lowest version of OpenCV that supports Python 3.9. If you wish to use `SURF` as a feature detector, please use Python 3.7.17.
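If you are unsure which case applies to your setup, a quick check like the sketch below (not part of the project code) shows whether your OpenCV build exposes SURF:
```python
import cv2

# Illustrative check only: SURF lives in the opencv-contrib `xfeatures2d`
# module and is present only in builds compiled with the non-free algorithms.
try:
    cv2.xfeatures2d.SURF_create()
    print("SURF is available in OpenCV", cv2.__version__)
except (AttributeError, cv2.error):
    print("SURF is unavailable; use ORB or FAST instead, or switch to Python 3.7.17.")
```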
Once the repository of the project is cloned, or downloaded from Cranfield's Canvas, create a virtual environment for the project to store all packages the app requires.
To create a virtual environment, first open your command prompt or terminal and navigate to the main directory:
```bash
cd ./irp-myvoslam
```
Once in the main directory, create a new virtual environment using the venv module in Python. The command to create a virtual environment is:
```bash
python3 -m venv venv
```
This will create a new directory called `venv` in the main directory, which will contain all the necessary files for the virtual environment.
To activate the virtual environment, run the following command:
```bash
source venv/bin/activate
```
This will activate the virtual environment, and you should see its name in your terminal prompt. Once the virtual environment is activated, you can install the dependencies required for the project using pip. The required packages are listed in the `requirements37.txt` and `requirements39.txt` files at the root of the repository.
To install the dependencies, run the following command:
```bash
pip3 install -r requirements39.txt --no-cache-dir --no-deps
```
(or `pip3 install -r requirements37.txt --no-cache-dir --no-deps` if you are using Python 3.7.17)
This will install all the required packages for the project. Important note: it is mandatory to add the `--no-cache-dir` and `--no-deps` flags to the command, otherwise the installation will fail. For further explanation, please refer to the end of the README.md file.
To download the data and the models (mandatory in order to apply semantic segmentation), run the following command:
```bash
./scripts/curl-data-and-models.sh
```
or
```bash
./scripts/wget-data-and-models.sh
```
This will download the data from the KITTI dataset, create the `src/data/input/kitti` and `src/data/output/kitti` directories, and unzip the data in the input directory. It will also download the pre-trained models used for semantic segmentation and unzip them in the `src/data/models` directory. You must have either `curl` or `wget` installed on your machine for this to work.
Once the virtual environment is set up and ready to use, the user can start the application by running the `main.py` file through their IDE, or using the following command:
```bash
python3 main.py
```
If you have any questions, please contact me at hans.haller.885@cranfield.ac.uk
The application is designed to be modular, and to allow the user to run specific tasks. The following sections will describe how to run specific tasks.
To compute a pose estimation, simply run the following command:
```bash
python3 main.py
```
This will run the application with the default parameters and compute the pose estimation for sequences 4 and 5 of the KITTI dataset. Results are not saved by default; when saving is enabled (see the `save` parameter below), they are written to the `src/data/output/kitti` directory.
There are various parameters that the user may tweak. The following sections describe what can be adjusted:
The `datasets_paths` list contains paths to the datasets you want to test on. By default, it includes paths to all the datasets labelled from "S0" to "S10" in the `input_dir`.
```python
datasets_paths = [
    os.path.join(input_dir, dataset_index)
    for dataset_index in
    ["S0", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "S10"]
]
```
To test on specific datasets, simply modify this list to include only the desired dataset labels.
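For example, to run only on sequences S4 and S5 (an illustrative modification):
```python
# Restrict the run to sequences S4 and S5 only.
datasets_paths = [
    os.path.join(input_dir, dataset_index)
    for dataset_index in ["S4", "S5"]
]
```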
The `method` parameter determines the type of visual odometry to be used (either "mono" or "stereo"):
```python
method = ["mono", "stereo"][0]
```
The user may choose to run the application with either the monocular or stereo visual odometry method.
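For context, the core of a monocular pose update typically looks like the sketch below. This is an illustrative use of standard OpenCV calls, not the project's actual implementation; the names `pts_prev`, `pts_curr`, and `K` are assumptions:
```python
import cv2

# Sketch of a monocular pose update between two consecutive frames:
# `pts_prev` / `pts_curr` are Nx2 arrays of matched pixel coordinates and
# `K` is the 3x3 camera intrinsics matrix from the KITTI calibration files.
def estimate_pose(pts_prev, pts_curr, K):
    # RANSAC-fitted essential matrix rejects matches inconsistent with one rigid motion.
    E, mask = cv2.findEssentialMat(pts_curr, pts_prev, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose E into rotation R and a unit-length translation direction t.
    _, R, t, _ = cv2.recoverPose(E, pts_curr, pts_prev, K, mask=mask)
    # Monocular VO recovers translation only up to scale; absolute scale must
    # come from elsewhere (e.g. ground truth or stereo disparity).
    return R, t
```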
The `fd_parameters` dictionary contains parameters related to feature detection. You can set the feature detection method and the number of features to detect.
```python
fd_parameters = {
    "fda": ["surf", "fast", "orb"][2],  # the feature detection method
    "nfeatures": 3000                   # the number of features to detect
}
```
- `fda`: the feature detection algorithm to be used. In the given example, "orb" is selected.
- `nfeatures`: the number of features to detect in the image.
By tweaking these parameters, users can experiment with different feature detection algorithms and the number of features to understand their impact on the visual odometry results.
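As an illustration of how these two values typically map onto OpenCV detector objects (a sketch under that assumption, not necessarily the project's exact code):
```python
import cv2

# Hypothetical mapping from the `fda` string to an OpenCV detector.
def make_detector(fda, nfeatures):
    if fda == "orb":
        return cv2.ORB_create(nfeatures=nfeatures)
    if fda == "fast":
        # FAST exposes no feature-count cap; a threshold controls density instead.
        return cv2.FastFeatureDetector_create()
    if fda == "surf":
        # Requires an opencv-contrib build with non-free algorithms (see above).
        return cv2.xfeatures2d.SURF_create()
    raise ValueError(f"Unknown feature detection algorithm: {fda}")
```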
Post-Matching Outlier Removal (PMOR) is crucial for refining the matches obtained after feature detection and description. Adjusting the parameters in the `pmor_parameters` dictionary allows you to control the outlier removal process:
```python
pmor_parameters = {
    "do_PMOR": True,
    "do_xyMeanDist": True,
    "do_xyImgDist": True,
    "do_RANSAC": True,
}
```
- `do_PMOR`: the main switch. Set to `True` to enable PMOR, and `False` to disable it.
- `do_xyMeanDist`: toggles the mean-distance method for outlier removal.
- `do_xyImgDist`: toggles the image-dimension method for outlier removal.
- `do_RANSAC`: toggles the RANSAC method for outlier removal.
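To make the idea concrete, here is a minimal sketch of two such filters chained together: a mean-displacement cut followed by a RANSAC inlier mask. The thresholds and function name are assumptions for illustration, not the project's actual code:
```python
import cv2
import numpy as np

# Sketch: drop matches whose displacement deviates strongly from the mean flow,
# then keep only the RANSAC inliers of a fundamental-matrix fit.
def filter_matches(pts_prev, pts_curr, max_dev=3.0):
    disp = np.linalg.norm(pts_curr - pts_prev, axis=1)
    keep = np.abs(disp - disp.mean()) < max_dev * disp.std()  # mean-distance cut
    pts_prev, pts_curr = pts_prev[keep], pts_curr[keep]

    # RANSAC on the epipolar geometry removes the remaining false matches.
    _, mask = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC,
                                     ransacReprojThreshold=1.0, confidence=0.99)
    if mask is None:  # fit failed; return the distance-filtered matches as-is
        return pts_prev, pts_curr
    inliers = mask.ravel().astype(bool)
    return pts_prev[inliers], pts_curr[inliers]
```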
Semantic Segmentation aids in understanding the scene by classifying each pixel into predefined categories. Adjust the parameters related to semantic segmentation in the `ss_parameters` dictionary:
```python
ss_parameters = {
    "do_SS": True,
    "model_path": "src/models/deeplabv3_xception65_ade20k.h5",
    "features_to_ignore": ["sky", "person", "car"]
}
```
- `do_SS`: toggles semantic segmentation on or off.
- `model_path`: path to the pre-trained model used for semantic segmentation.
- `features_to_ignore`: a list of features or objects to exclude from the final mask applied to the image.
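The sketch below shows one way such a mask could be built and handed to an OpenCV detector; `label_map` and the class ids are illustrative assumptions, not the project's exact mechanism:
```python
import numpy as np

# Sketch: `label_map` is an HxW array of per-pixel class ids produced by the
# segmentation model; `ignore_ids` are the class ids to exclude. OpenCV
# detectors accept the result through their `mask` argument (255 = detect, 0 = skip).
def build_detection_mask(label_map, ignore_ids):
    keep = np.isin(label_map, list(ignore_ids), invert=True)
    return keep.astype(np.uint8) * 255

# e.g. keypoints = detector.detect(gray, mask=build_detection_mask(labels, [2, 12, 20]))
# where 2, 12, 20 would be the ADE20K ids for "sky", "person", "car".
```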
Frame Tile Optimization is used to distribute features uniformly across the image frame. Adjust the parameters in the `fto_parameters` dictionary:
```python
fto_parameters = {
    "do_FTO": True,
    "grid_h": 40,
    "grid_w": 20,
    "patch_max_features": 10
}
```
- `do_FTO`: the main switch. Set to `True` to enable FTO, and `False` to disable it.
- `grid_h` and `grid_w`: define the grid size for the image frame, i.e. how many tiles the image is divided into along its height and width.
- `patch_max_features`: the maximum number of features to retain in each tile. This ensures a uniform distribution of features across the frame.
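A minimal sketch of the tiling idea (illustrative names and structure, not the project's exact implementation):
```python
# Sketch: detect keypoints tile by tile and keep only the strongest
# `patch_max_features` in each, so no single textured region dominates.
def detect_tiled(detector, gray, grid_h, grid_w, patch_max_features):
    h, w = gray.shape
    tile_h, tile_w = h // grid_h, w // grid_w
    keypoints = []
    for i in range(grid_h):
        for j in range(grid_w):
            y, x = i * tile_h, j * tile_w
            kps = detector.detect(gray[y:y + tile_h, x:x + tile_w], None)
            # Keep the strongest responses, then shift back to image coordinates.
            kps = sorted(kps, key=lambda kp: kp.response, reverse=True)[:patch_max_features]
            for kp in kps:
                kp.pt = (kp.pt[0] + x, kp.pt[1] + y)
            keypoints.extend(kps)
    return keypoints

# e.g. keypoints = detect_tiled(cv2.ORB_create(), gray_frame, 40, 20, 10)
```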
- Visualization (`view` parameter, set to `True` by default): enables real-time visualization of the visual odometry process, showcasing keypoint detection, frame matching, and motion estimation.
- Monitoring (`monitor` parameter, set to `True` by default): uses the `tqdm` library to display a progress bar, offering a real-time status of the computation's progress.
- Saving (`save` parameter, set to `False` by default): allows the results, specifically the Absolute Trajectory Error (ATE), to be saved into a `.csv` file for further analysis.
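For reference, ATE is essentially the root-mean-square of the per-frame position error between the estimated and ground-truth trajectories. A minimal sketch, assuming both trajectories are already expressed in the same reference frame (some ATE variants first align them):
```python
import numpy as np

# Sketch: ATE as the RMSE of per-frame position error; `estimated` and
# `ground_truth` are Nx3 arrays of camera positions.
def absolute_trajectory_error(estimated, ground_truth):
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```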
Feel free to adjust these parameters as you see fit.
The application allows the user to run a bulk test of FTO grid combinations. This is useful to determine the optimal grid size for the image frame. To run a bulk test, simply run the following command:
```bash
python3 bulk.py
```
- `GRID_H_values` & `GRID_W_values`: these lists define the various grid sizes you want to test. For instance, `GRID_H_values = [4, 8, 10]` means grid heights of 4, 8, and 10 tiles will be tested.
- `PATCH_MAX_FEATURES`: this parameter caps the number of features retained per tile. It is set to 10 by default, meaning each tile will keep a maximum of 10 features.
- `datasets_paths`: this list contains paths to the datasets you want to test on.
- `fd_parameters`: this dictionary contains parameters related to feature detection. You can set the feature detection method and the number of features to detect.
If you wish to test with different parameters, modify the appropriate variables in the `bulk.py` script. For instance, to test with different grid sizes, simply modify the `GRID_H_values` and `GRID_W_values` lists, as in the snippet below.
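For instance (example values only, not the project's defaults):
```python
# Sweep a few candidate grid heights and widths; bulk.py evaluates
# every (GRID_H, GRID_W) combination.
GRID_H_values = [4, 8, 10]
GRID_W_values = [5, 10, 20]
```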
After adjusting the parameters as desired, run the script again:
```bash
python3 bulk.py
```
The results will be saved in the `src/data/output/kitti` directory. Three files will be generated:
- `BULKFTO_[timestamp].csv`: contains the raw results for each dataset and grid combination.
- `BULKFTO_ATE_COMPARISON_[timestamp].csv`: contains the Absolute Trajectory Error (ATE) comparison.
- `BULKFTO_NCATE_COMPARISON_[timestamp].csv`: contains the Normalized Cumulative ATE comparison.
For further assistance or inquiries, don't hesitate to reach out to me at hans.haller.885@cranfield.ac.uk, or create a new issue on the repository.
Happy experimenting!