Visual Simultaneous Localization and Mapping (vSLAM)

A topic area for Specialization Projects and Master's Theses

Visual SLAM (vSLAM) is a term for a set of methods and algorithms that a) determine the motion of a camera (or a set of cameras) through an environment and b) determine the geometric shape of that environment. vSLAM often builds on detecting “prominent points” in the images and tracking them through the sequence. If a sufficient number of such points are tracked between two images, the relative pose (i.e. translation and rotation) of the camera can be estimated. As any measurement in images is afflicted by errors, both these pose estimates and the estimated 3D positions of the observed image points are uncertain. The complete camera trajectory and the scene model “stitched together” from many views are therefore estimated jointly, by solving a huge optimization problem that takes all of these uncertain measurements as input data.
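To illustrate why the uncertain relative poses call for a joint optimization, the following minimal Python sketch chains frame-to-frame pose estimates into a trajectory and shows how small per-step errors accumulate into drift. It is a simplified illustration, not part of the AROS system: the motion is restricted to the plane (x, y, heading), and the function names, the circular test trajectory and the noise levels are all hypothetical choices made for this example.

```python
import math
import random


def compose(pose, delta):
    """Apply a relative motion (given in the current camera frame) to a global pose.

    Both arguments are (x, y, theta) tuples; planar motion is an assumption
    of this sketch, a real camera pose has six degrees of freedom.
    """
    x, y, theta = pose
    dx, dy, dtheta = delta
    return (
        x + math.cos(theta) * dx - math.sin(theta) * dy,
        y + math.sin(theta) * dx + math.cos(theta) * dy,
        theta + dtheta,
    )


def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain relative poses into a trajectory (pure odometry, no optimization)."""
    poses = [start]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses


def loop_deltas(n_steps=100, step_length=0.1):
    """Ground-truth relative motions for one closed loop (a regular polygon)."""
    return [(step_length, 0.0, 2.0 * math.pi / n_steps)] * n_steps


def perturb(delta, rng, sigma_t=0.01, sigma_r=0.005):
    """Add zero-mean Gaussian noise to a relative pose (hypothetical noise levels)."""
    dx, dy, dtheta = delta
    return (
        dx + rng.gauss(0.0, sigma_t),
        dy + rng.gauss(0.0, sigma_t),
        dtheta + rng.gauss(0.0, sigma_r),
    )


def position_error(pose_a, pose_b):
    """Euclidean distance between the positions of two poses."""
    return math.hypot(pose_a[0] - pose_b[0], pose_a[1] - pose_b[1])


if __name__ == "__main__":
    rng = random.Random(0)
    true_deltas = loop_deltas()
    truth = integrate(true_deltas)
    estimate = integrate([perturb(d, rng) for d in true_deltas])
    # The true trajectory returns to its start; the chained estimate does not.
    print("true loop-closure gap:     ", position_error(truth[-1], truth[0]))
    print("estimated loop-closure gap:", position_error(estimate[-1], estimate[0]))
```

The gap that remains at the end of the estimated loop is the kind of inconsistency that the joint optimization over all poses and scene points is meant to reduce.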

In AROS, an NFR-funded project jointly pursued by several research groups at ITK, IDI, and IMT, we aim at providing the well-known underwater snake robot (a spectacular NTNU development) with the means for increased autonomy. Autonomy requires the capability to perceive a robot's environment. We have access to both real video footage from underwater missions and a realistic simulation environment that can generate video sequences for which the motion and the 3D geometry are precisely known (‘ground truth’).

The student project is integrated into our design and development process for a vSLAM system that is specifically tuned to cope with the substantial problems of underwater video material: limited visibility due to turbid water, poor illumination that also moves with the robot vehicle, disturbances by plankton, dirt, and small fish, and many more. Which part of the vSLAM development becomes the focus area of the student project is subject to negotiation; the intention is to let the students experiment with novel approaches proposed in the recent literature, some of them focusing on geometric models and statistical estimation theory, others on machine learning. We are therefore able to adapt the topic largely to the background knowledge the student(s) already have and to their interest in different relevant research fields, such as state estimation, optimization, object detection and tracking, machine learning, and deep learning.

Topics from this domain are available to students from both IDI and ITK.

Potential focus topics:

Robust keypoint tracking in the presence of underwater image degradation
Dynamic-model-based prediction and correction in keypoint and object tracking under underwater conditions
Pose graph and state sequence optimization for underwater visual SLAM
Integration of IMU measurements in underwater visual SLAM
Machine learning for depth estimation, flow field estimation, and visual clutter detection
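As a small illustration of what dynamic-model-based prediction and correction can mean for keypoint tracking, the following Python sketch tracks one image coordinate of a keypoint with a constant-velocity alpha-beta filter. This is only one simple possibility chosen for the example, not the method used in the project; the function name, the gain values and the use of None for a missed detection are assumptions made here.

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1, x0=0.0, v0=0.0):
    """Track a 1D keypoint coordinate with a constant-velocity alpha-beta filter.

    measurements: sequence of measured coordinates, one per frame; None marks
    a frame in which the keypoint was not detected (e.g. hidden by particles).
    Returns a list of (predicted, corrected, velocity) tuples, one per frame.
    The gains alpha and beta are hypothetical values picked for this sketch.
    """
    x, v = x0, v0
    history = []
    for z in measurements:
        # Prediction step: propagate the state with the constant-velocity model.
        x_pred = x + v * dt
        if z is None:
            # No measurement available: keep the prediction, leave velocity as is.
            x = x_pred
        else:
            # Correction step: blend prediction and measurement via the residual.
            residual = z - x_pred
            x = x_pred + alpha * residual
            v = v + (beta / dt) * residual
        history.append((x_pred, x, v))
    return history


if __name__ == "__main__":
    # A keypoint moving at 2 pixels per frame, with two missed detections.
    frames = [2.0 * k for k in range(1, 31)] + [None, None]
    for predicted, corrected, velocity in alpha_beta_track(frames)[-4:]:
        print(f"predicted={predicted:.3f} corrected={corrected:.3f} velocity={velocity:.3f}")
```

During the frames without a detection the filter simply continues along the predicted motion, which gives the tracker a search position for re-acquiring the keypoint.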

Literature:

D. Scaramuzza, F. Fraundorfer: Visual Odometry: Part I - The First 30 Years and Fundamentals. IEEE Robotics and Automation Magazine, 2011.
F. Fraundorfer, D. Scaramuzza: Visual Odometry: Part II - Matching, Robustness, Optimization, and Applications. IEEE Robotics and Automation Magazine, 2012.
C. Cadena, L. Carlone, et al.: Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Transactions on Robotics, 2016.
H. Zhan et al.: DF-VO: What Should Be Learnt for Visual Odometry? 2021.