Learn how SLAM is paving the way for Augmented Reality to excel in the coming years and become a game-changer for the industry!
— By Saumyashree Singhal, SLAM Researcher @ Sally Robotics.
Augmented Reality (AR) is a fast-developing technology that is not only amplifying the gaming experience but also showing great potential to increase productivity across many industries. You have probably used a Snapchat AR lens that places virtual avatars in your camera view! Catching Pokémon in the Pokémon Go app is another great example of Augmented Reality in action.
Smartphones are rapidly gaining more advanced hardware, which gives them a lot of power to combine digital and real environments. However, many challenges remain, such as uncontrolled camera motion and real-time processing, which cannot be addressed by hardware advancements alone. SLAM (Simultaneous Localization and Mapping) is a technology that makes it possible to realize Augmented Reality at an advanced level. Let’s dive into what SLAM really is.
What Is SLAM?
SLAM (Simultaneous Localization and Mapping) is a method that lets us build a map of our surroundings and localize our device within that map at the same time. SLAM algorithms let us map out unknown environments too.
Visual SLAM (or vSLAM), a variant of SLAM, uses images acquired from cameras and other image sensors. Visual SLAM can use simple cameras (such as wide-angle, fish-eye, and spherical cameras), compound eye cameras (such as stereo and multi-cameras), and RGB-D cameras (such as depth and ToF cameras). It can be implemented at low cost with relatively inexpensive cameras. In addition, since cameras provide a large volume of information, they can be used to detect landmarks (previously measured positions). Landmark detection can also be combined with graph-based optimization, which adds flexibility to a SLAM implementation.
There are many Visual SLAM (VSLAM) techniques, with applications in robotics, self-driving cars, drones and AR. VSLAM has proven highly effective at sensing both the location of a camera and the environment around it, without knowing either beforehand.
Here is a demonstration of the working of RGB-D SLAM:
How does it work?
A general VSLAM framework looks like this:
Generally, two methods are used to obtain the motion relationship between adjacent frames: (1) the direct method and (2) the feature point method. After processing, the outputs of each module are fused. Because noise from one sensor affects the others, this fusion requires complex calculations. Dealing with that noise is the job of optimization. The back-end receives the camera position information measured by visual odometry at different times, together with loop closing information. Optimization then finds a globally consistent solution that yields a unified trajectory and map. In this way the back-end estimates the state of the whole system from noisy data and refines the initially calculated results.
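To make the back-end's job more concrete, here is a minimal sketch of this kind of optimization, shrunk down to a toy 1D pose graph. It is not taken from the paper: the function name optimize_poses, the measurements, and the use of plain gradient descent are all assumptions chosen for illustration. Real SLAM back-ends work with 2D or 3D poses and typically rely on dedicated solvers.

```python
def optimize_poses(odometry, loop_closure, iterations=5000, step=0.1):
    """Toy 1D pose graph (illustrative only).

    Poses x[0]..x[N] lie on a line and x[0] is fixed at 0.
    odometry[i] is the measured displacement from pose i to pose i+1.
    loop_closure is the measured displacement from pose 0 to pose N.
    The sum of squared residuals is minimised by plain gradient descent.
    """
    n = len(odometry)

    # Initial guess: integrate the odometry (dead reckoning).
    x = [0.0]
    for u in odometry:
        x.append(x[-1] + u)

    for _ in range(iterations):
        grad = [0.0] * (n + 1)
        # Odometry residuals: (x[i+1] - x[i]) - odometry[i]
        for i, u in enumerate(odometry):
            r = (x[i + 1] - x[i]) - u
            grad[i + 1] += 2 * r
            grad[i] -= 2 * r
        # Loop-closure residual: (x[n] - x[0]) - loop_closure
        r = (x[n] - x[0]) - loop_closure
        grad[n] += 2 * r
        grad[0] -= 2 * r
        # x[0] is the fixed anchor, so only x[1:] is updated.
        for i in range(1, n + 1):
            x[i] -= step * grad[i]
    return x


# Hypothetical measurements: the device moves out two steps and back two
# steps, so its true final position equals its start (0.0).
odometry = [1.1, 1.0, -0.9, -1.0]
dead_reckoned_end = sum(odometry)
poses = optimize_poses(odometry, loop_closure=0.0)
print("end pose from odometry alone:", round(dead_reckoned_end, 3))
print("end pose after optimization: ", round(poses[-1], 3))
```

In this toy case the loop closure says "we are back where we started", and the optimizer spreads the disagreement over all the odometry steps, pulling the final pose from roughly 0.2 down to roughly 0.04 instead of trusting any single measurement completely.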
This is only part of what takes place in the back-end of a SLAM system. SLAM is clearly a complex system, and one has to choose carefully which method will give the best result for a particular application.
We compare different VSLAM methods used for AR applications, drawing on a recent research paper [A Review of VSLAM Technology Applied in Augmented Reality].
Comparing Visual SLAM Methods for AR
VSLAM programs: The paper compares and discusses classical methods such as MonoSLAM (Monocular SLAM), PTAM (Parallel Tracking and Mapping), LSD-SLAM (Large-Scale Direct Monocular SLAM), ORB-SLAM (named after ORB features: Oriented FAST and Rotated BRIEF) and RGBD-SLAM for AR applications. Its summary states:
“ VSLAM technology based on monocular camera lacks scene depth information, and needs to estimate the depth of pixels by triangulation or the inverse depth method. With a binocular camera, VSLAM obtains feature points by image matching between left and right cameras, and then estimates depth information by parallax.”
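As a small illustration of "estimates depth information by parallax", the standard pinhole stereo relation is depth = focal length × baseline / disparity. The sketch below is not from the paper; the function name and the camera numbers are hypothetical, and it assumes an ideal rectified stereo pair.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen by an ideal rectified stereo pair.

    focal_px     : focal length in pixels (assumed equal for both cameras)
    baseline_m   : distance between the two camera centres, in metres
    disparity_px : horizontal shift of the matched feature between the
                   left and right images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(700.0, 0.12, 42.0)  # large shift, close point
far = depth_from_disparity(700.0, 0.12, 21.0)   # half the shift, twice as far
print(near, far)
```

Halving the disparity doubles the estimated depth, which is also why distant points, with their tiny disparities, are the hardest to measure accurately.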
The paper's summary points to a clear advantage of the binocular camera over the monocular camera: it can estimate depth from parallax, while a monocular camera lacks scene depth information. The paper also notes that RGB-D SLAM is a popular choice for AR applications. However, RGB-D SLAM still has a number of problems that need to be addressed, such as:
Too fast camera movement
No occlusion of the visual field
Lack of features
Light source interference
These are big challenges because general use involves fast camera movement and varying light sources. They are being tackled with improved methods that give better results.
Challenges with SLAM
1. Error Accumulation
As mentioned above, errors can accumulate until the estimates deviate from the actual values. This can cause the map data to collapse or distort, making subsequent searches difficult. Some errors are unavoidable, so one possible solution is to reduce their effect with better optimization methods.
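A quick way to see why accumulation matters is to simulate dead reckoning with a tiny, constant error. This is a toy sketch under assumed numbers (the function name, step length and heading bias are all hypothetical), not a model of any particular SLAM system.

```python
import math


def drift_errors(steps, step_len, heading_bias_rad):
    """Position error after each step when every heading update is
    off by a small constant bias.

    The device really moves straight along the x axis, but the estimated
    heading picks up heading_bias_rad of error at every step.
    """
    x = y = heading = 0.0
    errors = []
    for k in range(1, steps + 1):
        heading += heading_bias_rad          # the error is never corrected
        x += step_len * math.cos(heading)    # estimated position
        y += step_len * math.sin(heading)
        true_x = k * step_len                # true position is (true_x, 0)
        errors.append(math.hypot(x - true_x, y))
    return errors


# Hypothetical walk: 100 steps of 10 cm with 0.01 rad of bias per step.
errors = drift_errors(steps=100, step_len=0.1, heading_bias_rad=0.01)
print("error after first step:", errors[0])
print("error after last step: ", errors[-1])
```

Each individual error is tiny, yet the position error keeps growing with every step because nothing ever pulls the estimate back toward the truth. That correction is exactly what loop closing and back-end optimization provide.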
2. Computational Cost
Computation is usually performed on compact, low-energy embedded microprocessors that have limited processing power. To achieve accurate localization, it is essential to execute image processing and point cloud matching at high frequency.
One countermeasure is to run different processes in parallel. Processes such as feature extraction, which is a preprocessing step for matching, are relatively well suited to parallelization. Using multicore CPUs, single instruction multiple data (SIMD) calculation, and embedded GPUs can further improve speeds in some cases. (source: MathWorks)
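Feature extraction parallelizes well because each frame can be processed independently of the others. The sketch below shows only that structure, using Python's standard thread pool. The "extractor" is a toy stand-in (it counts corner-like pixels in synthetic frames), and all names here are hypothetical. In CPython, threads give a real speed-up only when the heavy work runs in native code that releases the interpreter lock, so treat this as a pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def make_frame(size, offset):
    """Synthetic grayscale frame: dark background with one bright square."""
    frame = [[0] * size for _ in range(size)]
    for r in range(offset, offset + 8):
        for c in range(offset, offset + 8):
            frame[r][c] = 255
    return frame


def count_corner_like_pixels(image, threshold=50):
    """Toy feature extractor: count pixels whose horizontal AND vertical
    intensity differences both exceed the threshold."""
    count = 0
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            dx = abs(image[r][c + 1] - image[r][c - 1])
            dy = abs(image[r + 1][c] - image[r - 1][c])
            if dx > threshold and dy > threshold:
                count += 1
    return count


def extract_all(frames, workers=4):
    """Run the extractor on every frame using a pool of worker threads.
    pool.map keeps the results in the same order as the input frames."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_corner_like_pixels, frames))


frames = [make_frame(32, offset) for offset in range(2, 10)]
parallel_counts = extract_all(frames)
serial_counts = [count_corner_like_pixels(f) for f in frames]
print(parallel_counts == serial_counts)
```

The important design point is that no frame depends on another frame's result, so the work can be handed out to as many workers as the hardware offers.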
SLAM Technology Market
Many tech giants such as Apple, Google, Facebook and Amazon have been working on SLAM for commercial, household, manufacturing and logistics, and military applications. The SLAM market took a hit from COVID-19, as supply chain and market disturbances left many corporations in financial trouble.
But future demand for SLAM technologies is increasing rapidly, driven by augmented reality (AR) applications, new digital technologies such as automation and Artificial Intelligence, and service robots for domestic applications.
SLAM gives a device the ability not only to perceive its environment but also to interact with it in real time, which makes it a crucial system for technologies like autonomous vehicles, robotics and Augmented Reality. SLAM can take a variety of sensors as input and fuse their data to build a map. The most common sensor is a camera, so AR applications use VSLAM, which takes only a camera as its input sensor.
SLAM is a complex system with a variety of methods, both existing and in development. A recent research paper concluded that the RGB-D SLAM method gives better results than the other SLAM methods it compared. SLAM still faces multiple challenges, such as error accumulation and high computational cost. A further challenge in AR applications is that the camera is handheld and the lighting varies, which makes it hard to draw correct predictions from the data.
Lastly, SLAM has great capabilities, strong demand and a huge, growing market, which did face disturbances due to the COVID-19 pandemic. Even with those disturbances and the technical complexity, development on SLAM has been very active in recent years and issues are constantly being solved. The technology keeps evolving and becoming less complex, and even during the pandemic its development and market have kept growing. Soon we will have improved SLAM techniques that will advance robotics, autonomous vehicles and Augmented Reality into the next era of technology.
And maybe we will have something just like this 😬: (Just with AR instead of a hologram)
A Review of VSLAM Technology Applied in Augmented Reality: https://www.researchgate.net/publication/340658515_A_Review_of_VSLAM_Technology_Applied_in_Augmented_Reality