Science Gazette

Technique improves AI’s ability to perceive 3D space using 2D images


Researchers have developed a new technique, called MonoCon, that improves the ability of artificial intelligence (AI) programs to identify three-dimensional (3D) objects, and how those objects relate to each other in space, using two-dimensional (2D) images. For example, the work could help the AI used in autonomous cars navigate in relation to other vehicles using the 2D images it receives from an onboard camera.

“We live in a 3D world, but when you take a photo, it preserves that world in a 2D image,” explains Tianfu Wu, an associate professor of electrical and computer engineering at North Carolina State University and corresponding author of a paper on the topic.

“Cameras provide visual input to AI systems. So, if we want AI to interact with the outside world, we need to make sure it can interpret what 2D images can tell it about 3D space. We’re focusing on one aspect of that problem in our study: how to enable AI to reliably detect 3D objects in 2D images, such as people or automobiles, and place them in space.”

While the work may be important for driverless cars, it also has applications in manufacturing and robotics.

In the context of autonomous cars, most existing systems rely on lidar, which uses lasers to measure distance, to navigate 3D space. Lidar technology, however, is expensive. Because of that cost, autonomous systems do not include much redundancy. Putting hundreds of lidar sensors on a mass-produced autonomous automobile, for example, would be prohibitively costly.

“However, if an autonomous car could navigate through space using visual inputs, you could build in redundancy,” Wu explains. “Because cameras are substantially less expensive than lidar, adding more cameras to the system would be cost-effective, building in redundancy and making the system both safer and more robust.

“That is one example of a practical use. However, we’re equally excited about the work’s fundamental advance: the ability to extract 3D data from 2D objects.”

MonoCon can identify 3D objects in 2D images and place each one in a “bounding box,” which tells the AI where the outermost edges of the relevant object are.

MonoCon builds on a large body of previous work aimed at helping AI programs extract 3D data from 2D images. Many of these approaches train the AI by “showing” it 2D images and placing 3D bounding boxes around the objects in each image. These boxes are cuboids, which have eight corner points (think of the corners of a shoebox). During training, the AI is given the 3D coordinates of each of the box’s eight corners, so that it “understands” the height, width, and length of the bounding box, as well as the distance between each of those corners and the camera. Using this method, the AI is taught to estimate the dimensions of each bounding box and to predict the distance between the camera and the automobile. After each prediction, the trainers “correct” the AI by giving it the right answers. Over time, this allows the AI to get better at identifying objects, placing them in a bounding box, and estimating their dimensions.
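The bounding-box description above can be made concrete with a short sketch. This is only an illustration of the geometry, not MonoCon's code: the function names `cuboid_corners` and `distance_to_camera`, the camera-centered coordinate convention, and the example car dimensions are all assumptions chosen for the example.

```python
import math

def cuboid_corners(center, dims, yaw=0.0):
    """Return the eight (x, y, z) corners of a 3D bounding box.

    center: (x, y, z) of the box center, with the camera at the origin
            (assumed convention: x right, y vertical, z forward)
    dims:   (length, height, width) of the box
    yaw:    rotation of the box around the vertical (y) axis, in radians
    """
    cx, cy, cz = center
    length, height, width = dims
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner offset from the center, in the box's own frame.
                dx, dy, dz = sx * length, sy * height, sz * width
                # Rotate around the vertical axis, then shift to the center.
                x = cx + cos_y * dx + sin_y * dz
                y = cy + dy
                z = cz - sin_y * dx + cos_y * dz
                corners.append((x, y, z))
    return corners

def distance_to_camera(point):
    """Straight-line distance from the camera (origin) to a 3D point."""
    return math.sqrt(sum(c * c for c in point))

# Hypothetical car: 4 m long, 1.5 m high, 1.8 m wide, about 20 m ahead.
corners = cuboid_corners(center=(2.0, 1.0, 20.0), dims=(4.0, 1.5, 1.8))
for corner in corners:
    print(corner, round(distance_to_camera(corner), 2))
```

In a training setup like the one described, values of this kind (corner coordinates, box dimensions, and distances) would serve as the "right answers" that the AI's predictions are corrected against.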

“What makes our work distinct is the way we train the AI, which builds on past training methodologies,” Wu explains. “As in past efforts, we train the AI by placing objects in 3D bounding boxes. However, in addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding box, we also ask it to predict the locations of each of the box’s eight points and their distance from the center of the bounding box in two dimensions. This is referred to as auxiliary context, and we found that it helps the AI more accurately identify and predict 3D objects based on 2D images.

“The proposed method is motivated by the Cramér-Wold theorem, a well-known theorem in measure theory. It could also be applied to other structured-output prediction problems in computer vision.”
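The extra training targets Wu describes, the 2D positions of the eight corners and their offsets from the box center, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simple pinhole camera model, and the function names `project` and `auxiliary_targets` as well as the focal length and image-center values are hypothetical.

```python
def project(point, focal=700.0, cx=600.0, cy=180.0):
    """Project a 3D point (camera at the origin, z forward) onto the image.

    focal, cx, and cy are made-up pinhole camera parameters, in pixels.
    """
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)

def auxiliary_targets(center_3d, corners_3d, **camera):
    """Return the projected 2D center and each corner's 2D offset from it."""
    center_2d = project(center_3d, **camera)
    offsets = []
    for corner in corners_3d:
        u, v = project(corner, **camera)
        offsets.append((u - center_2d[0], v - center_2d[1]))
    return center_2d, offsets

# Hypothetical 2 m cube centered 10 m straight ahead of the camera.
center = (0.0, 0.0, 10.0)
corners = [
    (sx, sy, 10.0 + sz)
    for sx in (-1.0, 1.0)
    for sy in (-1.0, 1.0)
    for sz in (-1.0, 1.0)
]

center_2d, offsets = auxiliary_targets(center, corners)
print("projected center:", center_2d)
for corner, offset in zip(corners, offsets):
    print(corner, "->", (round(offset[0], 1), round(offset[1], 1)))
```

Corners closer to the camera land farther from the projected center than corners farther away, so the 2D offsets carry information about depth. That is one intuition for why predicting them could help with the 3D task.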

MonoCon was tested using KITTI, a widely used benchmark data set.

“At the time we submitted our work, MonoCon performed better than any of the dozens of other AI programs aimed at extracting 3D data on automobiles from 2D images,” Wu explains. MonoCon also performed well at identifying pedestrians and bicycles, but it was not the best AI program at those tasks.

“Moving ahead, we’re going to scale this up and work with bigger datasets to evaluate and fine-tune MonoCon for autonomous driving,” Wu adds. “We also want to look at industrial applications to see whether we can improve the performance of tasks such as the use of robotic arms.”

The research was supported by the National Science Foundation, through grants 1909644, 1822477, 2024688, and 2013451; the Army Research Office, through grant W911NF1810295; and the United States Department of Health and Human Services, Administration for Community Living, through grant 90IFDV0017-01-00.
