DFKI Presents New Technology

Machine pattern recognition

March 26, 2024

By Martin Ciupek

Reading time: approx. 3 minutes

Researchers at DFKI have developed a solution so that robots can better evaluate objects in context in the future. It will soon be presented in detail at a conference in the USA.

Robots equipped with numerous sensors already check the quality of components in automobile construction. In the future, the sensor data will also help machines to correctly interpret their environment like humans.
Photo: panthermedia.net/ Mihajlo Maricic

Robots often have difficulty in everyday human environments. The reason: People evaluate what they see in a context. For example, if you see a full glass, you assume that liquid could spill over if you move roughly. So that machines can also learn to orient themselves visually in our living world, scientists at the German Research Center for Artificial Intelligence (DFKI) have developed the Multi-Key Anchor & Scene-Aware Transformer for 3D Visual Grounding (MiKASA). Using machine pattern recognition, it makes it possible to identify and semantically understand complex spatial dependencies and features of objects in three-dimensional space. At this year’s Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle, USA, researchers from the Augmented Vision department are now presenting this solution.

Learning robots: Pattern recognition creates context for objects

People learn to put things into context when they learn language. This goes beyond pure language and helps, for example. B. to understand an intention or reference and to connect it with an object in our living environment. Robots do not yet have these skills, but they can learn them. There are research approaches around the world, but so far they have only come close to human capabilities.

Reading tip: Humanoid robot from Figure AI shows natural interaction thanks to ChatGPT

That is currently changing and the DFKI wants to play an important role with MiKASA. According to the researchers, thanks to a “scene-aware object recognizer”, machines can now draw conclusions from the surroundings of a reference object – and thus accurately recognize the object and define it correctly. So they evaluate things depending on the context and thus gain a nuanced understanding of their surroundings.

A question of perspective: Pitfalls in locating objects

However, the pitfalls lie in the details, for example when it comes to understanding relative spatial dependencies. “The chair in front of the blue monitor” can be “the chair behind the monitor” from another perspective. To make it clear to the machine that both chairs are actually one and the same object, the program works with a so-called “multi-key anchor concept”. This transmits the coordinates of anchor points in the field of view in relation to the target object and evaluates the importance of nearby objects based on text descriptions.

Semantic references can then help to localize the object. Typically, a chair is placed toward a table or has its back against a wall. The presence of a table or a wall indirectly defines the orientation of the chair.

Hit rate for object detection increased by 10%

According to the DFKI, MiKASA achieves an accuracy of up to 78.6% by linking language models, learned semantics and the recognition of objects in real three-dimensional space. Compared to the best previous technology in this area, the hit rate for object detection could be increased by around 10%.

Numerous sensors provide the robot with visual information about the environment, whose data is combined to create an overall impression. As with human eyes, visual areas overlap. In order to generate a coherent picture from the large amount of data, the researchers at DFKI have developed the “SG-PGM” (Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks). The alignment between so-called three-dimensional scene visualizations (3D scene graphs) is the basis for a variety of applications. For example, it supports “point cloud registration,” in which points from different laser scanners are merged. This helps robots navigate.

Also read: Why the household robot Stretch 3 could become an alternative to humanoid robots

To ensure that this is possible even in dynamic environments with possible sources of interference, SG-PGM links the visualizations with a neural network. The program reuses the geometric elements learned through registration and associates the grouped geometric point data with the semantic features at the node level.

Source link

4.1/5 - (11 votes)

DFKI presents new technology

For Siemens Factory Automation CEO Brehm, generative AI is not a universal solution

Nothing stands in the way of billionaire Kretinsky’s entry

Germany and France European champions in innovations for clean technologies

Learning robots: Pattern recognition creates context for objects

A question of perspective: Pitfalls in locating objects

Related

Leave a ReplyCancel reply

Latest News

For Siemens Factory Automation CEO Brehm, generative AI is not a universal solution

More Articles Like This