Artificial Intelligence (AI) makes it possible to transfer human cognitive skills to machines or semi-autonomous systems. Machine Learning (ML) generates this intelligence with the help of computer systems by applying large amounts of data and substantial computational power, enabling the AI to formulate and solve problems independently. This involves a learning process and a prediction process: the computer system learns to solve tasks autonomously through repeated execution, using deep neural networks that apply multiple layers between input and output to infer predictions. Beneficiaries of the technology so far include autonomous robots, self-driving cars, and medical as well as Industry 4.0 applications.
For object recognition, the underlying neural networks need to be trained – a major challenge, as training data often has to be generated manually in a time-consuming and costly way: objects are photographed and the resulting images are then annotated by assigning class labels to the objects, pixel by pixel. For many use cases, this approach is uneconomical. An alternative is synthetically generated data, which makes manual photography and labeling unnecessary: training images can be rendered automatically from arbitrary perspectives (the photography step), and the exact object poses are known by construction (the labeling step).
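The key point above – that synthetic data comes with free labels, because the renderer itself chooses the camera/object pose – can be sketched as follows. This is a minimal illustration, not the actual pipeline: `render_stub` is a hypothetical placeholder for a real CAD renderer, and the axis-angle-plus-translation encoding of the 6D pose is an assumption for demonstration.

```python
import numpy as np

def random_pose(rng):
    """Sample a random 6D pose: a 3D rotation (axis-angle) and a 3D translation."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(0.0, np.pi)
    translation = rng.uniform(-1.0, 1.0, size=3)
    return np.concatenate([axis * angle, translation])

def render_stub(pose, size=64):
    """Placeholder for a CAD renderer: returns a dummy image derived from the pose.
    In a real pipeline this would rasterize the 3D model from the given viewpoint."""
    seed = int(abs(hash(pose.tobytes())) % (2**32))
    return np.random.default_rng(seed).random((size, size))

def generate_dataset(n, seed=0):
    """Automatically generate (image, pose) training pairs:
    the rendering is the 'photography step', the sampled pose the 'labeling step'."""
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for _ in range(n):
        pose = random_pose(rng)          # pose is known exactly -> label for free
        images.append(render_stub(pose)) # image rendered from that exact pose
        labels.append(pose)
    return np.stack(images), np.stack(labels)

images, poses = generate_dataset(8)
```

Because every image is rendered from a pose the generator chose itself, no manual annotation step is ever needed – the label precedes the image rather than being recovered from it.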
Fraunhofer IGD offers 3D model-based pipelines and workflows for training deep neural networks to detect objects and/or estimate their 6D pose from a 2D image. For this purpose, synthetically generated data derived from CAD models is employed, and domain matching problems are solved by combining photorealistic with non-photorealistic rendering techniques – all at low cost. Style Transfer Networks (CGANs) or neural rendering techniques are used to generate arbitrary randomized image data for the subsequent task of object recognition and 6D pose estimation. The system can thus be trained to recognize any particular product detail within a complex product configuration or variant without having previously captured it in a real-world context. Possible applications include automated vision-based quality control systems, autonomous robots for bin picking in manufacturing environments, and automated catalog retrieval in customer service.
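The domain matching idea mentioned above – making synthetic renderings usable for networks that must later work on real photographs – is often approached with randomization of appearance. The sketch below illustrates the principle with simple pixel-level randomizations (brightness, noise, random backgrounds); it is a stand-in assumption for the CGAN-based style transfer and neural rendering techniques named in the text, and the foreground mask heuristic is purely illustrative.

```python
import numpy as np

def domain_randomize(image, rng):
    """Apply simple appearance randomizations to a synthetic rendering so a
    network trained on it generalizes better to real camera images.
    A crude stand-in for photorealistic / style-transfer-based domain matching."""
    out = image * rng.uniform(0.6, 1.4)             # random global brightness
    out = out + rng.normal(0.0, 0.05, image.shape)  # sensor-like pixel noise
    # Replace the (synthetic, uniform) background with random clutter.
    background = rng.random(image.shape) * rng.uniform(0.0, 0.3)
    foreground_mask = image > 0.5                   # illustrative threshold mask
    out = np.where(foreground_mask, out, background)
    return np.clip(out, 0.0, 1.0)

# Usage: augment one synthetic rendering (here a dummy bright square on black).
rendering = np.zeros((16, 16))
rendering[4:12, 4:12] = 1.0
augmented = domain_randomize(rendering, np.random.default_rng(1))
```

Because the pose label belongs to the geometry, not the appearance, every randomized variant of one rendering reuses the same ground-truth pose, multiplying the training set at no annotation cost.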