August 4, 2021

Computer image recognition takes a giant leap forward

In 2012, a team from the University of Toronto (Canada) surprised the world at a computer image recognition competition: its software made only 15% errors, against 26% for the runner-up. It was the start of a new wave of artificial intelligence, known as deep learning, so called because the program, built on a network of connected artificial neurons, finds the right “connections” by training on millions of examples.

Then the wave spread to games (Go, chess, poker), cars (autonomous driving), voice (voice assistants), science (protein shapes)… But images, the stars of the early days, watched the later waves of progress pass them by, their performance leveling off. Until the last few months.

“I have to say that I haven’t been so excited in this field for ten or twenty years!” explained Yann LeCun, chief scientist at Facebook and a pioneer of deep learning for thirty years, during a June 30 press presentation of the Californian giant’s latest research advances. “It’s going very fast. Two years ago, there was none of this,” confirms Matthieu Cord, professor at Sorbonne University and researcher at Valeo.


The change stems from several innovations that correct the shortcomings of the early methods. “The key to the success of the early techniques is what is called supervised learning. That is, the program learns its parameters from data annotated by humans,” explains Jean Ponce, computer science professor at the École normale supérieure. To “recognize” a cat, dog or car, the program is shown thousands of images labeled “cat”, “dog” or “car”, and it adjusts its parameters until it produces the correct answer. It then answers correctly even on images it has never seen.
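To make this concrete, here is a minimal sketch of supervised training of an image classifier, written in Python with the PyTorch library (an assumption; the article names no software). Random tensors stand in for the millions of human-annotated photographs a real system would use.

# Minimal sketch of supervised learning for image recognition.
# Assumption: PyTorch; random tensors replace real labeled photographs.
import torch
import torch.nn as nn

# Hypothetical labeled dataset: 64 "images" (3x32x32) with human-style
# labels 0 = cat, 1 = dog, 2 = car (assigned at random for illustration).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 3, (64,))

# A small convolutional network: its weights are the "parameters"
# the program adapts during training.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 3),  # 3 output classes: cat, dog, car
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training loop: show the labeled images, compare the prediction with the
# human annotation, and nudge the parameters toward the correct answer.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# After training, the model can be applied to an image it has never seen.
new_image = torch.randn(1, 3, 32, 32)
predicted_class = model(new_image).argmax(dim=1)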

Self-supervised learning

The main problem is that the technique requires a huge number of annotated images. Moreover, real-world situations are so diverse that it is impossible to cover them all with databases of images validated by human annotators. “The performance of the vision systems of autonomous cars collapses if we show them images taken at night, or of wet dogs,” notes Matthieu Cord.
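The idea named in the heading above is to do without human annotations altogether: the program invents its own training task from raw images. The article does not say which method the researchers use; as one classic illustration only, the sketch below (Python, PyTorch assumed) trains a network to predict how much each image was rotated, a label produced automatically rather than by hand.

# Minimal sketch of the idea behind self-supervised learning: the "labels"
# are generated from the images themselves, with no human annotation.
# Pretext task chosen here as an illustration: predicting the rotation
# (0, 90, 180 or 270 degrees) applied to each image.
import torch
import torch.nn as nn

# Unlabeled "images" (random tensors stand in for real photographs).
images = torch.randn(64, 3, 32, 32)

# Build the pretext task: rotate each image by a random multiple of 90
# degrees and keep that multiple as the target class (0, 1, 2 or 3).
rotations = torch.randint(0, 4, (64,))
rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                       for img, k in zip(images, rotations)])

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 4),  # 4 classes: the four possible rotations
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# The network learns useful visual features while solving a task whose
# answers were produced automatically, not by human annotators.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(rotated), rotations)
    loss.backward()
    optimizer.step()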
