Wednesday, May 25, 2016

“Deep” Visual Understanding from “Deep” Learning

Earlier this year, in March, something very significant happened in the history of artificial intelligence. A computer program, AlphaGo, developed by Google DeepMind, defeated the South Korean professional Go player, Lee Sedol, on the home turf of the human player. The collective human ego went into shock. It was unthinkable that a machine could beat the best of human minds in this extremely complex strategy game! The world had hardly recovered from the defeat handed to Garry Kasparov, the formidable world chess champion, by a mere computer, the IBM Deep Blue, in 1997.

Well… we should have seen this coming. The processing power of computers has gone through astronomical advances, thanks to the relentless pursuit of Moore’s law, named after Intel co-founder Gordon Moore. Basically, the transistor count in the processor chip has been doubling roughly every two years over the past four decades! In parallel, algorithms for machine learning and artificial intelligence also went through revolutionary leaps, with the invention and enhancement of the convolutional neural network approach. The combination of the advances in these two fields is now enabling previously unthinkable computer vision and machine intelligence capabilities.

Appropriately, SID invited Professor Jitendra Malik of UC Berkeley, a pioneer in the field of computer vision, to present the Luncheon Keynote this year. He started his presentation by showing the picture below. 

Can a computer program be developed that understands the “semantics” of this scene? Facts such as: 1) the lady on the left is walking away with 3 bags, 2) the woman on the right is playing the accordion sitting on the bench by a bag, while 3) the guy in the middle is looking at the woman. Pushing it further, Professor Malik asked if it would then be possible to predict if the guy has the intention to put some money in the tip bag of the woman!

Most would probably say these are impossible tasks for a computer to accomplish. However, Professor Malik walked the audience through the advances in computer vision over the past few decades, and demonstrated how the advent of multi-layer neural-network-based algorithms has resulted in unprecedented accuracies in semantic visual understanding that would make such tasks possible.

On to “deep” understanding of the visual world from a collection of pixels on an image, with the help of “deep” learning algorithms running on powerful modern computers! The future is intelligent, and the future is already here… --Achin Bhowmik
