Deep learning is a major branch of machine learning that we have all heard so much about. With or without our knowledge, deep learning is steadily influencing our day-to-day decisions.
The past few years have witnessed a flood of news about quantum leaps in the quality of a wide range of everyday technologies. Today, the speech-recognition features on our smartphones undoubtedly work like a charm, far better than they used to. All we need is a voice command to call anyone from our contact list, play our favourite song, or get a real-time weather update within seconds.
What does this tell us?
We are increasingly interacting with our smartphones, and the computers that run them, simply by ‘talking’ to them. Did we ever think a decade ago that one day we would be ‘commanding’ our cell phones? Yet here we are, be it through Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, or Google’s many voice-responsive features.
Data shows that customers have tripled their use of such speech interfaces in the past 18 months.
The Dive into Smarter Machines
We are living in a time when machine translation and various other aspects of language processing have become remarkably convincing. Companies like Google, Microsoft, Facebook, and Baidu are pulling new tricks out of their sleeves every few months.
For instance, Google Translate today renders sentences spoken in one language into spoken sentences in another for 32 pairs of languages. It also offers text translations for 103 tongues, including Cebuano, Igbo, and Zulu. And as many of us might be aware, Google’s Gmail app offers three ready-made replies for most incoming emails. This would not be possible without the underlying machines being able to read and understand both the content and the context of the email.
The next big leap in the world of AI and machine learning can be seen in the current advances in image recognition. These tech giants have features embedded in their systems that let you search or automatically organize collections of photos that carry no identifying tags.
Say, for example, you want to see all the pictures containing a dog, or trees, or something as abstract and intangible to a computer as a handshake or a hug. Tech giants like Facebook and Google have prototypes in the works that are smart enough to generate sentence-long descriptions of photos within seconds.
The magic lies in the mechanism by which computers are able to recognize these images. Today the Google app, for instance, can recognize faces, animals, surroundings, and much more. Think of how many pictures the system must have digested and learned from during the mobile application development phase.
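To make this a little more concrete, here is a minimal sketch of what such recognition can look like in code. It uses an openly available pretrained network from torchvision as a stand-in; the actual models behind apps like Google Photos are proprietary and vastly larger, so treat this purely as an illustration:

```python
# Illustrative only: recognizing a photo's subject with a pretrained
# ImageNet classifier (torchvision's ResNet-50 as a stand-in).
import torch
from PIL import Image
from torchvision import models

# Load a network already trained on ImageNet's 1,000 everyday categories.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()  # resize, crop, and normalize the photo

def recognize(path, top_k=3):
    """Return the model's top-k guesses for what is in the photo."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(image), dim=1)[0]
    scores, idxs = probs.topk(top_k)
    return [(weights.meta["categories"][i], float(s))
            for i, s in zip(idxs, scores)]

# e.g. recognize("dog.jpg") might return something like
# [("Labrador retriever", 0.81), ("golden retriever", 0.09), ...]
```

Notice that nothing in this code spells out what a dog looks like; that knowledge lives entirely in the weights the network learned from millions of labelled photos.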
In order to gather up dog pictures, the app should be capable of identifying anything from a Chihuahua to a Labrador. It should not go haywire if the dog in the image is upside down or partially obscured, whether the subject is on the right of the frame or the left, in fog or snow, sun or shade. The learning mechanism should be able to match the results to a T.
And what about the times when there are other animals, like cats and wolves, in the same frame? It needs to exclude them using pixels alone. How is that possible?
We will see that later on.
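One ingredient is worth previewing here: data augmentation. During training, the network is shown each photo in many randomly distorted forms, which teaches it to ignore exactly the kinds of variation described above, pose, framing, and lighting, and to key on the subject itself. Here is a minimal sketch using torchvision's standard augmentation transforms; the specific parameters are illustrative assumptions, not anyone's production recipe:

```python
# Illustrative only: random distortions applied to every training photo,
# so the network learns to ignore pose, framing, and lighting.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # subject left, right, near, far
    transforms.RandomHorizontalFlip(),      # mirrored views
    transforms.RandomRotation(30),          # tilted, partly upside down
    transforms.ColorJitter(0.4, 0.4, 0.4),  # sun, shade, fog-like washout
    transforms.ToTensor(),
])
```

Because the distortions change on every pass through the data, the only stable signal left for the network to latch onto is the dog itself.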
How Tech Giants Are Changing the Face of Deep Learning
In 2011, Google launched the deep-learning-focused Google Brain project, which introduced neural nets into its speech-recognition products in mid-2012. The company even retained neural nets pioneer Geoffrey Hinton for the project. Today, Google has thousands of deep-learning projects under way, with scholarly papers published regularly, and the results of that research extend to Android, Gmail, Photos, Maps, Translate, YouTube, and self-driving cars. And, as the news goes, after acquiring DeepMind, a company specializing in deep reinforcement learning, Google developed AlphaGo, a deep-learning system that defeated the world Go champion, Lee Sedol. This was a landmark in the field of artificial intelligence.
Back in 2013, Facebook hired Yann LeCun, a French-American computer scientist who has worked primarily in machine learning, computer vision, mobile robotics, and computational neuroscience, to direct its new AI research lab. Today, the company uses its sophisticated neural nets to translate more than 2 billion user posts every single day, in more than 40 languages, and these translations are seen by 800 million users a day. It also employs neural nets for photo search and photo organization.
Baidu
Andrew Ng, who previously led the Google Brain project, was hired by Baidu in 2014 to run its research lab. Baidu, China’s leading search and web-services company, uses neural nets for speech recognition, translation, photo search, and its self-driving car project. Speech recognition matters especially in China: the country’s main language, Mandarin, is difficult to type into a device, and Chinese society is mobile-first. Baidu has seen the number of customers interfacing by speech triple in the past 18 months.
Existing dynamics
All of the advances we have discussed in image and speech recognition go beyond the social apps we find so cool. A few medical startups even claim that, in the near future, they will be able to use computers to read X-rays, MRIs, and CT scans faster and more accurately than today’s radiologists, helping diagnose cancer earlier and less invasively. This should also speed up the search for life-saving pharmaceuticals.
Improved image recognition is also vital to improvements in robotics, autonomous drones, and self-driving cars. Companies like Ford, Tesla, Uber, Baidu, and Alphabet are all testing prototypes of self-piloting vehicles on public roads as we read this.
All of these breakthroughs have been made possible by a family of artificial intelligence (AI) techniques popularly known as deep learning.
Endnotes
As astonishing as it may sound, we humans have not ‘programmed’ a computer’s neural nets to perform any of the aforementioned feats; in fact, we cannot. Programmers and scientists feed a computer a learning algorithm, present it with terabytes of data, thousands upon thousands of images and speech samples, to train it, and then allow the computer to figure out for itself how to recognize the desired objects, words, or sentences.
In this way, computers have become capable of teaching themselves. It is like software writing its own software.
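What that ‘figure it out for itself’ loop looks like in code can be sketched in a few lines. The following is a minimal, illustrative PyTorch training loop, not any company’s actual pipeline; the tiny network, the data loader, and the hyperparameters are all assumptions made for the sake of the example:

```python
# Illustrative only: the 'feed it data, let it learn' loop in PyTorch.
import torch
from torch import nn

model = nn.Sequential(               # a tiny stand-in for a deep net
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),              # e.g. ten categories to recognize
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train(loader, epochs=5):
    """Nobody writes the recognition rules; gradient descent finds them."""
    for _ in range(epochs):
        for images, labels in loader:              # many labelled examples
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)  # how wrong were we?
            loss.backward()                        # trace the blame backwards
            optimizer.step()                       # nudge weights to do better

# Usage (assuming `loader` yields batches of 28x28 images and labels):
# train(loader)
```

The striking part is what is absent: no line of this code describes what any object looks like. The description is discovered, not written.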