Teaching machines to understand language has driven significant changes in the field of machine learning and improved a whole range of Natural Language Processing models. Even so, until Google published BERT, it was quite difficult for machines to grasp the underlying meaning of a sentence and how that meaning carries across a group of sentences.
Let’s consider the following statements:
Sushant was my friend. He was a good coder but lacked the idea of optimized code. Whenever I was in need, he always helped me.
To a human, these sentences have a clear meaning, but they are quite difficult for a computer to understand. Natural Language Processing (NLP) has been the main approach for training machines to understand and evaluate meaning, yet at some point every NLP model has lacked the ability to fully comprehend the underlying meaning of a sentence.
In the sample above, every pronoun ("He", "he") points to the person "Sushant"; a model trained only to find and evaluate specific keywords in a sentence would fail to connect those dots.
Traditional models read and evaluate words one after another, which puts a sample like the one above out of their reach. What was needed was something that looked not only at the words that follow but also at those that came before: something that connects a word's meaning with the next word and compares it with the previous one too.
Transformer by Google:
The Transformer, Google's novel neural network architecture, is built around a self-attention mechanism and surpassed recurrent and convolutional models on English-to-German and English-to-French translation, while requiring considerably less computation to train.
The Transformer processes a sequence in small, parallel steps and applies self-attention, which establishes relationships between words at different positions in a sentence. In the sample statement about Sushant, it matters that the ordinary word "he" refers to Sushant himself, and self-attention can establish this "he-him-his" link to the mentioned person in a single step.
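To make the mechanism concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. The toy tokens, the embedding size and the random projection weights are invented for illustration; in a real Transformer these projections are learned during training.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scores every other token
    weights = softmax(scores, axis=-1)        # attention weights, one row per token
    return weights @ V, weights               # mix of values + the weights themselves

# Toy example: 4 "tokens" with 8-dimensional embeddings (random, illustration only).
rng = np.random.default_rng(0)
tokens = ["Sushant", "was", "my", "friend"]
X = rng.normal(size=(4, 8))
W_q = rng.normal(size=(8, 8))
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))

output, weights = self_attention(X, W_q, W_k, W_v)
for tok, row in zip(tokens, weights):
    print(tok, row.round(2))   # how much each token attends to every other token
```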
And then Google Introduced BERT:
Until BERT came into the picture, understanding conversational queries was quite difficult. BERT stands for Bidirectional Encoder Representations from Transformers, and it is a big leap in the field of language understanding. The word bidirectional itself means functioning in two directions: BERT reads a sentence from both sides at once. It was remarkable to see BERT exceed all previous models and set a new standard for unsupervised pre-training in natural language processing.
In practice, BERT was fed word sequences in which 15% of the words were masked, that is, kept hidden. The aim was to train the model to predict the masked words from the unmasked words around them. This method, known as Masked Language Modelling, teaches the model to anticipate the hidden words in a sentence based on context.
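As a quick illustration (the Hugging Face transformers library and the bert-base-uncased checkpoint are assumptions of this sketch, not something the post itself prescribes), the masked-word prediction described above can be tried in a few lines:

```python
# pip install transformers torch   (assumed environment)
from transformers import pipeline

# Load a pretrained BERT model with a fill-mask head.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Hide one word, exactly as in BERT's masked-language-modelling objective,
# and let the model guess it from the surrounding context.
sentence = "Sushant was my friend. He was a good [MASK] but lacked the idea of optimized code."
for prediction in unmasker(sentence):
    print(prediction["token_str"], round(prediction["score"], 3))
```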
One of the finest applications of such improved models is in search engines, where extracting the precise meaning of a query and returning matching results greatly helps in filtering the required information. There was a time when Google relied on keywords specifically added to a blog post or website content, but with BERT, Google steps ahead and now interprets words, NOT JUST KEYWORDS. Google Search has been rolling out BERT for a better user experience. But advanced software needs hardware with matching capacity, and this is where Google's latest Cloud TPU (Tensor Processing Unit) comes into the picture. While enhancing the user experience, Google's BERT will affect your SEO content too.
Currently, these changes apply to English-language searches on Google in the U.S. But with the aim of providing better results across the globe, Google plans to carry what the model learns in one language over to others, from English to the rest.
Consider the following sentences:
- That flower is a rose.
- Startled by the noise, he rose from his seat.
If a machine is trained to interpret a sentence word by word, the word "rose" becomes a point of conflict. With the latest developments, and thanks to Google open-sourcing BERT, the meaning assigned to "rose" now varies with its context. The aim is not to conclude that the flower is "rising", or that the noise turned him into a "rose", the flower.
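One way to see this context sensitivity is to compare the vector BERT assigns to "rose" in the two sentences. The sketch below again assumes the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def rose_vector(sentence):
    """Return BERT's contextual embedding for the token 'rose' in the given sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (sequence_length, 768)
    rose_id = tokenizer.convert_tokens_to_ids("rose")
    position = inputs["input_ids"][0].tolist().index(rose_id)
    return hidden[position]

flower = rose_vector("That flower is a rose.")
verb = rose_vector("Startled by the noise, he rose from his seat.")

# A static, non-contextual embedding would give the same vector both times;
# BERT's two vectors differ, reflecting the two senses of "rose".
similarity = torch.cosine_similarity(flower, verb, dim=0)
print(f"cosine similarity between the two 'rose' vectors: {similarity.item():.3f}")
```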
XLNet and ERNIE:
Like the Generative Pre-trained Transformer models GPT and GPT-2, XLNet is an autoregressive language model, but it predicts each word using context from both directions, backward as well as forward. Baidu, in turn, has open-sourced ERNIE, which it reports outperforms both BERT and XLNet.
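XLNet's trick can be sketched conceptually in plain Python: instead of always predicting left to right, it samples a random factorisation order, so a word may end up conditioned on context from both sides. The snippet below only illustrates that ordering idea; it is not XLNet's actual training code.

```python
import random

tokens = ["Sushant", "was", "my", "friend"]

# Sample one random factorisation order, as XLNet does during pre-training.
order = list(range(len(tokens)))
random.shuffle(order)

# Each token is predicted from the tokens that came earlier in the permutation,
# which may sit to its left OR its right in the original sentence.
seen = []
for position in order:
    context = [tokens[i] for i in sorted(seen)]
    print(f"predict {tokens[position]!r:10} given context {context}")
    seen.append(position)
```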
Another Optimized Pre-Training Method for NLP, by Facebook:
Improving on what Google's BERT offered, Facebook advanced with RoBERTa. Using BERT's language-masking strategy, RoBERTa teaches the system to anticipate portions of text that were deliberately hidden. Implemented in PyTorch, Facebook's RoBERTa focuses on changing a few key hyperparameters of BERT, and it was trained on a much larger body of unannotated text, including public news articles.
Delivering state-of-the-art performance on various tasks and a score of 88.5 on the GLUE leaderboard makes RoBERTa a true benchmark.
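Since RoBERTa keeps BERT's masking objective, the same fill-mask experiment works with it as well; this sketch again assumes the Hugging Face transformers library and the roberta-base checkpoint, and the visible difference is RoBERTa's <mask> token:

```python
from transformers import pipeline

# RoBERTa uses the same masked-language-modelling objective as BERT,
# trained longer, on more data, and with tuned hyperparameters.
unmasker = pipeline("fill-mask", model="roberta-base")

sentence = "Sushant was my friend. He was a good <mask> but lacked the idea of optimized code."
for prediction in unmasker(sentence):
    print(prediction["token_str"], round(prediction["score"], 3))
```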
And then Microsoft Jumped in:
Moving ahead, Microsoft's MT-DNN, which stands for Multi-Task Deep Neural Network, goes beyond Google's BERT. Microsoft's model builds on an approach it proposed in 2015 but incorporates BERT's network architecture. By combining multitask learning (MTL) with BERT's language-model pre-training, Microsoft has exceeded previous records.
Achieving new state-of-the-art results on multiple Natural Language Understanding (NLU) tasks, including eight of the nine General Language Understanding Evaluation (GLUE) tasks, Microsoft's MT-DNN surpassed and raised the previous benchmark.
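The multitask idea behind MT-DNN can be sketched as one shared encoder feeding a small head per task. The PyTorch module below is a simplified illustration of that structure, with invented task names and sizes, not Microsoft's actual MT-DNN code:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """A shared encoder with one lightweight head per NLU task,
    mirroring the structure (not the details) of MT-DNN."""

    def __init__(self, hidden_size=768, task_num_labels=None):
        super().__init__()
        task_num_labels = task_num_labels or {"sentiment": 2, "nli": 3}
        # In MT-DNN the shared layers are BERT; here a single linear layer stands in for them.
        self.shared_encoder = nn.Linear(hidden_size, hidden_size)
        # One task-specific classification head per task, all fed by the shared encoder.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_size, n) for task, n in task_num_labels.items()}
        )

    def forward(self, sentence_embedding, task):
        shared = torch.tanh(self.shared_encoder(sentence_embedding))
        return self.heads[task](shared)          # logits for the requested task

# Toy usage: the same shared representation feeds different task heads.
model = MultiTaskModel()
fake_embedding = torch.randn(1, 768)             # stand-in for a BERT [CLS] vector
print(model(fake_embedding, "sentiment").shape)  # torch.Size([1, 2])
print(model(fake_embedding, "nli").shape)        # torch.Size([1, 3])
```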
With these rapid changes, a number of entry-level barriers will disappear and the models will keep getting better.
To wrap it up, advanced language-understanding models now focus on the context along with the word: on the intended meaning of the sentence rather than on the words alone.
Google's BERT has been improving search results and has a lot more to offer in the future. Like BERT, RoBERTa and MT-DNN will significantly shape future state-of-the-art NLP models, and we will witness further improvements in self-training models and much more.