In a paper published Wednesday on OpenReview.net, researchers from Google AI and the Toyota Technological Institute at Chicago announced that their new model, ALBERT, has taken the top spot on several natural language reading comprehension benchmarks, claiming first place on SQuAD 2.0 and GLUE and a high performance score on RACE.
On the General Language Understanding Evaluation (GLUE) benchmark, ALBERT achieves a score of 89.4; on the Stanford Question Answering Dataset (SQuAD 2.0) benchmark, 92.2; and on the ReAding Comprehension from Examinations (RACE) benchmark, 89.4%.
For SQuAD 2.0, average human performance is 89.452.
SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
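One common way BERT-style readers implement this abstention decision (a simplified sketch here, not ALBERT's actual inference code; the function name, scores, and threshold are illustrative) is to score a special "no answer" option alongside candidate answer spans and abstain when the no-answer score wins by some margin:

```python
def predict_or_abstain(span_scores, null_score, threshold=0.0):
    """Return the best answer span, or None (abstain) if the
    no-answer score beats the best span score by `threshold`.

    span_scores: dict mapping (start, end) token spans to model scores.
    null_score: the model's score for "no answer in this paragraph".
    Simplified sketch of the common null-score thresholding trick.
    """
    best_span = max(span_scores, key=span_scores.get)
    if null_score - span_scores[best_span] > threshold:
        return None  # abstain: question judged unanswerable
    return best_span

# Answerable case: a strong span beats the null score.
print(predict_or_abstain({(3, 5): 7.1, (0, 2): 1.4}, null_score=2.0))  # (3, 5)
# Unanswerable case: the null score dominates, so the model abstains.
print(predict_or_abstain({(3, 5): 0.3}, null_score=4.2))  # None
```

The threshold is typically tuned on a development set to trade precision on answerable questions against correct abstention on unanswerable ones.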
ALBERT “uses parameter reduction techniques to lower memory consumption and increase the training speed of BERT,” the researchers write.
“Our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs,” the paper reads.
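One of the paper's parameter reduction techniques is factorized embedding parameterization: instead of a single vocabulary-by-hidden embedding table, ALBERT uses a small embedding dimension projected up to the hidden size. The arithmetic can be sketched as follows (the sizes below are illustrative, not ALBERT's exact configuration):

```python
def embedding_params(vocab_size, hidden_size, factor_size=None):
    """Parameter count of a token-embedding table.

    BERT ties the embedding size to the hidden size (V x H).
    ALBERT factorizes it into V x E plus E x H, which shrinks the
    table whenever E is much smaller than H.
    """
    if factor_size is None:            # BERT-style: V x H
        return vocab_size * hidden_size
    # ALBERT-style: V x E embedding followed by E x H projection
    return vocab_size * factor_size + factor_size * hidden_size

V, H, E = 30000, 4096, 128             # vocab, hidden, factorized embedding
print(embedding_params(V, H))          # 122,880,000 parameters
print(embedding_params(V, H, E))       # 4,364,288 parameters
```

With these illustrative sizes the embedding table shrinks by roughly 28x, which is where much of the memory saving comes from; the paper's other main technique, sharing parameters across transformer layers, reduces the rest.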
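The self-supervised loss mentioned in the quote is a sentence-order prediction objective: the model sees two consecutive text segments and must decide whether they appear in their original order or have been swapped. A minimal sketch of how such training pairs could be built (simplified; real preprocessing operates on tokenized segments, and the function here is hypothetical):

```python
import random

def make_sop_example(seg_a, seg_b, rng):
    """Build one sentence-order prediction training example.

    seg_a and seg_b are two consecutive text segments. In the
    positive case they keep their original order (label 1); in the
    negative case they are swapped (label 0).
    """
    if rng.random() < 0.5:
        return (seg_a, seg_b), 1   # original order
    return (seg_b, seg_a), 0       # swapped order

rng = random.Random(0)
pair, label = make_sop_example("The cat sat down.", "Then it slept.", rng)
print(pair, label)
```

Because both segments come from the same document, the model cannot solve the task by topic cues alone and is pushed to learn discourse-level coherence, which the authors report helps downstream tasks with multi-sentence inputs.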
Top AI companies have been vying for the leading positions on these benchmarks. In late July, Facebook AI Research introduced RoBERTa, a model that achieved state-of-the-art results, and in May, Microsoft AI researchers introduced the Multi-Task Deep Neural Network (MT-DNN), a model that achieved top marks on 7 of 9 GLUE tasks.
The technology has clear applications in reading the voluminous text on the internet and providing coherent answers, an obvious benefit for search engines.