At its core, search is about understanding language.
While typing in disparate keywords was the norm 20 years ago, now most people are inclined — in fact, trained by search engines like Google — to ask more complex questions in a search box or to a voice assistant. Why? Because when we speak, we don't naturally string a bunch of random words together. And computers have caught on.
Natural Language Processing (NLP), the combination of AI and linguistics that gives computer programs the ability to read, understand and derive meaning from human language, has been studied for more than 50 years. But it's modern advances in the field that have made search engines steadily smarter. Those advances matter because many English words have multiple meanings, which has long made it hard for search engines to interpret the intent behind users' written and spoken queries.
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a breakthrough open source machine learning framework for NLP that Google introduced in 2018 and later applied to its core search algorithm to better understand user queries. Rather than reading text in a single direction, BERT considers the words on both sides of a term at once, so the model can determine a word's meaning from its full surrounding context.
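To make "bidirectional" concrete, here is a toy sketch (not BERT itself) of what context a model has available when it reaches a given word. A strictly left-to-right model only sees the words before it; a bidirectional encoder such as BERT can draw on every other word in the sequence. The function names are hypothetical and purely illustrative.

```python
from typing import List


def left_context(tokens: List[str], i: int) -> List[str]:
    """Words visible to a strictly left-to-right model at position i."""
    return tokens[:i]


def bidirectional_context(tokens: List[str], i: int) -> List[str]:
    """Words visible to a bidirectional encoder at position i:
    every other token in the sequence, on both sides."""
    return tokens[:i] + tokens[i + 1:]


query = "orlando bloom".split()
print(left_context(query, 0))           # []
print(bidirectional_context(query, 0))  # ['bloom']
```

For a query that starts with "Orlando", a left-to-right reader has nothing to go on at that word, while a bidirectional reader already sees "bloom".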
Person, Place, or Thing?
Since launching Yext Answers, our site search product, we've continually improved its search algorithm to help businesses serve more relevant results to customers on their own websites. We recently launched Milky Way, the latest upgrade to the Answers algorithm, which uses BERT to more accurately distinguish locations from other entities. This distinction matters because location names (a place) are often identical to the names of people (a person) or products (a thing). For example, take two queries that both include the word "Orlando": "Bank near Orlando" and "Orlando Bloom".
In one, the user is clearly referring to the city of Orlando (a place); in the other, to someone named Orlando (a person). Classifying the first Orlando as a place and the second as a person is called Named Entity Recognition (NER): the task of locating named entities in unstructured text and classifying them into predefined categories.
It's easy for you or me to tell the two queries apart because we don't read Orlando in isolation, but in the context of the words around it. In the first example, a word that follows "Bank near" is almost certainly the name of a place. In the second, Orlando right next to Bloom immediately signals the well-known actor. This is where BERT is invaluable, since it is designed to understand the contextual relationship between words in a text. Previously, Yext Answers could occasionally return a location-based result for "Orlando" in the "Orlando Bloom" query. With this new approach, Yext Answers can now tell the two apart.
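The reasoning above (look at the neighbors on both sides, then assign a category) can be illustrated with a deliberately simple rule-based tagger. This is a hypothetical sketch, not how Milky Way or BERT works: the cue lists are made up for the example, whereas a model like BERT learns such signals from data instead of relying on hand-written rules.

```python
from typing import List, Optional

# Hypothetical cue lists, for illustration only.
PLACE_CUES_BEFORE = {"near", "in", "at", "around"}
PERSON_CUES_AFTER = {"bloom"}


def classify_entity(tokens: List[str], i: int) -> str:
    """Label tokens[i] as PLACE, PERSON, or UNKNOWN from the words on both sides."""
    before: Optional[str] = tokens[i - 1] if i > 0 else None
    after: Optional[str] = tokens[i + 1] if i + 1 < len(tokens) else None
    if after in PERSON_CUES_AFTER:   # right-hand context: "orlando bloom"
        return "PERSON"
    if before in PLACE_CUES_BEFORE:  # left-hand context: "near orlando"
        return "PLACE"
    return "UNKNOWN"


for query in ["bank near orlando", "orlando bloom"]:
    tokens = query.split()
    print(query, "->", classify_entity(tokens, tokens.index("orlando")))
# bank near orlando -> PLACE
# orlando bloom -> PERSON
```

Note that the "PERSON" decision depends entirely on the word to the right of "orlando", which is exactly the context a left-to-right-only reader would not yet have seen.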
A BERT in the Hand
At Yext we're building the Official Answers Engine, and leveraging BERT in our Answers algorithm is an important next step to enabling businesses to deliver the most accurate — and official — answers possible. We know that one wrong answer can come with a huge opportunity cost, either in the form of lost business or a pricey call to customer service. By better pinpointing customer intent, more businesses can reduce that risk and ensure an exceptional customer experience on their own domain.