You can't build a modern search engine without using machine learning. Or at least you shouldn't. Over the past ten years, advances in machine learning — and in particular deep learning and transformer neural networks — have enabled us to solve problems in search that were previously unsolvable. New models like BERT allow us to understand search queries in a deeper, more nuanced way than older, keyword-based approaches.
As with most things in search, Google has been the pioneer of machine-learning approaches. Soon after the BERT breakthrough in 2018, they implemented BERT in their search algorithm, calling it "one of the biggest leaps forward in the history of search."
Yext Answers uses BERT, along with a number of similar transformer models, in many of the same ways Google does. A few of Yext's use cases include:
Disambiguating named entities, like "Edward Norton" and "Edward, North Carolina"
Producing featured snippets with question-answering models
Encoding semantic vectors to perform semantic search (sketched below)
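To give a flavor of that last use case, here's a minimal sketch of semantic search using the open-source sentence-transformers library. The model name and documents below are purely illustrative, not what Answers runs in production: text is encoded into dense vectors, and documents are ranked by cosine similarity to the query.

```python
# A minimal semantic-search sketch using the open-source
# sentence-transformers library. The model and documents are
# illustrative, not Yext's production stack.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Edward Norton is an American actor and filmmaker.",
    "Edward is a small town in Beaufort County, North Carolina.",
]
query = "movies starring Edward Norton"

# Encode the documents and the query into dense semantic vectors.
doc_vectors = model.encode(documents, convert_to_tensor=True)
query_vector = model.encode(query, convert_to_tensor=True)

# Rank the documents by cosine similarity to the query vector.
scores = util.cos_sim(query_vector, doc_vectors)[0].tolist()
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```

For this query, the model should rank the actor above the town, which is exactly the kind of disambiguation that pure keyword matching struggles with.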
All of these use cases are powered by supervised machine learning models, which means they need labeled training data (and lots of it) to work. For example, to build a question-answering model optimized for search, we need tens of thousands of question/answer pairs to train the neural network.
This is both the fundamental strength and weakness of supervised machine learning: the algorithm can implicitly learn from labeled examples, without the need to encode rules or heuristics. But the algorithm needs a huge number of these examples, which can be challenging to produce.
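To give a sense of what "labeled examples" means here, a single training instance for an extractive question-answering model is typically a SQuAD-style triple: a question, a context passage, and the answer span within that passage. A hypothetical example (not Yext's actual schema):

```python
# One hypothetical labeled question-answering example, in the style of
# the public SQuAD dataset. This is not Yext's actual data schema.
context = (
    "The Brooklyn branch is open Monday through Friday from 9am to 6pm, "
    "and on Saturdays from 10am to 2pm."
)

training_example = {
    "question": "What time does the Brooklyn branch close on Saturdays?",
    "context": context,
    "answer": {
        "text": "2pm",
        "answer_start": context.index("2pm"),  # character offset of the answer span
    },
}
```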
As Michael Misiewicz, Yext's Head of Data Science, described in his recent post, we have a sophisticated apparatus for data labeling at Yext, consisting of a multilingual labeling team, proprietary labeling software, and thorough quality control guidelines. But we can only do so much labeling ourselves.
We also wanted to give Yext administrators the ability to train the Answers algorithm themselves by providing direct feedback on our models' predictions. To that end, we developed our Experience Training framework.
Experience Training allows admins to give direct feedback on predictions made by the various supervised ML models in Yext Answers. For example, admins can approve or reject the Featured Snippets produced by our question-answering model.
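Concretely, a single piece of feedback might look something like the record below. The field names are our own illustration rather than Yext's actual schema:

```python
# A sketch of a snippet-feedback record; the field names are
# assumptions for illustration, not Yext's real schema.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SnippetFeedback:
    query: str          # the search query the snippet was shown for
    document_id: str    # the document the snippet was extracted from
    snippet_text: str   # the extracted answer span
    approved: bool      # True if the admin approved, False if rejected
    created_at: datetime

feedback = SnippetFeedback(
    query="what are your holiday hours",
    document_id="faq-42",
    snippet_text="We are open 10am to 4pm on all major holidays.",
    approved=False,
    created_at=datetime.now(timezone.utc),
)
```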
When an admin provides feedback, two things happen:
The Answers algorithm immediately modifies its prediction via an override layer.
The admin's feedback enters a training pipeline and becomes training data for future versions of the model.
In other words, the admin's feedback takes effect immediately, while also being incorporated asynchronously into our continuous model retraining pipeline. Let's dive deeper into each of these aspects.
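Before we do, here's a rough sketch of how those two paths might fit together, reusing the hypothetical SnippetFeedback record from above. The class names and method signatures are illustrative, not Yext's actual implementation:

```python
# A toy override layer sitting in front of a QA model. Feedback is
# applied immediately as an override, and also queued as training data
# for the next retraining run. Everything here (class names, the
# model.predict signature) is hypothetical, not Yext's internals.
from collections import deque

class OverrideLayer:
    def __init__(self, model):
        self.model = model             # underlying question-answering model
        self.overrides = {}            # (query, document_id) -> pinned snippet or None
        self.training_queue = deque()  # feedback awaiting the retraining pipeline

    def record_feedback(self, fb: SnippetFeedback) -> None:
        key = (fb.query, fb.document_id)
        # An approval pins the snippet; a rejection suppresses it.
        self.overrides[key] = fb.snippet_text if fb.approved else None
        # Queue the same feedback for the asynchronous retraining pipeline.
        self.training_queue.append(fb)

    def predict(self, query: str, document_id: str, document_text: str):
        key = (query, document_id)
        if key in self.overrides:
            return self.overrides[key]  # feedback takes effect immediately
        return self.model.predict(query, document_text)  # hypothetical signature
```

The key design point is that the override lookup is synchronous and cheap, so feedback changes results right away, while the retraining pipeline consumes the queue on its own schedule.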