When we first started building Answers almost two years ago, Yext was full of structured data. As a location listings company, we stored huge amounts of information about restaurants, hotels, doctors, hospitals, financial advisors, and more. This type of data is highly structured in nature. Each entity's data is organized into a series of fields and values with a uniform schema. The data has to be organized this way so that it can be sent consistently to hundreds of publishers across the web.
As a result, structured data was the first type of data we learned to search in Answers. To that end, we developed a family of algorithms, including Named Entity Recognition, NLP Filtering, and Field Value Direct Answers, that are designed to answer questions about structured data. This allowed us to provide an unparalleled search experience for things like people, places, and products.
As Answers grew, we quickly learned that not all data is this neatly structured. Pretty soon, frequently asked questions became the most searched entity type in Answers, surpassing locations, healthcare professionals, and other more structured entities. Every business in the world needs to answer its customers' frequently asked questions, no matter the industry.
FAQs aren't like structured data. They contain a lot of semantic information. The order of words matters a lot, and there are a million ways to ask the same question. And so we created a new algorithm, Semantic Text Search, that was designed to search this type of semi-structured data by measuring the similarity in meaning between two strings of text, instead of just looking at individual keywords. This new algorithm helped Answers understand users' questions better than ever before.
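To make the problem concrete, here is a small Python sketch of why looking at individual keywords falls short for FAQs. It is not the Semantic Text Search algorithm itself, which the post does not detail; it only shows a plain keyword-overlap baseline failing in the two ways described above. The function names and the example questions are made up for illustration.

```python
import re


def keywords(text: str) -> frozenset:
    """Lowercase the text and return its set of word tokens (word order is discarded)."""
    return frozenset(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the keyword sets of two strings (hypothetical baseline)."""
    ka, kb = keywords(a), keywords(b)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Two ways of asking for the same thing that share no words at all.
paraphrase_score = keyword_overlap(
    "How can I reset my password?",
    "Forgot login credentials",
)

# Two questions with different meanings that use exactly the same words.
reordered_score = keyword_overlap(
    "Does the bank charge the customer?",
    "Does the customer charge the bank?",
)

print(paraphrase_score)  # 0.0: the paraphrase looks completely unrelated
print(reordered_score)   # 1.0: the reordered question looks identical
```

A keyword-only comparison scores the paraphrase as unrelated and the reordered question as a perfect match, which is the opposite of what a person would say. A measure of similarity in meaning has to get both of these cases right.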
But what about unstructured data? A lot of data doesn't lend itself to structure at all. You can't easily take a long help article, or blog post, or Wikipedia page and turn it into structured (or even semi-structured) data. Most companies have a huge amount of this type of data, and almost all of them struggle to search it effectively. This frustrates both customers and employees, who have to waste valuable time digging through content to find the answers they're looking for.
Our customers needed a better way to efficiently search this type of data, so we built one.