One of Yext's core products — and a crucial component of our Answers Platform — is Search, an "AI-powered search experience based on natural language understanding and using a multi-algorithm approach". The machine learning algorithms at the core of AI modeling emulate decision-making based on available data. These data come from humans (especially previous users of Search) and are often labeled by humans (annotators). Likewise, the models are created, trained, and deployed by humans. All humans have biases, stemming from their different cultural and social backgrounds, socio-economic situations, gender, age, education, world-view, personal history, and many other factors. As a result, there is always the potential for bias. At Yext, we are aware of this potential bias, and we strive to minimize or eliminate it by implementing the policies, guidelines, and control mechanisms outlined in this document.
General Principles for Ethical AI Design
At Yext, we seek to deploy innovative AI technologies that are not just economically profitable but also beneficial, fair, and autonomy-preserving for people and societies, drawing from the ethical principles of beneficence, non-maleficence, justice, and autonomy. These high-level principles are rooted in major schools of ethical philosophy, and they have been recently adopted into the domain of digital ethics from the domain of bioethics, where they have been applied for decades.* Concretely, this means that we aim to design AI that (1) avoids causing both foreseeable and unintentional harm; (2) helps promote the well-being of people; (3) is fair, unbiased, and treats people equally during the process of searching as well as when it comes to the search results it provides; and (4) is transparent and trustworthy.
Implementation of Ethical AI Design in Labeling & Model Training
Customer Selection
Yext is selective in the kinds of customers we work with, which dramatically reduces the risk surface for unethical inputs to AI. We do not publish end-user-generated content of the kind found on social media websites (e.g., Facebook or Twitter), which is often ethically complicated. While our AI models must be able to respond to end-user inputs, the data inputs for these models are derived from upstanding businesses that do not engage in ethically risky content production.
Training Data Characteristics
It is important to understand that not all areas of language are equally prone to bias, let alone ethical bias (i.e., bias related to factors like gender, ethnicity, or age). The vast majority of data that we label and use for machine learning (ML) at Yext represent concrete, verifiable, and specific information about businesses and institutions, provided by those institutions themselves (online on their own webpages or in the form of digitized internal documentation). Unlike generalized web search across all resources available online, represented by tools like Google Search or Bing, Yext's domain is enterprise search, that is, search exclusively within a particular company or institution and its knowledge base. Given this unique character of Yext's business focus, ethically charged topics or concepts only rarely appear in the materials used for AI training. Consequently, it is hard to imagine a scenario where bias could swing a particular annotation one way or the other. There is always an external "source of objective truth" that the annotators need to refer to, as instructed in the labeling guidelines. Should any uncertainties arise, the annotators have the option to escalate a particular labeling task to their manager, who provides advice from both the linguistic and content perspectives and who involves other subject matter experts as needed.

That being said, our data scientists always train ML algorithms on sufficiently large volumes of data that are representative of the scenarios in which the algorithm will be deployed. By doing so, we prevent idiosyncratic occurrences of inaccurate or biased labels from skewing the statistical pattern-matching that produces the AI algorithms.
Data Selection
The majority of labeling tasks begin with collecting datasets from search logs. When constructing a corpus of data for labeling, we avoid over-indexing on large clients: no more than 40% of the data may come from a single client, and at least four clients must be represented in the dataset, unless there is a good reason to do otherwise (e.g., training a customer-specific model).
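To make this constraint concrete, the sketch below shows one way such a balance check could be expressed. It is an illustrative assumption, not Yext's actual tooling; the record structure and the "client_id" field name are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: validating the client-balance constraint on a labeling corpus.
# `records` is assumed to be a list of dicts with a "client_id" field.
def check_client_balance(records, max_share=0.40, min_clients=4):
    counts = Counter(r["client_id"] for r in records)
    total = sum(counts.values())
    if len(counts) < min_clients:
        return False, f"only {len(counts)} clients represented (need at least {min_clients})"
    top_client, top_count = counts.most_common(1)[0]
    if top_count / total > max_share:
        return False, f"client {top_client} contributes {top_count / total:.0%} of the data (limit {max_share:.0%})"
    return True, "corpus satisfies the balance constraints"
```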
Labeling Process and Review Mechanisms
To guarantee the highest possible quality of labeled data, each labeling task must have clear written labeling guidelines that reflect the objectives of the project and explain in detail which labels should be used and how they should be applied. The guidelines are the result of collaboration between a linguistic expert/data-labeling manager, a data scientist, and a product manager. Each labeling project is first tested on a small amount of data in order to gather feedback for further clarification of the guidelines. After that, the labeling project is passed on to the annotators, who are in constant communication with the labeling manager. The manager's task is to resolve any issues, ambiguities, or points of confusion that the annotators raise during the labeling process and to track the applied solutions in the labeling guidelines so that they can be applied consistently in the future.
Marking problematic data as Corrupt
In order to maintain the four main ethical objectives stated above, the annotators are instructed to mark any queries and/or responses that contain vulgar, profane, or ethically questionable content as Corrupt. The data with this tag are discarded from any AI training. The same rule holds for queries with personally identifiable information (PII) or otherwise corrupted data (meaningless or irrelevant for the given business domain).
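As a simplified illustration of this rule, the following sketch filters out rows tagged as Corrupt before they can enter a training set. The field names and data layout are hypothetical and do not represent Yext's internal pipeline.

```python
# Illustrative sketch (not Yext's internal pipeline): excluding rows labeled "Corrupt"
# from training data. The "label" field name is an assumption.
def filter_corrupt(rows):
    clean, discarded = [], []
    for row in rows:
        if row.get("label") == "Corrupt":
            discarded.append(row)   # vulgar, profane, PII-bearing, or meaningless data
        else:
            clean.append(row)
    return clean, discarded
```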
Consensus between multiple annotators
In order to prevent any unwanted bias, the majority of data used for model training or Search performance analysis are labeled by at least two annotators. If the annotators disagree on the selected label, the task is escalated for a "disagreement resolution", whereby an annotator assesses both labels and chooses the more appropriate one. If there are doubts about which label should be chosen, the task is further escalated to the labeling manager, who discusses the optimal resolution with all annotators involved in the process. If agreement cannot be reached (which happens only rarely), the data point is discarded.
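The sketch below shows this double-annotation flow in simplified form. The helper functions (resolve_disagreement, escalate_to_manager) are hypothetical placeholders for the escalation steps described above, not actual Yext interfaces.

```python
# Minimal sketch of the consensus flow, assuming two independent labels per data point.
def consensus_label(label_a, label_b, resolve_disagreement, escalate_to_manager):
    if label_a == label_b:
        return label_a                                    # annotators agree: accept the label
    resolved = resolve_disagreement(label_a, label_b)     # a further annotator picks the better label
    if resolved is not None:
        return resolved
    return escalate_to_manager(label_a, label_b)          # may return None: the data point is discarded
```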
Final Review
To add an extra layer of protection against bias and unwanted errors that could slip through and compromise the quality of labeled data, more experienced annotators perform a manual review of most labels assigned during the primary annotation process. The systematic implementation of this review process has been possible since March 2022, when Yext invested in the enterprise edition of Label Studio, a cutting-edge labeling platform for large-scale labeling operations, where all of our labeling tasks are currently carried out.
Preventing Bias and Ensuring Ethical Approach at the Product Level
Yext Search itself contains multiple features and measures for the safe and ethical usage of models trained through the above-described process. These safeguards stem from the philosophy that the use of AI models should be as transparent and configurable as possible, and direct supervision over model outputs should be given to the administrator whenever needed.
Search Algorithm Configurability and Transparency
AI models such as Embedding, Extractive Question Answering, and Named Entity Recognition are used in various places throughout the Yext Search algorithm to affect the recall and ranking of search results. For example,
- Embedding models may be used to rank results based on semantic similarity to a search query
- Extractive Question Answering models may be used to retrieve passages from long text documents that address a search intent
- Named Entity Recognition (NER) models may be used to detect locations in queries, in order to trigger logic to filter by proximity to that location

While the results these AI models produce are themselves outside of the direct control of the user or administrator of a Yext Search experience, we strive to make their application in Search maximally transparent and configurable. Our customers can always choose which search algorithms to apply to their search experience. It is entirely possible, for instance, to create a search experience that does not make any use of AI model outputs and retrieves results entirely on the basis of keyword- and phrase-matching. Additionally, the output of models and other factors influencing any given search can be viewed in a robust and detailed log record, which includes model outputs such as the featured snippet shown below, or the semantic similarity of a result that influenced its ranking.**
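As a generic illustration of the first bullet above, the sketch below ranks candidate results by cosine similarity between a query embedding and result embeddings. It is not Yext's production algorithm; the function and variable names are illustrative, and it assumes embeddings are already available as NumPy arrays.

```python
import numpy as np

# Generic illustration of embedding-based ranking (not Yext's production code):
# results are ordered by cosine similarity between the query embedding and each result embedding.
def rank_by_semantic_similarity(query_vec, result_vecs):
    q = query_vec / np.linalg.norm(query_vec)
    r = result_vecs / np.linalg.norm(result_vecs, axis=1, keepdims=True)
    scores = r @ q                        # cosine similarity of each result to the query
    return np.argsort(-scores), scores    # result indices, best match first
```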