Crawler

To build a powerful Knowledge Graph, you need data from many sources. One great source of content is your own website. The Crawler scrapes the underlying content on your website, which our Data Connectors can then turn into structured data for your Knowledge Graph in just a few clicks. With highly customizable configuration options, the Crawler pulls exactly the information you want from your website, even text from PDF files you store online, saving you time and expanding your data source options.

Scrape Data from Your Website

The Yext Crawler can scrape HTML and/or PDF content from a specified set of domains, pages, or sub-pages under a domain, giving you control over exactly which content is brought into your Knowledge Graph. If your website data changes frequently, configure crawls to run on a schedule; if you just need a one-time backfill, crawl once. Don't want to crawl a specific set of pages? Blacklist unwanted URLs so the crawler skips them.
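
As a rough illustration of how a crawl scope and a URL blacklist interact, here is a minimal Python sketch. The domain, sub-page prefixes, and blocked URL are hypothetical placeholders, and the code is not the Yext Crawler's actual configuration format or implementation; it only shows the filtering logic in the abstract.

```python
from urllib.parse import urlparse

# Hypothetical crawl settings; these names are illustrative, not Yext settings.
ALLOWED_DOMAINS = {"www.example.com"}
SUBPAGE_PREFIXES = ("/locations/", "/faq/")
BLOCKED_URLS = {"https://www.example.com/locations/internal-test"}


def should_crawl(url: str) -> bool:
    """Return True if url is inside the configured scope and not blacklisted."""
    if url in BLOCKED_URLS:
        return False
    parsed = urlparse(url)
    if parsed.netloc not in ALLOWED_DOMAINS:
        return False
    # Only crawl sub-pages under the chosen path prefixes.
    return parsed.path.startswith(SUBPAGE_PREFIXES)


candidates = [
    "https://www.example.com/locations/nyc",
    "https://www.example.com/locations/internal-test",
    "https://www.example.com/blog/post-1",
    "https://other.example.org/locations/nyc",
]
print([u for u in candidates if should_crawl(u)])
```

Only the first candidate passes: the second is blacklisted, the third is outside the sub-page prefixes, and the fourth is on a domain that was not specified.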

Configure a Crawler Data Connector

Once the Crawler scrapes your website, a Data Connector converts and structures the raw HTML into data for Knowledge Graph entities. Highly customizable configuration allows you to extract exactly the data you need. You can specify a target path based on CSS or XPath selectors, or use built-in selectors to capture commonly extracted data types like Page Title and Body Content. Connectors can extract Text, HTML, URLs, Images, and more. Learn more about how Data Connectors and the Crawler work together.
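
To show what "a target path based on XPath selectors" means in practice, here is a minimal Python sketch that maps hypothetical field names to XPath expressions and pulls text and a URL attribute from a sample page. It uses the standard library's ElementTree, which supports only a subset of XPath and requires well-formed XML; it is an assumption-laden stand-in, not how Data Connectors evaluate selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; it is well-formed XHTML so ElementTree can parse it.
# Real-world HTML is often not valid XML and would need an HTML parser.
PAGE = """
<html>
  <head><title>Downtown Store | Example Co.</title></head>
  <body>
    <div class="hours">Mon-Fri 9am-5pm</div>
    <a class="menu" href="/menu.pdf">Menu</a>
  </body>
</html>
"""

# Hypothetical field-to-XPath mapping. ElementTree supports only a
# subset of XPath 1.0 (e.g. .//tag and [@attr='value'] predicates).
TEXT_FIELDS = {
    "page_title": ".//title",
    "hours": ".//div[@class='hours']",
}


def extract(page: str) -> dict:
    root = ET.fromstring(page.strip())
    record = {}
    for name, path in TEXT_FIELDS.items():
        node = root.find(path)
        # Missing selectors yield None instead of raising.
        record[name] = node.text.strip() if node is not None and node.text else None
    # URL-type field: read an attribute rather than element text.
    link = root.find(".//a[@class='menu']")
    record["menu_url"] = link.get("href") if link is not None else None
    return record


print(extract(PAGE))
```

Keeping the selectors in a mapping separate from the extraction loop means adding a field is a one-line change, which loosely mirrors configuring one selector per field.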

Transform Your Data

The data on your website might not be formatted exactly how you want it in your Knowledge Graph. Use transforms in Data Connectors to manipulate data scraped by the Crawler before it enters the Knowledge Graph. Data Connectors allow you to preview any changes to your data in real time to ensure maximum accuracy. With transforms, you can remove unwanted characters, fix capitalization, find and replace text, format dates, and more.
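
The transforms listed above (removing characters, fixing capitalization, find and replace, formatting dates) can be approximated with plain string and date functions. This minimal Python sketch uses hypothetical function names and sample values; it does not reproduce the Data Connectors transform implementations, only the kind of change each one makes to a scraped value.

```python
from datetime import datetime


def remove_characters(value: str, chars: str) -> str:
    """Drop every occurrence of the given characters."""
    return value.translate(str.maketrans("", "", chars))


def fix_capitalization(value: str) -> str:
    """Capitalize each whitespace-separated word and trim extra spaces."""
    return " ".join(word.capitalize() for word in value.split())


def find_and_replace(value: str, find: str, replace: str) -> str:
    """Replace every occurrence of find with replace."""
    return value.replace(find, replace)


def format_date(value: str, in_fmt: str, out_fmt: str) -> str:
    """Parse a date string in one format and re-emit it in another."""
    return datetime.strptime(value, in_fmt).strftime(out_fmt)


# Hypothetical raw values as they might come off a crawled page.
raw = {"name": "  DOWNTOWN store*** ", "phone": "(555) 010-0199", "opened": "03/14/2019"}

clean = {
    "name": fix_capitalization(remove_characters(raw["name"], "*")),
    "phone": find_and_replace(remove_characters(raw["phone"], "()"), " ", "-"),
    "opened": format_date(raw["opened"], "%m/%d/%Y", "%Y-%m-%d"),
}
print(clean)
```

Chaining functions, as for the name and phone fields, shows how several transforms might apply to one field in sequence.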

Want to become a Yext Expert?

Join Hitchhikers, the new Yext training platform and community, to test your knowledge, earn badges, and engage with the experts.

One Platform. Unlimited Solutions.

A great search experience is key to a great brand experience, on and off your website. Answer customers' questions and convert more business across digital channels with AI-powered marketing solutions from Yext.

Answer support questions before they become support tickets. Streamline the resolution process with Yext's modern, AI-powered customer support solutions.

Create an intuitive ecommerce search and discovery experience so you can meet your customers with direct answers every step of the way. Streamline the digital customer journey and turn your website into a conversion engine with AI-powered ecommerce solutions from Yext.

Employees need company information to do their jobs. Make it easy, fast, and fun to find with AI-powered workplace search solutions from Yext.

Build on the Yext platform for a fully custom AI search experience — fast. With SDKs, APIs, and robust documentation, the Yext Answers platform provides the building blocks to create a bespoke search experience.
