The UB Crawler

Our crawler returns live product and stock information for all items across our supported shops.

We love good product data

Our crawler is designed to provide accurate product data and stock availability. It parses the information directly from the retailer's product page, providing near real-time item and stock data (at both product and variant level). Often the data returned by the crawler will be much more up-to-date than the feeds provided by your network partners.

You are welcome to use the crawler ad hoc to check that an item is in stock or to make sure its data is fresh. However, we also run the crawler ourselves, so you may not need to.

We crawl in the following scenarios:

  • When an item is added to basket we crawl it if it hasn't been crawled within 10 minutes.
  • When a user loads their basket we crawl the items in it if they haven't been crawled within 10 minutes.
  • We crawl before we place an order.
  • We crawl again after we place an order.

IMPORTANT: Pre-caching your listed products

We strongly recommend that you crawl all products you're listing so we pre-cache the data on our servers and can load the basket very quickly when your users add items to their basket.

If you don't do this, the product will be crawled in real time as it is added to basket. Complex product pages mean this can take some time, which may compromise your basket user experience and impact conversion.

If you're just setting up, you can request that all your products be crawled using the instructions below. You should also pre-crawl any new products you list after your initial crawl: you could set up a cron job for this, or fire off a crawl request as soon as you list a new product. You may also wish to re-crawl products periodically so that we maintain a fresh record of stock.
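For example, one minimal approach (a sketch, not a requirement) is a small script, run from cron or from your product-listing hook, that sends newly listed product URLs to the bulk-crawl endpoint documented below. The API key, product URLs, and callback URL here are placeholders:

#!/bin/sh
# Pre-crawl newly listed products via the bulk-crawl endpoint (documented below).
# Run from a cron job, e.g.:  0 2 * * * /path/to/precrawl.sh
# All values below are placeholders; product URLs must be URL-encoded.
curl -X POST https://api.ub.io/products/bulk-crawl \
  -d "apiKey=YOUR_API_KEY" \
  -d "urls[]=https%3A%2F%2Fwww.example-shop.com%2Fproduct%2F123" \
  -d "urls[]=https%3A%2F%2Fwww.example-shop.com%2Fproduct%2F456" \
  -d "country=gb" \
  -d "callbackUrl=http://www.example.com/callback"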

Using the returned crawl data

You don't need to consume the resulting data from these pre-crawls (we cache it on our end), but you may do so if you want it by specifying a callback URL. See below.

Crawl data

The crawler will return the following data per product URL:

  • Price
  • Currency
  • Title
  • Images (at least one image is guaranteed; support for multiple images is in development).
  • Attributes (with availability), e.g. size, colour.
  • Stock status (whether the item is out of stock).
  • Shop information, including available shipping options and prices.

We may also return a product description; however, this is not yet a supported feature. You're welcome to use it if your crawl result includes it.

If a product that hasn't been crawled within the last 10 minutes is added to basket, we consider it stale and immediately re-crawl it in the background. Any changes in the resulting data are then pushed live to the basket. The same applies to products already in the basket: if their data is more than 10 minutes old, we re-crawl them in the background when the basket is loaded.

Determining whether a shop is supported

We publish lists of supported shops. Please avoid sending crawl requests for products from shops we don't currently support. If we don't currently support a shop you'd like to use, you can request it of course!

Before crawling, you can determine whether a particular shop is supported using the scriptexists endpoint.

Example:

curl -X POST "https://api.ub.io/shop/scriptexists?url=http://www.asos.com/"

Returns: "status":"ok" if supported and "status":"error" if not. Full product URLs can be sent with the request.

If you are pushing a large data set to our crawler, please use this endpoint first so that products from unsupported shops are not included in your crawl requests.
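For example, a simple (and deliberately naive) shell sketch that keeps only URLs whose shop passes the scriptexists check before you build a bulk-crawl request; the product URLs are placeholders and the grep check assumes the "status" field appears in the response without extra whitespace:

#!/bin/sh
# Filter a list of product URLs down to those from supported shops.
# URLs are placeholders; the grep check is a naive match on the scriptexists response.
for url in "http://www.asos.com/product/123" "http://www.example-unsupported.com/product/456"; do
  if curl -s -X POST "https://api.ub.io/shop/scriptexists?url=$url" | grep -q '"status":"ok"'; then
    echo "$url" >> supported-urls.txt   # include in the bulk-crawl request
  else
    echo "Skipping unsupported shop: $url" >&2
  fi
done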

Crawling

You can crawl multiple products at once, or just include a single product.

If you require the results, they are posted to a callback URL. You can specify this in the request or, if you ask us, we can add a default to your setup. (A URL specified in the request will override the default.)

Note: Results are returned asynchronously, and often in near real-time, though we employ a queue for all requests to balance load. This means some crawls may be queued for a number of minutes if server load is high.

POST https://api.ub.io/products/bulk-crawl

Parameters:

  • apiKey (string): your API key.
  • urls (array): array of product URLs (each URL must be encoded).
  • country (string): ISO 3166-1 alpha-2 (two-letter) country code, e.g. gb. You must include the correct country code to ensure we crawl and return information for the correct country.
  • callbackUrl (string): callback URL to which the UB API will POST the result of each crawl in the bulk-crawl request. Results arrive one at a time, as soon as each crawl is complete.

cURL example

curl -X POST https://api.ub.io/products/bulk-crawl -d "apiKey=key&urls[]=PRODUCT_URL&urls[]=PRODUCT_URL&country=gb&callbackUrl=http://www.example.com/callback"

Instant add-to-basket with your own product data

If you haven't pre-crawled your products and you have SKU-level product data in your own database, you can use it as a fallback when crawling products before adding them to basket. This results in a near-instant add-to-basket response from UB, which improves your customer experience.

In order to use instant add-to-basket, you specify the fallback product data in the crawl request before adding the product to basket. It must be specified as JSON and match the UB product format.

POST https://api.ub.io/products/crawl

Parameters:

  • url (string): URL of the product to crawl.
  • wait (boolean): set to true.
  • product (JSON): product JSON matching the UB product format.

product should include the following fields:

  • title (string): title of the product to show in the basket.
  • images (array): image URLs for the product to show in the basket. At least one image URL must be specified.
  • price (JSON): price of the product, e.g. { "value": 9.99, "currency": "GBP" }.
  • variants (JSON, optional): if the product has variants, these must be specified as a JSON tree matching the UB product format.

If product is specified in the crawl request, the UB API will return immediately with product data. If UB already has cached crawl data in its database, it will return this; if not, it will return your specified product data. You can then add the product to basket in the usual manner via its ID.

In both cases, UB will also trigger another crawl of the product in the background, which will automatically update the basket when it is complete. We recommend reloading the basket after 10-20 seconds to reflect any changes, e.g. updated availability.
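Putting this together, here is a hedged sketch of an instant add-to-basket crawl request. It assumes the product JSON can be sent as a URL-encoded form field alongside the other parameters, and that the apiKey parameter is required here as it is for bulk-crawl; all concrete values are placeholders.

# Crawl request with fallback product data for instant add-to-basket.
# apiKey is assumed to be required, as for the bulk-crawl endpoint; all values are placeholders.
curl -X POST https://api.ub.io/products/crawl \
  -d "apiKey=YOUR_API_KEY" \
  --data-urlencode "url=https://www.example-shop.com/product/123" \
  -d "wait=true" \
  --data-urlencode 'product={
    "title": "Example T-Shirt",
    "images": ["https://www.example-shop.com/images/123.jpg"],
    "price": { "value": 9.99, "currency": "GBP" }
  }'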