Creating Synthetic APIs

Sometimes the data you need for your PWA isn't available via a proper JSON API. React Storefront allows you to create a "synthetic API" by consuming the raw HTML output of an existing website and turning it into a JSON feed.

Fetching Content from an Existing Site

To begin creating a synthetic API, start by creating a server-side handler and using the fetch module to pull content from an existing website. Here's an example:

import fetch from 'fetch'

const upstreamBase = 'https://example.com'

export default async function productHandler(params, request, response) {
  // Here we assume that the URL path on the upstream site matches that of the PWA.  Your app may vary.
  const url = upstreamBase + request.path

  // Download the HTML as a string from the upstream site.
  // Note that you may need to supply certain headers or cookies (a session cookie is common) to make this work, depending on the site.
  // The callback parameter is named `res` so it does not shadow the handler's `response` argument.
  const html = await fetch(url).then(res => res.text())

  return extractJson(html) // we'll get to this next...
}
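
The handler above assumes the PWA and the upstream site share the same paths. When they differ, the translation can live in one small function. The following is only a sketch: `toUpstreamUrl` is a hypothetical helper, and the convention that PWA data routes end in `.json` while the upstream site serves the plain path is an assumption about your app, not something React Storefront prescribes.

```javascript
// Hypothetical helper: maps a PWA request path to the matching upstream URL.
// Assumption: PWA data routes end in ".json"; the upstream site uses the plain path.
function toUpstreamUrl(upstreamBase, path) {
  // Split off the query string so the suffix check only looks at the path.
  const queryStart = path.indexOf('?')
  const pathname = queryStart === -1 ? path : path.slice(0, queryStart)
  const query = queryStart === -1 ? '' : path.slice(queryStart)

  // Drop a trailing ".json" and re-attach the query string unchanged.
  return upstreamBase + pathname.replace(/\.json$/, '') + query
}
```

With this in place, the handler could compute `const url = toUpstreamUrl(upstreamBase, request.path)` instead of concatenating the two values directly.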

Extracting a JSON Response from HTML

React Storefront bundles the Cheerio HTML parser into its serverless runtime. This library makes it easy to construct a JSON response by searching an HTML document for important data using DOM queries. The API is very similar to jQuery.

To parse an HTML string into a searchable Cheerio context, use the fns.init$ function, which is available in global scope:

const dom = fns.init$(html)

Imagine we needed to extract product info from the following HTML document:

<html>
<body>
  <div class="product">
    <h1>Little Black Dress</h1>
    <p class="product-info">
      Simple, classic.  The little black dress is guaranteed to impress.
    </p>
    <div id="price">$29.99</div>
  </div>
</body>
</html>

We could then extract the relevant pieces of information from the DOM:

/**
 * Creates data for a ProductModel from legacy HTML
 * @param {String} html
 * @return {Object}
 */
function extractJson(html) {
  const dom = fns.init$(html)
  
  return {
    name: dom.$('h1').text(),
    description: dom.$('.product-info').text(),
    price: dom.$('#price').text()
  }
}
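
Text read out of markup keeps the page's formatting: in the sample document the description spans several indented lines, and the price carries a currency symbol. A minimal sketch of two cleanup helpers follows. `cleanText` and `parsePrice` are hypothetical names, not part of React Storefront or Cheerio, and the price parser assumes US-style formatting such as `$1,299.99`.

```javascript
// Hypothetical helper: collapses runs of whitespace (including newlines)
// into single spaces and trims both ends.
function cleanText(text) {
  return text.replace(/\s+/g, ' ').trim()
}

// Hypothetical helper: converts a display price such as "$1,299.99" to a number.
// Assumption: "," is a thousands separator and "." is the decimal point.
// Returns null when the text contains no readable number.
function parsePrice(text) {
  const value = parseFloat(text.replace(/[^0-9.]/g, ''))
  return Number.isNaN(value) ? null : value
}
```

In extractJson, the fields could then be written as `description: cleanText(dom.$('.product-info').text())` and `price: parsePrice(dom.$('#price').text())`, so the JSON feed carries a single-line description and a numeric price.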

User Sessions

Receiving set-cookie Headers

Most web apps implement user sessions using cookies. When fetch retrieves HTML from the source site and the response includes a set-cookie header, fetch automatically captures that header as env.SET_COOKIE. You can then decide whether to forward it to the user's browser in moov_response_header_transform.js.

Impact on Caching

Any response that contains a set-cookie header cannot be cached. Therefore, we recommend forwarding set-cookie only for requests that return session-specific information, such as cart contents or a username lookup.

Here's an example implementation:

// moov_response_header_transform.js

module.exports = function() {
  // ...

  // This allows us to forward set-cookie headers received in MUR requests back to the client.
  // This is required in order to transfer the cart over to checkout.
  if (env.SET_COOKIE) {
    if (env.path.split(/\?/)[0].endsWith('cart.json')) {
      headers.addHeader("set-cookie", env.SET_COOKIE);
    }
  }

  // ...
}
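
As more session-specific routes appear, the path check above can be pulled into a small function that is easy to test on its own. This is a sketch under assumptions: `shouldForwardSetCookie` and the route list are hypothetical, and `/account.json` is only an illustrative second route.

```javascript
// Hypothetical list of routes whose responses carry session-specific data.
const SESSION_ROUTES = ['/cart.json', '/account.json']

// Returns true when a set-cookie header should be forwarded for this path.
// Every other path keeps its response free of set-cookie, and so cacheable.
function shouldForwardSetCookie(path, routes = SESSION_ROUTES) {
  const pathname = path.split('?')[0] // ignore the query string
  return routes.some(route => pathname.endsWith(route))
}
```

The transform could then guard the addHeader call with `env.SET_COOKIE && shouldForwardSetCookie(env.path)`.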

To preserve the user's session when fetching data from the source site, add the session cookie to the headers sent with fetch:

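// product-handler.js

import fetch from 'fetch'

export default async function productHandler(params, state, request) {
  const sessionId = request.cookie['JSESSIONID']
  const url = `https://example.com${request.path}`

  // Only send the session cookie when the browser actually supplied one,
  // so the upstream site never receives "JSESSIONID=undefined".
  const headers = sessionId ? { Cookie: `JSESSIONID=${sessionId}` } : {}

  const html = await fetch(url, { headers }).then(res => res.text())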

  // ... parse HTML and create JSON response
}
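
Some source sites rely on more than one cookie to identify a session. A minimal sketch of a helper that builds the Cookie header from an allowlist is shown below. `buildCookieHeader` is a hypothetical name, `cartId` is an illustrative cookie, and the code assumes the request's cookies are available as a plain object of names and values.

```javascript
// Hypothetical helper: builds a Cookie header value from an object of cookie
// names and values, keeping only the names listed in `allowed`.
function buildCookieHeader(cookies, allowed) {
  return allowed
    // Skip cookies the browser did not send (or sent empty).
    .filter(name =>
      Object.prototype.hasOwnProperty.call(cookies, name) && cookies[name] !== ''
    )
    .map(name => `${name}=${cookies[name]}`)
    .join('; ')
}
```

An allowlist keeps unrelated browser cookies, such as analytics identifiers, from being passed to the source site. When the result is an empty string, the Cookie header can be omitted altogether.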