Superfast authoring of open data websites using Observable and Vega-lite

Aug 5, 2018

Until recently, building websites with interactive data content was time-consuming and required substantial technical expertise. Authoring professional-looking web content was out of reach for many analysts¹.

These hurdles stifled demand for standards-compliant open data, because few users could take full advantage of its benefits. As a result, a lot of open data is still locked away in formats like Excel and PDF, which don’t integrate well with modern data tools.

In the past few years, new tools have dramatically simplified the process of creating web content from open data. In just a few lines of code, analysts can import the latest open data, build interactive charts, and embed their work in a website, all using free and open source tools.

The increase in the potential user base means the value of modernising open data offerings is much greater. Publishers can expect to see their users creating more compelling content that stays up-to-date automatically because it is derived directly from source data.

This post demonstrates how easy it is to get started. In what follows, we’re going to build a minimal example that looks like this:

Our free and open source tool kit

Observable notebooks provide an authoring environment that allows you to get started without any boilerplate code. Content can be written by analysts and then embedded into any other website. Observable takes care of asynchronous data loads and dependencies.
Markdown template literals enable computations to be embedded in statistical commentary with a simple syntax.
Vega-lite provides a succinct way of authoring highly interactive charts, with an intuitive visualisation grammar.
Alasql enables you to run SQL queries against data in your browser.

Step-by-step tutorial

In this tutorial, we’re going to load some open data into our browser, perform some basic analysis in SQL, create an interactive chart, and then embed our analysis into a website. This provides a minimal example to get you started on building something much more interesting. You can find the completed tutorial notebook here. For more complex and interesting examples, see here, here or here. Let’s get going…

Head over to observablehq.com and start a new notebook. Each of the code blocks in this tutorial should be in a separate cell in your notebook. This will enable Observable to handle asynchronous loads for us.

In the first cell, we’ll use markdown to create a heading and some text. The code is as follows:

md`# My test site
This site contains simple analysis of life expectancy from [our world in data](https://ourworldindata.org/life-expectancy).`

The Observable standard library will convert this into formatted HTML for us. You should now see:

Load in d3, a JavaScript library that will allow us to fetch and parse data in various formats like JSON or CSV:

d3 = require('d3')

Load our open data on life expectancy, taken from here:

data = d3.csv("https://gist.githubusercontent.com/RobinL/39d07d801a6f00d857cfafdcdc229898/raw/4f57875bc19b64763efaaf50044210d7c93a5c3d/life-expectancy.csv")
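It helps to know what shape the loaded data takes. d3.csv resolves to an array of objects, one per row, keyed by the column headers, and unless you supply a row conversion function every value is a string. The sketch below imitates that shape in plain JavaScript, outside Observable, using a deliberately naive parser. It is not d3's implementation and does not handle quoted fields. The helper name parseSimpleCsv and the two sample rows are made up for illustration; only the column names are taken from the fields used in this tutorial.

```javascript
// Naive CSV parser for illustration only: it splits on newlines and commas,
// so it breaks on quoted fields. d3.csv handles those cases properly.
function parseSimpleCsv(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return lines.map(line => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((h, i) => { row[h] = cells[i]; });
    return row;
  });
}

// Made-up sample rows; the column names match the fields used in this tutorial.
const sampleCsv = `entity,code,year,life_expectancy
Exampleland,AAA,2000,70.5
Samplestan,BBB,2000,65.2`;

const rows = parseSimpleCsv(sampleCsv);
console.log(rows[0]);
console.log(typeof rows[0].year); // "string", not "number"
```

Because the values arrive as strings, the SQL in the later cells compares year against a quoted value.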

Load in alasql, a JavaScript library that will allow us to run SQL against the dataset:

alasql = require("alasql")

Create a table in our in-browser database, and populate it with the CSV dataset:

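db = {
  // Observable waits for the d3.csv promise in the data cell to resolve,
  // so data is already an array of row objects here.
  let db = new alasql.Database();
  db.exec('CREATE TABLE df');
  db.tables.df.data = data;
  return db;
}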

Run a test SQL query against the data to see if it works:

{
    let sql =`select * from df limit 5`
    return db.exec(sql)
}

Create an HTML select box that allows the user to pick what year of data they’re interested in:

viewof selectbox = html`
<select>${
  db.exec('select distinct year from df order by year desc')
    .map(d => html`<option value="${d.year}">${d.year}</option>`)
}</select>`
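To see what this cell assembles, here is a rough plain JavaScript sketch of the same two steps: collecting the distinct years, newest first, and turning them into option markup. It builds strings rather than DOM nodes and sorts numerically, so it only approximates the SQL query and the html template literal. The helper names and the sample rows are hypothetical.

```javascript
// Hypothetical helper: distinct years, newest first.
// Roughly mirrors: select distinct year from df order by year desc
function distinctYearsDesc(rows) {
  const years = [...new Set(rows.map(d => d.year))];
  return years.sort((a, b) => Number(b) - Number(a));
}

// Hypothetical helper: build the <option> markup as a plain string.
function optionsMarkup(years) {
  return years.map(y => `<option value="${y}">${y}</option>`).join("");
}

// Made-up rows; only the year column matters here.
const yearRows = [{ year: "1999" }, { year: "2001" }, { year: "2000" }, { year: "2001" }];

const years = distinctYearsDesc(yearRows);
const markup = optionsMarkup(years);
console.log(years);
console.log(markup);
```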

Use the value selected in the select box to filter the dataset to only the year selected, and (for brevity) filter to only country codes starting with A:

chart_data = {
    let sql = `select * from df
               where year = '${selectbox}'
               and code like 'A%'`
    return db.exec(sql)
}
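For readers less familiar with SQL, the same filter can be sketched in plain JavaScript. This is only a rough equivalent for illustration, not what alasql does internally, and details such as the case sensitivity of LIKE may differ. The function name filterRows and the sample rows are hypothetical.

```javascript
// Hypothetical helper: keep rows for one year whose country code starts with "A".
// Roughly mirrors: select * from df where year = '<year>' and code like 'A%'
function filterRows(rows, year) {
  return rows.filter(d => d.year === String(year) && d.code.startsWith("A"));
}

// Made-up rows in the shape d3.csv produces (all values are strings).
const sampleRows = [
  { entity: "Exampleland", code: "AAA", year: "2000", life_expectancy: "70.5" },
  { entity: "Exampleland", code: "AAA", year: "2001", life_expectancy: "70.9" },
  { entity: "Samplestan", code: "BBB", year: "2000", life_expectancy: "65.2" }
];

const filtered = filterRows(sampleRows, "2000");
console.log(filtered.map(d => d.entity)); // only Exampleland remains
```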

Import a library that allows us to pretty-print raw data into an HTML table:

import {htmlTable} from "@rohscx/inputs"

Use this library to display the data:

htmlTable(chart_data)

which results in:

Import vega_embed, which will allow us to plot vega-lite charts:

vega_embed = require("vega-embed@3")

Define and plot our chart:

vega_embed({
  'data': {
    'values': chart_data
  },
  'encoding': {
    'x': {
      'field': 'life_expectancy',
      'type': 'quantitative'
    },
    'y': {
      'field': 'entity',
      'type': 'nominal'
    }
  },
  'mark': {'type': 'bar'},
  'title': `Life expectancy in ${selectbox}`,
  'width': 600
})
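Because a Vega-lite specification is just a JavaScript object, it can also be produced by a function and reused for different selections. The sketch below builds the same bar chart specification as the cell above from any rows and year. The name buildSpec and the sample row are hypothetical; in the notebook, the returned object could be passed to vega_embed in place of the inline object.

```javascript
// Hypothetical helper: returns the bar chart specification for the given rows and year.
function buildSpec(rows, year) {
  return {
    data: { values: rows },
    encoding: {
      x: { field: "life_expectancy", type: "quantitative" },
      y: { field: "entity", type: "nominal" }
    },
    mark: { type: "bar" },
    title: `Life expectancy in ${year}`,
    width: 600
  };
}

// Made-up row, for illustration only.
const spec = buildSpec([{ entity: "Exampleland", life_expectancy: "70.5" }], "2000");
console.log(spec.title); // Life expectancy in 2000
```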

Write some commentary that will update dynamically as data and selections change:

md`Life expectancy in Afghanistan in ${selectbox} was ${chart_data[0].life_expectancy}`
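Note that chart_data[0] relies on Afghanistan being the first row returned by the query. A slightly more defensive sketch looks the row up by name and falls back to a message when it is missing. The helper name commentaryFor and the sample rows are hypothetical; in the notebook, the returned string could be interpolated into the md template.

```javascript
// Hypothetical helper: build the commentary sentence for one country and year.
function commentaryFor(rows, entity, year) {
  const row = rows.find(d => d.entity === entity);
  if (!row) return `No life expectancy data for ${entity} in ${year}`;
  return `Life expectancy in ${entity} in ${year} was ${row.life_expectancy}`;
}

// Made-up rows, for illustration only.
const commentaryRows = [
  { entity: "Samplestan", life_expectancy: "65.2" },
  { entity: "Exampleland", life_expectancy: "70.5" }
];

console.log(commentaryFor(commentaryRows, "Exampleland", "2000"));
console.log(commentaryFor(commentaryRows, "Nowhere", "2000"));
```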

You should now have a minimal but fully-functional example of an open data site. Your notebook contains your user-facing outputs, and also all the code required to create them. Within the Observable notebook, you have limited options for adding custom styling to the page.

Finishing touches: Embedding your outputs in a website

In many applications, you probably want to embed these outputs within an existing website. With the data analysis done, you could leave the styling and layout of this website to experts.

This is straightforward, and Observable provide a guide here. You can find a simple HTML page that embeds only the user-facing content of the tutorial notebook here. This is ready to be hosted anywhere on the internet, and you can access a live version here.

A plea to open data providers

If you want your users to get maximum value from your data, please provide data that:

Is published in an open format like CSV, with a predictable structure.
Is formatted as tidy data.
Has CORS enabled so users don’t have to copy the data before using it.
Enables users to access the latest data, and frozen data for a given time period, at predictable URLs.

This will make it easy for your users to hook directly into your data rather than making copies, which makes it much more likely that your latest data will be consistently presented across the web.

For an example, refer to ONS, who do an excellent job of this through their data API.
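To make the tidy data point in the list above concrete, here is a minimal sketch of the reshaping users otherwise have to do themselves when data is published "wide", with one column per year. The helper name toTidy, its parameters, and the sample row are hypothetical and only illustrate the idea of one row per entity and year.

```javascript
// Hypothetical helper: reshape wide rows (one column per year) into tidy rows
// (one row per entity and year).
function toTidy(wideRows, idColumn, valueName) {
  const tidy = [];
  for (const row of wideRows) {
    for (const [key, value] of Object.entries(row)) {
      if (key === idColumn) continue; // the id column is not a year
      tidy.push({ [idColumn]: row[idColumn], year: key, [valueName]: value });
    }
  }
  return tidy;
}

// Made-up wide row, for illustration only.
const wide = [{ entity: "Exampleland", "2000": "70.5", "2001": "70.9" }];

const tidyRows = toTidy(wide, "entity", "life_expectancy");
console.log(tidyRows);
```

Data that is already published in the tidy shape can be queried and charted directly, as in the tutorial above.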

Endnotes:

¹ R Markdown provides an alternative way of authoring HTML documents without front-end expertise. It is a great tool for many purposes. However, if you want to create a static website with a custom layout that stays evergreen as open data updates, it is probably less suitable than the tools presented in this tutorial.