Plugin Tutorial #1: building a plugin

Cumul.io is a great tool to build visualizations on top of your data, but how do we connect our raw data to the platform? While there are out-of-the-box connectors for you to use, you might:

  • want to connect your existing API, with or without authentication.
  • want to use a database that has no out-of-the-box connector yet.
  • want to provide a Cumul.io plugin to the users of your web service (such as Pipedrive, Stripe or Quickbooks) to give them access to easy insights via Cumul.io.

Plugins are an answer to all these desires. They are data connectors that work as a bridge between Cumul.io and external data sources, like Facebook, Stripe, Athena, etc. They are a quick and convenient way to connect any data to Cumul.io.

What are we building?

In this post, we’ll write a basic plugin step by step, and we will get familiar with the concept of plugins. Since we will be creating a plugin for an open data source, there is no need to add authorization to our plugin here. In a future post, you’ll learn how to build more complicated plugins with authorization (OAuth or key/token).

We’ll connect the open API from citybik.es as an example, which contains data on bike sharing across the world. It has data on the availability of bikes per bike sharing station at any point in time. This API is freely available (so you can easily follow the tutorial) at https://api.citybik.es/v2/

Once we’ve written the plugin and added it to Cumul.io, it will appear as a new data source in your Cumul.io account, from which users can use datasets.

Below is an overview of what a user (who received access to the plugin) will see:

All set? Let’s go!

What is a plugin?

Before we start building, it is important to realize that a plugin is in essence an API which fulfills a contract defined by the Cumul.io team. The contract defines that there are 4 endpoints in the API. Since we are not yet using authorization in this plugin, in this tutorial we will only use 2 of them.

| Method | Endpoint   | Description |
|--------|------------|-------------|
| POST   | /authorize | Endpoint that is called when an account on your plugin is created. |
| POST   | /exchange  | Also related to authentication; only used for OAuth. |
| GET    | /datasets  | Returns information on which datasets the plugin provides (remember the dataset selection screen earlier in this post). |
| POST   | /query     | Retrieves the data! |

 

Since we will be building an API, we need a webserver.

Setup

To get started, you only need 2 things:

  • An account on Cumul.io with Cruise features (the free trial account has full access). If you are already using Cumul.io, check your license in the profile section to verify you have at least Cruise to access plugin functionality.
  • An installation of Node.js (>8.x.x). Follow the Node.js installation instructions for your system.

Verify in a terminal that node and npm (node’s package manager) are installed and that your node version is greater than 8.x.x.
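
If you would rather check the version from a script than by eye, a minimal sketch could look like the one below. The helper name isSupportedNodeVersion is hypothetical, and the sketch assumes that "greater than 8.x.x" means a major version of 8 or higher.

```javascript
// Hypothetical helper (not part of Node.js or Cumul.io): checks a version
// string such as "v10.16.0" against a minimum major version.
// Assumption: a major version of 8 or higher satisfies the requirement.
function isSupportedNodeVersion(version, minimumMajor = 8) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) return false; // not a recognisable version string
  return Number(match[1]) >= minimumMajor;
}

// process.version holds the version of the Node.js binary running this script
console.log(
  process.version,
  isSupportedNodeVersion(process.version) ? 'is supported' : 'is too old'
);
```

Save it under any file name and run it with node to see the verdict for your installation.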

Create a project folder (in our case it’s called citybikes) and initialize your node project. For programmers new to nodejs: Initialization means that we create a package.json which contains some basic configuration of our project. This will be used to store the package versions that are used in this project. We can also use npm to create a package.json interactively in the terminal using:

npm init

Aside from that, we will need an index.js (where we will program the endpoints) and a webserver.js (which will host our webserver).
So, the folder where we will be coding the plugin should look like this:

Alrighty then, let’s start programming the plugin.

Libraries

We will be using several libraries to build our plugin:

  • Request: an easy way in node to make post/get calls. We’ll use this to access the citybik.es API.
  • Express: a web framework that we’ll use to build the webserver.
  • Body Parser: middleware that express needs to parse the bodies of incoming requests.
  • Compression: middleware for express that decreases the size of the response body and hence increases the speed of your API.
  • Dotenv: a convenient way to load environment variables from a .env file.

Install them all in your terminal using npm:

npm install --save request express body-parser dotenv compression

(--save makes sure they appear in the package.json so the next person can install all dependencies with npm install)

Write the web server

Open webserver.js and write the import statements for the required libraries.

Immediately after that, we’ll write a function that will be exported. This function will contain the configuration of our webserver. It can then be imported, and thus be reused to build other plugins.

// > webserver.js
const bodyParser = require('body-parser');
const compression = require('compression');
const express = require('express');
module.exports = () => {
  // our webserver will come here
};

Note that we use a module where we will export our server, in order to reuse it in multiple plugins. That means you’ll only have to write this once!
Then, we’ll add the code to configure the server to that function.

// > webserver.js
const bodyParser = require('body-parser');
const compression = require('compression');
const express = require('express');
require('dotenv').config(); // load the .env file into process.env
module.exports = () => {
  const app = express();
  app.set('json spaces', 2);
  app.set('x-powered-by', false);
  app.use(compression());
  // set the content type and CORS headers on every response
  app.use((req, res, next) => {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Content-Language', 'en');
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
    res.setHeader('Access-Control-Allow-Headers', 'Origin, X-Requested-With, Content-Type, Content-Language, Accept');
    next();
  });
  app.use(bodyParser.json());
  // answer preflight requests with an empty response
  app.options('*', (req, res) => {
    res.status(204).end();
  });
  app.listen(3030, () => console.log("[OK] Cumul.io plugin 'Citybik.es' listening on port 3030"));
  return app;
};

We will not go into detail on the webserver itself, since it is beyond the scope of this tutorial. The code configures our webserver to parse the bodies as json, use compression, sets a few headers and configures the server to run at port 3030. To understand the code, it is useful to think of app.use(...middlewaremethod...) as a way to chain middleware methods that will be called on each request. Now that we have written the webserver, we can reuse it in any other plugin we make.
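
To make the idea of chained middleware more concrete, here is a minimal sketch of the mechanism in plain JavaScript. It is not how Express is implemented; createMiniApp is a hypothetical stand-in that only shows how each registered function passes control on by calling next().

```javascript
// Simplified, hypothetical stand-in for an Express app: it only shows how
// middleware registered with use() is called in order on each request.
function createMiniApp() {
  const middlewares = [];
  return {
    use(fn) {
      middlewares.push(fn);
    },
    handle(req, res) {
      let index = 0;
      // next() hands control to the following middleware in the chain
      const next = () => {
        const middleware = middlewares[index++];
        if (middleware) middleware(req, res, next);
      };
      next();
    }
  };
}

const miniApp = createMiniApp();
miniApp.use((req, res, next) => { res.headers['Content-Type'] = 'application/json'; next(); });
miniApp.use((req, res, next) => { res.headers['Content-Language'] = 'en'; next(); });

// A fake response object that just collects the headers set by the chain
const fakeResponse = { headers: {} };
miniApp.handle({}, fakeResponse);
console.log(fakeResponse.headers);
```

A middleware that does not call next() ends the chain, which is how a check can stop a request before it reaches an endpoint.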

Write the plugin

Our basic plugin only needs two endpoints (since it is an open plugin). The skeleton of our plugin looks as follows:

// > index.js
const webserver = require('./webserver'); // include the webserver we just made
const request = require('request');       // request to write our endpoints
const app = webserver();                  // instantiate the webserver
require('dotenv').config();               // expose the .env file as environment variables

app.get('/datasets', function(req, res) {
  // code that retrieves the metadata (which datasets and columns are available)
});
app.post('/query', function(req, res) {
  // code that retrieves the actual data upon a query
});

In this snippet, we instantiate the webserver and declare the two endpoints we will write: a /datasets endpoint of type GET and a /query endpoint of type POST.

The /datasets endpoint

The /datasets endpoint is responsible for telling Cumul.io how the data is structured. Let’s start simple by returning static data. This will give you a clear example of the required return structure.

// > index.js
...
app.get('/datasets', function(req, res) {
  // Temporary example of a simple metadata definition
  var datasetMetadata = [{
    id: "some id",
    name: {en: `burritos`},
    description: {en: `Beware, burritos below`},
    columns: [
      {id: 'name', name: {en: 'Name'}, type: 'hierarchy'},
      {id: 'price', name: {en: 'Price'}, type: 'numeric'},
      // booleans are mapped to the hierarchy type
      {id: 'vegetarian', name: {en: 'Vegetarian'}, type: 'hierarchy'},
      {id: 'expirationday', name: {en: 'Expiration Day'}, type: 'datetime'}
    ]
  }];
  return res.status(200).json(datasetMetadata);
})
...

This example defines a burrito plugin that provides a single dataset called ‘burritos’. The dataset contains four columns, which together cover every available type (hierarchy, numeric, datetime). Cumul.io keeps things simple and supports only these 3 data types:

  • Hierarchy – a string data type (booleans can be mapped to hierarchies)
  • Numeric – a numerical data type, including int or float
  • Datetime – a date type, it expects the date to be in ISO format
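
Because a typo in a column type is easy to make, it can help to check a dataset definition before returning it. The sketch below is an assumption-laden example: findDatasetProblems is a hypothetical helper, and it only checks the fields shown in this post, not the full plugin contract.

```javascript
// The three column types named in this post
const COLUMN_TYPES = ['hierarchy', 'numeric', 'datetime'];

// Hypothetical helper: returns a list of problems found in one dataset
// definition (an empty list means no problems were found).
function findDatasetProblems(dataset) {
  const problems = [];
  if (!dataset.id) problems.push('dataset has no id');
  if (!dataset.name || !dataset.name.en) problems.push('dataset has no English name');
  (dataset.columns || []).forEach((column, position) => {
    if (!column.id) problems.push(`column ${position} has no id`);
    if (!COLUMN_TYPES.includes(column.type))
      problems.push(`column ${position} has unknown type '${column.type}'`);
  });
  return problems;
}

console.log(findDatasetProblems({
  id: 'some id',
  name: {en: 'burritos'},
  columns: [
    {id: 'name', name: {en: 'Name'}, type: 'hierarchy'},
    {id: 'price', name: {en: 'Price'}, type: 'number'} // wrong on purpose: should be 'numeric'
  ]
}));
```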

We have only written about 10 lines of code for a temporary plugin, but this is a good time to see how to test it.

Testing your plugin.

Start your server by running the following command in your terminal:

node index.js

To test the results, we can check out the result in our browser by going to localhost:3030/datasets

Tip: the other calls use the post method, so use postman or something similar to test them.

(optional) Try adding the plugin: it is more interesting to see what these few lines of code look like once you add your plugin to Cumul.io. To do so, you will need to make sure it is accessible from outside of your pc, and that it runs on https. One way to achieve this is to use ngrok or something similar. Running ngrok in a separate terminal (for example with ngrok http 3030) will give you an https url that looks like: https://dc0c717f.ngrok.io. You can also put your plugin online directly, or wait until you put it online to test this step.

Once your plugin is reachable through an https address, you can add it to Cumul.io and use it. Note that we can already see the datasets we defined when we try to use the plugin (by adding a dataset).

Note that when you create the plugin, an app secret will be generated. You can retrieve this through Profile > Plugins.

This is a secret to ensure that only Cumul.io can call the plugin. We will add it to our .env file that will be loaded by the dotenv library:

CUMULIO_SECRET=< your app secret >

Calling require('dotenv').config() in the script loads the contents of the .env file into environment variables, so we can use them in our code. You can of course export your environment variables manually as well if you prefer.
Using this secret, we can ensure that only Cumul.io can call your plugin. Even for an open plugin, this is useful: it prevents people from using the plugin in ways that were not intended, which could cause unnecessary traffic.

We can do this by adding the following lines of code to each of the endpoints we will write.

// > index.js
require('dotenv').config();
app.get('/datasets', function(req, res) {
  if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
    return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
  // rest of your code
})
...
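
Since the same check is repeated in every endpoint, one option is to wrap it in a small middleware. This is a sketch rather than something the plugin contract requires, and requireSecret is a hypothetical name.

```javascript
// Hypothetical helper: builds an Express-style middleware that rejects
// every request whose X-Secret header differs from the expected secret.
function requireSecret(expectedSecret) {
  return function(req, res, next) {
    if (req.headers['x-secret'] !== expectedSecret)
      return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
    next(); // the secret matches, continue to the endpoint
  };
}

// Usage in index.js, before the endpoints are defined
// (commented out so that this sketch runs on its own):
// app.use(requireSecret(process.env.CUMULIO_SECRET));
```

With this approach the endpoints themselves no longer need the if statement. The rest of this post keeps the explicit check in each endpoint.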

Until now, we used a static definition of burritos in /datasets as an example. Static definitions are perfectly fine (especially for databases or APIs without a defined schema). However, the citybik.es API obviously is not about Burritos. In the case of citybik.es, the names of the datasets can be derived from the API. The citybik.es API exposes several ‘networks’ which we will map to datasets. Each network (or dataset) contains the same information, so we can easily map this to 6 columns: station_name, last_update, latitude, longitude, free_bikes and empty_slots.

To write the actual datasets endpoint for citybik.es, we use the request library to call the citybik.es API, to retrieve the networks.

// > index.js
require('dotenv').config();
app.get('/datasets', function(req, res) {
   if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
     return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
   request.get({
     uri: 'https://api.citybik.es/v2/networks',
     gzip: true,
     json: true
   }, function(error, networks) {
     if (error)
       return res.status(500).end('Internal Server Error');
     //  ... Do something with 'networks' (the result of the API call)...
   })
})
...

Then, we transform this result to a dataset with Cumul.io columns. Here, we use the ids and names of the original networks as the dataset id and name.

// > index.js
 app.get('/datasets', function(req, res) {
   if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
     return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
 
   request.get({
     uri: 'https://api.citybik.es/v2/networks',
     gzip: true,
     json: true
   }, function(error, networks) {
     if (error)
       return res.status(500).end('Internal Server Error');
     var datasets = networks.body.networks.map(function(network) {
       return {
         id: network.id,
         name: {en: `${network.name} ${network.location.city}`},
         description: {en: `Real-time availability of bike sharing network ${network.name} in ${network.location.city}`},
         columns: [
           {id: 'station_name', name: {en: 'Station name'}, type: 'hierarchy'},
           {id: 'last_update', name: {en: 'Last update'}, type: 'datetime'},
           {id: 'latitude', name: {en: 'Latitude'}, type: 'numeric'},
           {id: 'longitude', name: {en: 'Longitude'}, type: 'numeric'},
           {id: 'free_bikes', name: {en: 'Free bikes'}, type: 'numeric'},
           {id: 'empty_slots', name: {en: 'Empty slots'}, type: 'numeric'}
         ]
       }
     });
     return res.status(200).json(datasets);
   });
 });
...
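
The transformation can also be pulled out into a function of its own, which makes it possible to try it without calling the citybik.es API. The sketch below assumes a hypothetical helper name, networkToDataset, and uses made-up sample values that only mimic the fields used above.

```javascript
// Hypothetical helper: turns one citybik.es network into a dataset
// definition with the six columns used throughout this post.
function networkToDataset(network) {
  return {
    id: network.id,
    name: {en: `${network.name} ${network.location.city}`},
    description: {en: `Real-time availability of bike sharing network ${network.name} in ${network.location.city}`},
    columns: [
      {id: 'station_name', name: {en: 'Station name'}, type: 'hierarchy'},
      {id: 'last_update', name: {en: 'Last update'}, type: 'datetime'},
      {id: 'latitude', name: {en: 'Latitude'}, type: 'numeric'},
      {id: 'longitude', name: {en: 'Longitude'}, type: 'numeric'},
      {id: 'free_bikes', name: {en: 'Free bikes'}, type: 'numeric'},
      {id: 'empty_slots', name: {en: 'Empty slots'}, type: 'numeric'}
    ]
  };
}

// Made-up sample input with the same fields as a network in the API result
const sampleNetwork = {id: 'example-network', name: 'Example Bikes', location: {city: 'Leuven'}};
console.log(networkToDataset(sampleNetwork).name.en);
```

Inside the endpoint, the mapping then becomes networks.body.networks.map(networkToDataset).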

The result is that we now have a plugin that exposes the datasets of our citybik.es API. Users retrieve a full list of available sets when they access the plugin.

The /query endpoint

Of course, when your users add a dataset at this point, no data will come in. That is what we need the /query endpoint for. Whenever a dataset of the plugin is used in a chart, or is viewed using the data table, a call will be launched to the plugin’s /query endpoint.

Before we proceed, know that there are two types of plugins: pushdown enabled plugins and regular plugins. In this tutorial, we’re writing a regular plugin. This means that we will not handle aggregations (sum/avg/etc) in the plugin or in the underlying API/database. An example of a pushdown enabled plugin will follow in a later post. In the meantime you can get more information about them here.

In a regular plugin, the /query call returns an array of arrays, which follows the exact structure that was defined by the /datasets endpoint. In other words, if a query is done for a specific dataset, each row in the result contains exactly as many values as there are columns defined for that dataset, in the same order. In the case of citybik.es, all datasets have the same structure, which makes our implementation straightforward.
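
As an illustration of this row structure, the sketch below builds the array of arrays from a column list and a few records. The helper toRows and the sample records are hypothetical; the point is only that every row follows the column order.

```javascript
// Hypothetical helper: builds the array of arrays for /query by picking,
// for every record, one value per column in the order of the column list.
function toRows(columns, records) {
  return records.map(record => columns.map(column => record[column.id]));
}

const columns = [
  {id: 'station_name', type: 'hierarchy'},
  {id: 'free_bikes', type: 'numeric'},
  {id: 'empty_slots', type: 'numeric'}
];
// Made-up records, keyed by column id (the key order does not matter)
const records = [
  {free_bikes: 8, station_name: 'Station A', empty_slots: 8},
  {station_name: 'Station B', empty_slots: 12, free_bikes: 4}
];
console.log(JSON.stringify(toRows(columns, records)));
// → [["Station A",8,8],["Station B",4,12]]
```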

We will start off by checking whether the secret is provided in the query call.

// > index.js
 app.post('/query', function(req, res) {
   if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
     return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
 ...
})

To be clear on the data structure, we will start with a static example again. Note that we return an array of rows, in which we follow the same order of columns as we defined in our /datasets endpoint. Also note that dates have to be sent in ISO format (or you can use javascript dates, which will be serialized to ISO formatted strings).

 // > index.js
 app.post('/query', function(req, res) {
   if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
     return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');
  return res.status(200).json([
    ["601 - Grand Ave & Main Hwy", "2018-06-06T14:33:34", 25.73, -80.24, 8, 8],
    ["650 - NE 14th Terr & Biscayne Blvd", "2018-05-18T14:33:34", 25.79, -80.19, 4, 12]
    // ... more rows
  ]);
})
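
To see what happens to a javascript date in such a row, the short example below serializes one with JSON.stringify, which is what res.json uses under the hood. The row values are the ones from the static example; the date is built in UTC so that the output does not depend on your time zone.

```javascript
// A JavaScript Date inside a row is serialized to an ISO formatted string
const lastUpdate = new Date(Date.UTC(2018, 5, 6, 14, 33, 34)); // months are zero-based: 5 = June
const row = ['601 - Grand Ave & Main Hwy', lastUpdate, 25.73, -80.24, 8, 8];
console.log(JSON.stringify(row));
// → ["601 - Grand Ave & Main Hwy","2018-06-06T14:33:34.000Z",25.73,-80.24,8,8]
```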

In the case of /query, static data is of course not very useful (you’d better upload a CSV then). So, let’s retrieve the data from the citybik.es API and transform it to the correct format.

 // > index.js
 app.post('/query',(req, res) => {
  if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
    return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');

  request.get({
    uri: `https://api.citybik.es/v2/networks/${req.body.id}`,
    gzip: true,
    json: true
  },(error, response) => {
    if (error)
      return res.status(500).end('Internal Server Error');
    // turn every station into a row that follows the column order of /datasets
    var rows = response.body.network.stations.map(function(station) {
      return [station.name, station.timestamp, station.latitude, station.longitude, station.free_bikes, station.empty_slots];
    });
    return res.status(200).json(rows);
  });
});
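
Note that the error argument only covers failed requests. If the API answers with an error status, for example for a dataset id it does not know, the callback still runs and the body may not contain a network. A defensive variant could look like the sketch below; extractStationRows is a hypothetical helper, and statusCode and body are the fields that the request library puts on its response object.

```javascript
// Hypothetical helper: converts a citybik.es response into rows, or returns
// null when the response does not have the expected shape.
function extractStationRows(response) {
  if (!response || response.statusCode !== 200) return null;
  const body = response.body;
  if (!body || !body.network || !Array.isArray(body.network.stations)) return null;
  return body.network.stations.map(station => [
    station.name, station.timestamp, station.latitude,
    station.longitude, station.free_bikes, station.empty_slots
  ]);
}

// Usage inside the request callback (commented out so the sketch runs on its own):
// const rows = extractStationRows(response);
// if (rows === null)
//   return res.status(500).end('Internal Server Error');
// return res.status(200).json(rows);
```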

The plugin is finished, and the complete code for the plugin is only 56 lines long!

'use strict';

var app = require('./webserver')();
var request = require('request');
require('dotenv').config();

// 1. List datasets
app.get('/datasets', function(req, res) {
  if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
    return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');

  request.get({
    uri: 'https://api.citybik.es/v2/networks',
    gzip: true,
    json: true
  }, function(error, networks) {
    if (error)
      return res.status(500).end('Internal Server Error');
    var datasets = networks.body.networks.map(function(network) {
      return {
        id: network.id,
        name: {en: `${network.name} ${network.location.city}`},
        description: {en: `Real-time availability of bike sharing network ${network.name} in ${network.location.city}`},
        columns: [
          {id: 'station_name', name: {en: 'Station name'}, type: 'hierarchy'},
          {id: 'last_update', name: {en: 'Last update'}, type: 'datetime'},
          {id: 'latitude', name: {en: 'Latitude'}, type: 'numeric'},
          {id: 'longitude', name: {en: 'Longitude'}, type: 'numeric'},
          {id: 'free_bikes', name: {en: 'Free bikes'}, type: 'numeric'},
          {id: 'empty_slots', name: {en: 'Empty slots'}, type: 'numeric'}
        ]
      }
    });
    return res.status(200).json(datasets);
  });
});

// 2. Retrieve data slices
app.post('/query', function(req, res) {
  if (req.headers['x-secret'] !== process.env.CUMULIO_SECRET)
    return res.status(403).end('Given plugin secret does not match Cumul.io plugin secret.');

  request.get({
    uri: `https://api.citybik.es/v2/networks/${req.body.id}`,
    gzip: true,
    json: true
  }, function(error, response) {
    if (error)
      return res.status(500).end('Internal Server Error');
    var rows = response.body.network.stations.map(function(station) {
      return [station.name, station.timestamp, station.latitude, station.longitude, station.free_bikes, station.empty_slots];
    });
    return res.status(200).json(rows);
  });
});

Testing the finished plugin & deploying it.

We can test the plugin in a similar way as before. However, we have a post call now, so let’s use postman or anything else that can send post calls.

First of all, make sure you set the X-Secret header this time, and set the Content-Type header to application/json, since we will provide a body.

In this body, we will send the id of the dataset that needs to be retrieved. We can see the result coming in below.
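
If you prefer a script over postman, the same call can be sketched with node’s built-in http module. The helper names below are hypothetical, the port is the one configured in webserver.js, and 'some-network-id' is a placeholder for an id returned by your /datasets endpoint.

```javascript
const http = require('http');

// Hypothetical helper: builds the options and body for a test call to the
// local /query endpoint.
function buildQueryRequest(secret, datasetId) {
  const body = JSON.stringify({id: datasetId});
  return {
    options: {
      hostname: 'localhost',
      port: 3030,
      path: '/query',
      method: 'POST',
      headers: {
        'X-Secret': secret,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body
  };
}

// Sends the request and prints the status code and the raw response.
function sendQuery(secret, datasetId) {
  const {options, body} = buildQueryRequest(secret, datasetId);
  const req = http.request(options, res => {
    let data = '';
    res.setEncoding('utf8');
    res.on('data', chunk => { data += chunk; });
    res.on('end', () => console.log(res.statusCode, data));
  });
  req.on('error', error => console.error('Request failed:', error.message));
  req.end(body);
}

// With the plugin running locally, uncomment the next line and fill in
// your own secret and a real dataset id:
// sendQuery('< your app secret >', 'some-network-id');
```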

You can also test the local plugin using ngrok again, and see the results.

When we consider our plugin to be ready, we need to host it somewhere. This can be your own server or a cloud platform. Although we mainly use AWS, we will write an example for this plugin on Heroku, since it’s easy to start and has quite some functionality available for free.

Create a free account on Heroku and choose your app name and region.

Heroku uses a specific file, called a ‘Procfile’ (this is also the filename), where you define how Heroku should run your app. If you place a file with filename ‘Procfile’ in the root of your repository, Heroku will find it and know how to run your code. Ours looks like this:

// > Procfile
web: node index.js

A new Heroku app provides a remote git repository. Deploying means that you push your code to this repository. Heroku will detect changes to this git repository and prepare your deploy. To execute the commands below, we first have to install the Heroku CLI.
After making your repository, Heroku will present you with the instructions to deploy your code. In our case, we didn’t create a repository yet, so our deploy commands initialize the git repository first.
Note: ‘git add .’ stages all files in the folder, so they all end up in the commit that is pushed. It is generally not a good idea to push secrets such as the .env file. Use a .gitignore file to prevent that.

heroku login

git init
heroku git:remote -a mycitybikesplugin
git add .
git commit -am "make it better"
git push heroku master

Finally, do not forget when you add the plugin to the platform (a new url requires you to add a new plugin) that you will receive a secret, which you’ll need to add to the environment variables of your API. In the case of Heroku, you can add environment variables here:

You should now have your API running on Heroku and can add the plugin to our platform.

Final notes:

Since we have now built a regular, open plugin, this post might give rise to some questions.

  • How do we build plugins with authentication?
  • Where do aggregations happen?
  • What if I do not want to show all data to a specific user?
  • How can we write a more efficient plugin?

This will become clearer in future posts in this plugin series. If you’d like to delve further into our plugins, stay tuned for the next tutorials! For now we can briefly explain the possibilities of our powerful plugin API.

Authorization is possible by implementing the authorize endpoint for key/token based authorization, and both the authorize and exchange endpoints for OAuth based authorization. These endpoints receive information from the user, which allows you to verify in your own database or authorization systems whether these users can have access. Information about the user is also provided to the query endpoint, which allows you to filter the dataset down to what this specific user can see.

In this example, aggregations (for example sums or averages for a bar chart) are calculated by Cumul.io. However, the query endpoint receives a query-like json structure which contains the information of the specific query. If you want to write a more efficient plugin, you can calculate these aggregations in your database. This is what we call a pushdown enabled plugin, which we will cover in a future post.

All of this and much more… is coming!

PS: feel free to comment and ask questions. Then we can anticipate, and our future posts might be more tailored to your question.
