If you have been following the earlier posts on the Couchbase blog, you have seen content about Couchbase Shell latest release. I am trying different things and today I wanted to see what I could do with time series datasets.

Ingest time series data

Browsing Kaggle, I found temperature data per City. I have downloaded it. It’s about 500Mb and that makes it hard to manipulate as a whole. But, of course, it’s a text file so we can easily peek at the data structure like so:

8599212 lines, this is going to be 8599211 documents. My end goal is to see a temperature graph of several cities over the years. To do that, I am going to import everything to an import bucket first, then I will transform the data into time series.

If I import this naïvely, it gives me one doc per city/month group. So the key of my doc will be like Århus:1743-11-01. Say I only want to upload the first line, it should look like this:

And now that I know it works, let’s get everything, and batch it:

I have created a series collection in which I import the result of a SELECT aggregation, getting all the dates as timestamps and all the AverageTemperature as value:

And now all the data is available as time series. Say I want the data from Paris, I can use the _timeseries function like this:

To speed things up you can create the following index: CREATE INDEX ix1 ON series(City, ts_end, ts_start);

Notice the use of the ts_ranges variable at the beginning. You can easily reuse those values in a template String. They start with a $ sign and variables must be in parenthesis like ($my_variable). Which also means that now you need to escape parenthesis character as well as double quotes.

Plot time series

There are a variety of terminal plotting libraries, here I am using youplot:

Now that is all great but ideally I want to have several cities in there to compare them. Any additional CSV columns are picked up automatically as long as you run youplot lines. Doing things gradually, let’s start with the support of multiple cities in the query. So a couple changes here:

    • grouping the data by time
    • adding d.city IN ($city) to the where clause. This works because the city array is a String literal of a JSON array. Let’s take a look at the answer as a JSON doc:

But this cannot turn into a CSV, even if you flatten everything like this:

So we can transform it into an object like so: Object v.city : v.temp FOR v IN ARRAY_AGG({"city": d.city, "temp":t._v0}) when v IS NOT MISSING END

Which can now be flatten into a CSV:

And with that we are ready to plot several lines:

To make that easier, you can use functions. Create a .nu file, like temp.nu, with the following content:

Then if you source it it becomes much easier to use:


I hope this gave you a quick overview of Couchbase time series support, of Couchbase Shell data manipulation, and how you can use other shell commands like youplot to make things more integrated and interesting. You could easily generate a full report in various formats from a GitHub Actions for instance, endless possibilities!

 

Author

Posted by Laurent Doguin, Developer Advocate, Couchbase

Laurent is a Paris based Developer Advocate where he focuses on helping Java developers and the French community. He writes code in Java and blog posts in Markdown. Prior to joining Couchbase he was Nuxeo’s community liaison where he devoted his time and expertise to helping the entire Nuxeo Community become more active and efficient.

Leave a reply