Every application developer has faced the need for sample data. Whether it’s for functional testing or stress testing at scale, a large volume of meaningful data is required to see how an app actually performs under real-world conditions. In most cases, using actual production data is out of the question because of the impact on operational workloads, data governance, and security, so realistic sample data has to be generated instead.

Random data generation tools exist, but they typically generate only simple, repetitive data with generic values that bear no resemblance to complex application data. And these tools can rarely produce enough data to truly test apps for enterprise-level scale and performance.

I was recently involved in a development effort where we found ourselves facing this very situation: a prototype airline app that worked fine with a minuscule amount of hand-created data, but that had yet to be tested at scale with large volumes of data. To make matters more challenging, the data had to be complex, with nested elements, which ruled out data generation tools that could only produce small amounts of simple data.

For our solution, we turned to the amazing features and immense scale of Couchbase Capella, the cloud database-as-a-service.

Read on for an overview, follow along with the demo video showing the solution in action, or jump straight to the code in GitHub.

 

The recipe we came up with is simple:

    • 1 part Couchbase Capella Data Service
    • 1 part Couchbase Capella Eventing Service
    • 2 Eventing functions

Mix well in a Capella configuration sized according to the amount of data you want to generate in a specific amount of time. Generate the desired data and serve piping hot, ready for use.

Let’s dive into the details!

Couchbase Capella = speed and scale

Couchbase Capella is the cloud database platform for modern applications that fuses the agility and performance of a distributed NoSQL database with the strengths of an RDBMS into a single database.

Capella is multi-purpose, meaning it combines multiple data access patterns in a single platform. First, it provides key-value processing in memory for hyper-fast responsiveness. Next, it provides distributed storage of JSON document-based data for flexibility and resilience. To this, add support for full-text search, mobile data sync, IoT/time series, columnar analysis, and more.

It also offers SQL query support – not typically found in other document databases – so developers can work with Capella using a language they already know. These are capabilities customers no longer have to stitch together from a bunch of different technologies, because they get them all in a single database with Capella.
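
For example, querying JSON airline documents with SQL++ reads much like querying a relational table. A quick, illustrative sketch (the airlines keyspace and its fields here are hypothetical, not from our project):

    SELECT a.name, a.callsign, a.country
    FROM airlines AS a
    WHERE a.country = "United States"
    LIMIT 10;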

Capella’s scale and speed are trusted by large enterprise customers and startups alike, and proven in benchmarks that show superior performance versus competitors.

Capella Eventing Service = real time action

In addition to its other built-in features, Capella offers the Eventing Service, which allows developers to write JavaScript functions that are called in real time when data changes in Capella. Functions are easy to create using the Data Tools Eventing editor in the Capella Control Plane, where you specify the source bucket whose mutations trigger each function to execute.
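
To give a feel for the shape of an Eventing function, here is a minimal sketch, assuming a bucket alias named target has been configured as a read/write binding on the function (the alias name is hypothetical):

    function OnUpdate(doc, meta) {
        // Fires for every insert or update on the function's source keyspace
        log("Mutation received for", meta.id);

        // Enrich the document and write it to the bound target bucket
        doc.processed_at = new Date().toISOString();
        target[meta.id] = doc;
    }

    function OnDelete(meta) {
        // Fires when a document is deleted or expires
        log("Document removed:", meta.id);
    }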

Typical use cases for the Eventing Service include:

    • Threshold-based alerts
    • Monitoring parameters
    • Enriching documents
    • Scheduling future actions
    • Data cleansing point tools
    • External REST interaction

100 million docs in minutes!

We went with the Capella Eventing Service for data generation because it gave us the flexibility to produce documents formatted exactly as we needed, and the scale to produce lots of them!

First, we configured our Capella cluster to generate a large volume of data in a short amount of time. In this case, we wanted 100 million documents in around five minutes, so we went with 10 cluster nodes in total: 6 running the Data Service, and 4 running the Eventing Service.

Next, we created functions for the Capella Eventing Service with loops designed to output JSON documents with realistic airline data into a target bucket.
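
The snippet below is a simplified sketch of that idea rather than our exact production code: flights is a hypothetical bucket alias bound to the function with read/write access, and each triggering mutation fans out into a batch of generated documents.

    function OnUpdate(doc, meta) {
        // Number of documents to generate per triggering mutation (illustrative)
        var DOCS_PER_MUTATION = 1000;
        var carriers = ["CB Air", "Capella Express", "Eventing Airways"];

        for (var i = 0; i < DOCS_PER_MUTATION; i++) {
            var key = meta.id + "::flight::" + i;

            // Write a nested airline document to the bound target bucket
            flights[key] = {
                type: "flight",
                carrier: carriers[i % carriers.length],
                flight_number: 100 + (i % 900),
                origin: "SFO",
                destination: "JFK",
                departure_utc: new Date(Date.now() + i * 60000).toISOString(),
                seats: { economy: 150, business: 24, first: 8 }
            };
        }

        log("Generated", DOCS_PER_MUTATION, "documents for trigger", meta.id);
    }

Our real functions produced richer nested airline documents, but the pattern is the same: one mutation in, many documents out.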

From there it was easy to generate the data. We triggered the functions by making a simple data mutation in a specified source bucket, and we got 100 million airline documents in under three minutes!
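
Kicking off a run can be as simple as upserting a single trigger document into the source bucket (the generator_source keyspace and document shape below are illustrative); each mutation like this invokes the function’s OnUpdate handler once:

    UPSERT INTO generator_source (KEY, VALUE)
    VALUES ("trigger::batch-001", {"type": "trigger", "batch": 1});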

Once our data was generated, we took advantage of Capella’s Multi-Dimensional Scaling and scaled our configuration back down to 3 nodes, saving costs.

Author

Posted by Mark Gamble, Director of Product & Solutions Marketing

I am a passionate product marketer with a technical and solution consulting background and 20+ years of experience in Enterprise and Open Source technology. I have launched several database and analytic solutions throughout my career, and have worked with customers across a wide variety of industries including Financial Services, Automotive, Hospitality, High-Tech and Healthcare. I have particular expertise in analytics and AI, love all things data, and am an emphatic supporter of data-for-good initiatives.
