Reliable performance at scale is paramount for every data-centric, mission-critical system. As technology professionals we put a lot of work into providing data platforms that help ensure successful system deployments. Architecture discussions, concept presentations and product training are a great place to start in order to facilitate the vision and help define needs. However, we need (and want) to get our hands dirty to better understand and architect a reliable data platform. Conducting a formal Proof-of-Concept (POC) in an environment similar to the one in which the system will run, with similar data sizes and user workloads, is often the solution.
A POC-style engagement model should help:
- Stakeholders understand what the implementation will feel like for the organization.
- Define benefits the new system will provide.
- Give insight into the business impact.
- Define the developer experience.
- Give operations teams an understanding of supporting the system.
In many cases, to meet all these objectives for a Couchbase POC we need data that business owners understand, and obtaining it can be a big hurdle in getting a proof-of-concept off the ground. Couchbase provides sample datasets, but we are often faced with the question of how to create datasets that make more sense to our customers.
Data Creation Option 1
If we have existing systems, we can take the time to export data from production and shape it into JSON structures. Here I took a sample dataset, stocks.json, and imported it into Couchbase. Each line of the file contains a JSON structure representing stock information.
- The code leverages line-reader => https://github.com/nickewing/line-reader
- Data File of Stock Information => http://jsonstudio.com/wp-content/uploads/2014/02/stocks.zip
- Sample Import => https://github.com/justinmichaels006/datasample/blob/master/appOG2.js
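To illustrate the "one JSON structure per line" format, here is a minimal sketch of turning such text into key/value pairs ready to be written to a bucket. It is not the linked sample: it splits a string rather than using line-reader, the `toDocuments` helper and the `stock::` key prefix are hypothetical, and the `Ticker` field name and sample values are assumptions made for illustration.

```javascript
// Sketch: turn newline-delimited JSON text into { key, value } pairs.
// `keyField` ("Ticker" below) is an assumed field name, not confirmed by the article.
function toDocuments(ndjsonText, keyField) {
  const docs = [];
  const lines = ndjsonText.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === '') return; // skip blank lines
    let value;
    try {
      value = JSON.parse(trimmed);
    } catch (err) {
      throw new Error(`Invalid JSON on line ${index + 1}: ${err.message}`);
    }
    // Fall back to the line number when the assumed key field is missing.
    const rawKey = value[keyField] !== undefined ? value[keyField] : index + 1;
    docs.push({ key: `stock::${rawKey}`, value });
  });
  return docs;
}

// Made-up sample lines, including a blank line that should be ignored.
const sample = '{"Ticker":"A","Price":50.44}\n\n{"Ticker":"AA","Price":8.03}\n';
const docs = toDocuments(sample, 'Ticker');
console.log(docs.map((d) => d.key)); // prints [ 'stock::A', 'stock::AA' ]
```

Each resulting pair can then be handed to whatever write call your Couchbase SDK version provides. Reporting the line number on a parse failure makes it easier to find a bad record in a large export.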
Data Creation Option 2
Often we don’t have the time or the access to use existing data, or doing so would require involvement from other teams. Online tools are available for generating sample data. They often cannot create production-size datasets, but a very robust utility is available at http://www.generatedata.com. Alternatively, we can define the structure we’re after and iterate to create the volume we need. Here we define a data structure that mirrors the data we would eventually have in production.
- Sample Import => https://github.com/justinmichaels006/datasample/blob/master/appOG.js
We need to be careful about how large a dataset we create at one time, because a plain loop in Node.js starts every asynchronous write without waiting for earlier ones to finish, so they all run concurrently. Nevertheless, I am able to generate a useful dataset for supporting a successful POC. Whatever the approach, the options above let us provide a system that houses familiar data at a size that mirrors what we expect in production.
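One way to keep the number of in-flight writes bounded is to generate and write the data in fixed-size batches. The sketch below is an assumption-laden illustration, not the linked sample: `makeDoc`, `batches`, `loadAll`, the `customer` document shape and the `writeDoc(key, value)` callback are all hypothetical names, and `writeDoc` stands in for whatever Promise-returning write call your SDK offers.

```javascript
// Hypothetical document shape; replace the fields with ones that mirror production.
function makeDoc(i) {
  return {
    type: 'customer',
    id: i,
    name: `Customer ${i}`,
    active: i % 2 === 0,
  };
}

// Yield the dataset in fixed-size batches so only one batch exists at a time.
function* batches(total, batchSize) {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  for (let start = 0; start < total; start += batchSize) {
    const end = Math.min(start + batchSize, total);
    const batch = [];
    for (let i = start; i < end; i++) {
      batch.push({ key: `customer::${i}`, value: makeDoc(i) });
    }
    yield batch;
  }
}

// `writeDoc(key, value)` is a placeholder that must return a Promise.
async function loadAll(total, batchSize, writeDoc) {
  let written = 0;
  for (const batch of batches(total, batchSize)) {
    // Writes inside a batch run concurrently; the next batch waits for this one.
    await Promise.all(batch.map((d) => writeDoc(d.key, d.value)));
    written += batch.length;
  }
  return written;
}

// Usage with a stand-in writer that only records the keys it receives.
const seen = [];
loadAll(5, 2, async (key) => { seen.push(key); })
  .then((n) => console.log(`wrote ${n} documents`));
```

The batch size caps how many writes are outstanding at once, so it can be tuned to what the client and cluster handle comfortably while the total document count grows to production scale.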
Next time, we’ll use these datasets to explore what we can do with Couchbase views and how our upcoming query language, N1QL (pronounced “nickel”), will dramatically simplify our development efforts.