What is database scalability?
This page will cover the following to help you better understand database scalability:
- Horizontal versus vertical scaling
- Database scalability challenges
- How to improve database scalability
- Scalability of NoSQL versus relational databases
- Conclusion
Database scalability is the ability of a database to handle more load, and to keep or improve its performance, as the business demands on an application increase. Scaling doesn’t only mean adding resources to meet greater demand; it also means scaling down when demand decreases.
Failure of a database to scale typically shows up in one of three ways: CPU or memory overload, storage reaching capacity, or network overload that degrades data traffic. Any one of these issues, or a combination of them, can bring down your application and seriously impact your business.
This page covers two types of scaling, the challenges of each one, and recommended solutions to overcome those challenges. Finally, it compares NoSQL and relational databases in the context of scalability and shows why Couchbase is the best choice for scalability.
Horizontal versus vertical scaling
There are two ways to give a database more capacity so that it stays available and responsive as demand grows: vertical scaling and horizontal scaling.
What is horizontal scaling?
Horizontal scaling, which non-relational systems generally support better, refers to adding more nodes to share an increased load. These nodes form a cluster that can be spread across multiple servers, with the data distributed among them; queries can still combine data held on different nodes, for example via joins. Horizontal scaling is also known as scaling out.
What is vertical scaling?
Vertical scaling refers to adding more physical or virtual resources to a database that’s running on a single server. This can be accomplished by adding more CPU power, memory, or storage capacity. Vertical scaling is also known as scaling up.
Which is better, horizontal or vertical scaling?
The type of scaling you should choose depends on your application and the particular challenges you need to overcome. Factors to consider:
- Vertical scaling is a good first option when you don’t need a massive jump in scale, and you want to minimize changes to the overall system beyond the changes to your compute resources
- Scaling vertically may require downtime if you’re switching machines to gain more resources
- Ultimately, as your compute resources expand, it may be more expensive to grow and maintain your database using vertical scaling
When you run into these issues, or if you want to future-proof your system for different growth scenarios, horizontal scaling is the way to go.
- Horizontal scaling can improve fault tolerance and availability because it reduces the impact of a single server failure
- Horizontal scaling may require architecture and code changes to your application; however, the impact is mitigated by modern databases like Couchbase that provide autoscaling capabilities
Database scalability challenges
Scaling a database can be complicated, and the challenges you encounter will depend on a number of factors. The first challenge may be that you have a legacy application that runs on a relational database. In this case, you have to choose between throwing more physical/virtual resources at it or redesigning your application to run on a database that supports horizontal scaling.
Another challenge of scaling modern applications is keeping costs in line with a varying load. You don’t want to pay the same price for compute resources during times of low usage as during times of high usage; you want your costs to match your demand.
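As a rough illustration of matching cost to demand, the sketch below estimates how many nodes to run for a given request rate. The function name `nodes_needed`, its parameters, and the 25% headroom default are all assumptions made for this example; they are not part of any Couchbase API, and a real autoscaler would also consider memory, storage, and how quickly load changes.

```python
import math


def nodes_needed(requests_per_sec, capacity_per_node, min_nodes=1, headroom=0.25):
    """Estimate the node count for a given load (hypothetical helper).

    capacity_per_node is the request rate one node can sustain.
    headroom is the fraction of that capacity kept free for spikes.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Only part of each node's capacity is counted as usable.
    usable = capacity_per_node * (1 - headroom)
    # Never drop below the minimum, even when there is no load.
    return max(min_nodes, math.ceil(requests_per_sec / usable))


# Scale out at peak, scale back in when demand drops.
peak = nodes_needed(1000, capacity_per_node=500)
quiet = nodes_needed(100, capacity_per_node=500)
```

Under these assumptions the same function drives both directions: the node count grows with the request rate and shrinks again when the rate falls, so the compute bill follows demand.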
A third challenge is that horizontally scaled databases can be more complicated to maintain and manage. Couchbase Capella™ is an ideal solution in this case because it’s a fully managed Database-as-a-Service (DBaaS) that supports replication and sharding as well as multi-dimensional scaling.
How to improve database scalability
You can improve horizontal scaling by supporting both replication and sharding.
Replication
Replication is a form of scaling that creates copies of a database or database nodes. If one node goes down, a copy of its data can be retrieved from a different node. Another advantage of replication is that requests can come into different nodes in different locations, thereby decreasing the load burden on any particular node.
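The two benefits described above, surviving a node failure and spreading requests across nodes, can be sketched with a small in-memory model. The class `ReplicatedStore` and its methods are hypothetical and exist only for illustration. The sketch copies every write to all healthy nodes synchronously, which is a simplification; production systems typically replicate asynchronously and must also bring a recovered node back up to date.

```python
from itertools import cycle


class ReplicatedStore:
    """Toy model of full replication across named nodes (illustrative only)."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.down = set()               # names of nodes currently unavailable
        self._next = cycle(node_names)  # round-robin order for reads

    def write(self, key, value):
        # Copy the write to every node that is up.
        for name, data in self.nodes.items():
            if name not in self.down:
                data[key] = value

    def read(self, key):
        # Rotate through the nodes, skipping any that are down.
        for _ in range(len(self.nodes)):
            name = next(self._next)
            if name not in self.down:
                return name, self.nodes[name][key]
        raise RuntimeError("no healthy node available")


store = ReplicatedStore(["a", "b", "c"])
store.write("user:1", "Ada")
first = store.read("user:1")   # served by node "a"
second = store.read("user:1")  # served by node "b"
store.down.add("a")            # simulate a node failure
third = store.read("user:1")   # still answered, by node "c"
```

Reads rotate over the healthy nodes, so no single node carries the whole read load, and the failure of one node does not make the data unavailable.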
A few key components of Couchbase are based on a master-master replication topology in which multiple Couchbase instances can act as master nodes and replicate data to one another:
- Couchbase uses replication streams to replicate data between nodes. A replication stream is a continuous bidirectional stream of data between two nodes.
- Couchbase stores data in buckets, which are logical containers that group related data together. Each bucket can be configured to replicate its data to one or more other nodes.
- Cross data center replication (XDCR) is a feature in Couchbase that allows for replication between data centers. XDCR enables replication between Couchbase clusters, which can be located in different regions or availability zones.
- In a master-master replication topology, conflicts can occur when multiple nodes update the same piece of data simultaneously. Couchbase has a conflict resolution mechanism that relies on document versioning and timestamps to resolve conflicts.
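To make the conflict resolution idea in the last point more concrete, here is a simplified sketch in which the copy with the higher revision wins and the timestamp breaks ties. It is not Couchbase's implementation: the `DocVersion` type, its field names, and the exact comparison order are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    value: str
    revision: int      # hypothetical counter, incremented on every update
    timestamp_ms: int  # hypothetical wall-clock time of the last update


def resolve(local, remote):
    """Pick the winner between two conflicting copies of a document.

    The higher revision wins; equal revisions fall back to the later
    timestamp; a complete tie keeps the local copy.
    """
    if (remote.revision, remote.timestamp_ms) > (local.revision, local.timestamp_ms):
        return remote
    return local


# Two clusters updated the same document at about the same time.
east = DocVersion("shipped", revision=4, timestamp_ms=1_700_000_000_500)
west = DocVersion("cancelled", revision=4, timestamp_ms=1_700_000_000_900)
winner = resolve(east, west)  # same revision, so the later timestamp wins
```

Because the rule is deterministic and depends only on the two versions, both sides of the replication reach the same result without coordinating, provided each applies the same rule.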
Sharding
Sharding, also known as partitioning, likewise spreads data across multiple nodes. Unlike replication, sharding splits the data rather than copying it. Database sharding divides the entire dataset into multiple groups known as shards. Once divided, each shard can be stored independently, usually on separate servers that together are referred to as a cluster. Because each shard can be accessed independently, you can access data faster and have more resources available for processing, computing, and storage.
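One common way to decide which shard holds a given record is to hash its key. The sketch below assumes a simple CRC32-modulo scheme; `shard_for` and `ShardedStore` are hypothetical names for this example, and real databases usually add a layer of indirection between the hash and the physical server.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (illustrative scheme)."""
    # zlib.crc32 returns the same value in every process, unlike the
    # built-in hash(), which is randomized per process for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedStore:
    """Toy key-value store that splits its data across several dicts."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # Only the one shard that owns the key is consulted.
        return self.shards[shard_for(key, len(self.shards))][key]


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"order:{i}", i)
```

Each key lives in exactly one shard, so a lookup touches a single shard rather than the whole dataset, which is where the performance benefit comes from.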
Sharding enables faster performance but also introduces greater complexity. This complexity includes the concept of rebalancing, which involves moving data between shards over time in order to keep it distributed evenly.
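Rebalancing can be sketched as recomputing which node owns each partition while moving as few partitions as practical. The function `rebalance` below is a hypothetical illustration under the assumption that data is divided into a fixed number of partitions that are assigned to nodes; it is not how any particular database implements the operation, and it ignores the cost of actually copying the data.

```python
def rebalance(assignment, nodes):
    """Return a new partition -> node map spread evenly over `nodes`.

    `assignment` maps partition ids to node names. Partitions owned by a
    node that is no longer in `nodes` are reassigned.
    """
    owned = {n: [] for n in nodes}
    pool = []  # partitions that have to move
    for part, node in sorted(assignment.items()):
        (owned[node] if node in owned else pool).append(part)

    base, extra = divmod(len(assignment), len(nodes))
    # Give the larger quotas to the fullest nodes so fewer partitions move.
    by_load = sorted(nodes, key=lambda n: len(owned[n]), reverse=True)
    quota = {n: base + (1 if i < extra else 0) for i, n in enumerate(by_load)}

    # Nodes over quota release partitions; nodes under quota pick them up.
    for n in nodes:
        while len(owned[n]) > quota[n]:
            pool.append(owned[n].pop())
    for n in nodes:
        while len(owned[n]) < quota[n]:
            owned[n].append(pool.pop())

    return {part: n for n, parts in owned.items() for part in parts}


# Eight partitions on two nodes, then a third node joins the cluster.
before = {p: ("n1", "n2")[p % 2] for p in range(8)}
after = rebalance(before, ["n1", "n2", "n3"])
moved = sum(1 for p in before if before[p] != after[p])
```

In this example only the partitions handed to the new node change owner; the rest stay where they are, which keeps the amount of data copied during a rebalance small.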
For more details, refer to this guide on sharding in Couchbase.
Scalability of NoSQL versus relational databases
NoSQL databases are generally more scalable than relational databases because you can scale them both vertically and horizontally, and because they have a distributed architecture designed to handle large volumes of data across multiple servers.
Traditional relational database management systems (RDBMS), such as Oracle, prioritize consistency over availability. Conversely, many NoSQL databases prioritize availability over consistency and focus on supporting higher volumes of users and data. Because their data is distributed, they also tolerate the failure of some nodes better.
Conclusion
To stay a step ahead of your scaling demands, do regular load testing and choose a database that supports the method of scaling that’s best for your application and business needs. Know that there are compromises required for both horizontal and vertical scaling approaches, such as choosing cost over complexity or uptime over consistency.
Use these resources to learn more about database scaling with Couchbase:
- Multi-dimensional database scaling – detail on Couchbase services, rebalancing, statuses, events, and jobs
- Why choose a NoSQL database? – what NoSQL is, how it works, and what NoSQL databases are good for
- Serverless databases – advantages for developers, data persistence for applications, and applications supported
- Couchbase Capella DBaaS – the easiest and fastest way to begin with Couchbase and eliminate database management