Database Sharding

There are multiple type of sharding, we need to consider doing the easy one first before continue with the hard one. The following order will be from easy to hard.

1. Vertical sharding

A table could be split to smaller table. For example:

Pasted image 20230930112059.png

In this example, we have defined 2 group of fields:

In frequently changes (blue)
Frequently changes (red)

If the red fields in here are updated more frequent, we can split that table out to one in-frequently table and one fre0quently table.

Pros:

reduce the potential lock for querying on the product_basic table
Cons:
Table size still increasing
- To address this problem, we can consider cold storage for archiving

2. Table partitioning

Pasted image 20230930112653.png

One table can create multiple partition for example:

CREATE TABLE sales (
    sale_id serial PRIMARY KEY,
    sale_date date,
    amount numeric
);

CREATE TABLE sales_january PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2023-01-31');

CREATE TABLE sales_february PARTITION OF sales
    FOR VALUES FROM ('2023-02-01') TO ('2023-02-28');

As a result, the data will be stored in the partition table. The main table will only be fore the schema and so on.

If you want each table in different location. You can create a table space:

CREATE TABLESPACE my_tablespace LOCATION '/path/to/disk1';

CREATE TABLE sales_january PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2023-01-31')
    TABLESPACE my_tablespace;

Pros:

Reduce the table size and potentially can store the table across different disk
Cons:
Still one machine serving the database hence bottle neck at the machine level

3. Range based sharding

Pasted image 20230930115142.png

In here we can shard our table based on range. The sharded table can be located in different machine.

If one machine has a problem, it only affect a particular customer.

You can also choose to shard between different range, for example customer_id, user_id, location_id or so on.

Pros:

Reduce table size
Split the table to different machines, hence off load from one machine
Can still use incremental id as the primary key
Cons:
hotspot — for example if we choose to shard our data based on the key location_id, if everyone stays in the same place then it will create a hotspot.

4. Key-based sharding

Pasted image 20230930115646.png

In here we have to use a hash function. The hashfunction has to be unique enough so that it distribute to different shard equally. We can useConsistent Hashing concept

Pros:

Reduce table size
Off load from one machine
Avoid hotspot
Cons:
Complex to tune the hash function
Cannot use incremental id
No subquery, no joins
Transaction is hard to control — Consider Saga architecture pattern or Two Phase Commit
Need to involve middle engine such as which dependent on them
- apache/shardingsphere: Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database. (github.com)
- Vitess | Scalable. Reliable. MySQL-compatible. Cloud-native. Database.

As a result, maybt this is more suitable for NoSQL, for Database/SQL is very complicated and has drawback.

IPSec protocols

Concurrency Collection

Reactive Programming

ReactiveStream API

Reactor Core

RxJava

Thread based programming

API

Events

Redis

SpringBatch

SpringBoot

Module API Standard

Listeners

Producers

Database Sharding

1. Vertical sharding

2. Table partitioning

3. Range based sharding

4. Key-based sharding