LSM Tree (Log Structured Merge Tree)

  1. In coming files write to a Memtable which ordered by objected key and implemented using a balanaced binary tree
  2. When the mem table is over a certain size, it flushed to dish and store as immutable SSTable
  3. LSM Tree created based on multiple SSTable with each one is sorted within its table

Pasted image 20230918194429.png

Overwrite key

Since SSTable is immutable, when updating a value, it doesn't update the previous record in an SSTable. It will append a new data point on the new SSTable which overwrite the old SSTable entries

Pasted image 20230918194618.png

Delete key

It marks an object as a tombstone, when we encounter a key with a tombstone, we know that that key is deleted.

Pasted image 20230918194729.png

Read key

We first look into the Memtable, if it's not there, look into the most recent SSTable in the LSM Tree (Log Structured Merge Tree):

Pasted image 20230918194820.png

Since the SSTable is sorted, look up is very efficient

Going throguh table by table takes a lot of IO. Many system keeps track of a Summary table which keep track of the min/max range of each disk block in every level to skip searches.

To check if there is a key exists. At each level, we keep a Bloom Filter to keep track of the level so that the system can skip a level entirely.

Merging SSTables

Since if we keep adding the SSTable, it will be very inefficient to look up data, and it keeps a stale data frequently.

As a result, we can merge the SSTable after a certain level. There is a separate background process handle the merging.

Pasted image 20230918195025.png

Since SSTable are sorted, the merging algorithm is simple and effective.

Compaction strategy

  • Tiering merging:
    • Used by Cassandra
    • Optimised for write through-put
  • Leveling merging
    • Used by RocksDB
    • Optimised for read through-put