LSM Tree (Log Structured Merge Tree)
- In coming files write to a Memtable which ordered by objected key and implemented using a balanaced binary tree
- When the mem table is over a certain size, it flushed to dish and store as immutable SSTable
- LSM Tree created based on multiple SSTable with each one is sorted within its table
Overwrite key
Since SSTable is immutable, when updating a value, it doesn't update the previous record in an SSTable. It will append a new data point on the new SSTable which overwrite the old SSTable entries
Delete key
It marks an object as a tombstone
, when we encounter a key with a tombstone
, we know that that key is deleted.
Read key
We first look into the Memtable, if it's not there, look into the most recent SSTable in the LSM Tree (Log Structured Merge Tree):
Since the SSTable is sorted, look up is very efficient
Going throguh table by table takes a lot of IO. Many system keeps track of a Summary table which keep track of the min/max range of each disk block in every level to skip searches.
To check if there is a key exists. At each level, we keep a Bloom Filter to keep track of the level so that the system can skip a level entirely.
Merging SSTables
Since if we keep adding the SSTable, it will be very inefficient to look up data, and it keeps a stale data frequently.
As a result, we can merge the SSTable after a certain level. There is a separate background process handle the merging.
Since SSTable are sorted, the merging algorithm is simple and effective.
Compaction strategy
- Tiering merging:
- Used by Cassandra
- Optimised for write through-put
- Leveling merging
- Used by RocksDB
- Optimised for read through-put