Log-Structured Merge Tree is a Data Structure optimized for write-heavy workloads, used by storage engines like Cassandra.
When a write comes in, we add it to an in-memory balanced tree data structure (for example, a red-black tree), often called a memtable.
When the memtable gets bigger than some threshold (e.g. a few megabytes), we write it to disk as an SSTable file, which is a immutable, write-once file containing sorted strings (SSTable == sorted string table).
When a read request comes in, we first check the memtable, then if we don't find the key, we move on to checking the youngest SSTable, and then proceed looking at next SSTable until we find the key.
To keep the process more efficient, in the background we will merge and compact (discard overwritten or deleted values) older SSTables into new ones, so there is less places to check.
Status: #📥
References: