
Basics
This article is a continuation of my previous article on my journey from relational to distributed databases. I have also talked about probabilistic data structures here.
While discussing distributed databases with one of my colleagues, I realised that we often miss focusing on the basics, so I decided to publish this article in the simplest possible form. It is actually a long-pending article that I wrote a few years ago but never published.
Let's talk about the LSM (log-structured merge) tree.
In this type of file system, data keeps getting appended to the existing files, i.e. it does not overwrite data in place (like we do with B-trees). Data is first written in memory (in structures known as memtables) only up to a defined threshold, say 10 MB, and once it reaches that limit, the data is flushed to disk.
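To make the write path concrete, here is a minimal Python sketch of a memtable with a flush threshold. The class name, the JSON-lines file format, and the byte accounting are all simplified assumptions for illustration, not how any particular engine does it:

```python
import json

class MemTable:
    """A hypothetical memtable: an in-memory buffer of the latest value
    per key, flushed to a sorted immutable file once it gets too big."""

    def __init__(self, threshold_bytes=10 * 1024 * 1024):  # say, 10 MB
        self.data = {}        # latest value per key
        self.size = 0         # rough running size of buffered data
        self.threshold = threshold_bytes

    def put(self, key, value):
        # Inserts, updates, and deletes all just overwrite in memory;
        # nothing already on disk is ever modified.
        self.data[key] = value
        self.size += len(str(key)) + len(str(value))
        return self.size >= self.threshold   # True => caller should flush

    def flush(self, path):
        # Writing the keys out in sorted order is what makes the output
        # file a "sorted string table" (SSTable).
        with open(path, "w") as f:
            for key in sorted(self.data):
                f.write(json.dumps([key, self.data[key]]) + "\n")
        self.data.clear()
        self.size = 0
```

The important point is in `flush`: the file is written once, in key order, and never modified afterwards.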
LSM uses the concept of an LFS (log-structured file system), which only appends the changes (updates/inserts/deletes), i.e. it never replaces or changes an existing record. This makes it a good option for write-heavy workloads in a database scenario.
Log-structured means that I keep writing log records one after the other, without modifying the previously written data.
Merge refers to the algorithm that merges this log data and keeps it manageable.
Tree means different levels of data, i.e. the data is stored in a hierarchical order. Think of a computer hard disk partitioned into different levels: the upper levels (holding less data) are faster to access, while the deeper, lower levels (holding more data) are slower to access.
So LSM works on different levels, which means it maintains data at multiple levels. Let's try to understand this through the points below.
The 1st stage, and the later stages, in writing
The 1st stage is the memory stage, where we have memtables, the in-memory area (explained in a later section). Here the data must be kept in sorted order, as per the primary key defined on the table.
L0 – the level on storage that holds the data flushed from the memtable into stable files, in sorted order. This level contains various SSTables (i.e. sorted string tables, also known as SSTs), and at this level we may see some duplicate keys (for example, keys a, b, c in table s1 and a, b, c, d in table s2, where a, b, c are duplicates). This duplication occurs because each file is a plain dump received from the memtable; the duplicate data is removed at a later stage as a background task.
The SSTable files at Level 1 and above, by design, are organized to contain only non-overlapping key ranges.
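This non-overlap property is what makes the deeper levels cheap to search: at most one file per level can contain a given key, and it can be located with a binary search over the files' key ranges. A minimal sketch, assuming each level is represented as a list of (min_key, max_key, table) tuples sorted by min_key (a made-up in-memory representation, purely for illustration):

```python
import bisect

def find_sstable(level, key):
    """Return the one SSTable on this level whose key range could hold
    `key`, or None. Only valid for L1 and above, where the key ranges
    of the files never overlap. `level` is a list of
    (min_key, max_key, table) tuples sorted by min_key."""
    i = bisect.bisect_right([f[0] for f in level], key) - 1  # rightmost min_key <= key
    if i >= 0 and key <= level[i][1]:
        return level[i][2]
    return None

level1 = [("a", "f", "sst_07"), ("g", "m", "sst_08"), ("n", "z", "sst_09")]
print(find_sstable(level1, "h"))   # sst_08 -- the only file that can contain "h"
```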
Have you also heard the word COMPACTION? Let's understand what that is.
COMPACTION is a process in which the SSTs from level 0 are combined into a smaller number of SSTs by removing overlapping keys, deleted data, and stale versions of data that has since changed. Obviously, this process consumes CPU.
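Logically, compaction is just a k-way merge of sorted runs. Here is a hedged sketch: the TOMBSTONE marker and the list-of-pairs representation are assumptions for illustration, and real engines work with on-disk blocks, sequence numbers, and policies such as leveled or size-tiered compaction:

```python
import heapq

TOMBSTONE = object()  # hypothetical marker recorded when a key is deleted

def compact(runs):
    """Merge sorted runs (ordered newest first) into one sorted run,
    keeping only the newest version of each key and dropping tombstones.
    Each run is a list of (key, value) pairs sorted by key."""
    # Tag each entry with its run's age so that, for equal keys, the
    # newest version (smallest age) comes out of the merge first.
    tagged = ([(key, age, value) for key, value in run]
              for age, run in enumerate(runs))
    result, last_key = [], None
    for key, age, value in heapq.merge(*tagged):
        if key == last_key:            # an older duplicate: already handled
            continue
        last_key = key
        if value is not TOMBSTONE:     # deleted keys disappear entirely
            result.append((key, value))
    return result

newer = [("a", 1), ("c", TOMBSTONE)]
older = [("a", 0), ("b", 2), ("c", 9)]
print(compact([newer, older]))   # [('a', 1), ('b', 2)] -- 'c' was deleted
```

This is also why compaction helps reads: after the merge there is one sorted run with a single version of each key, instead of several overlapping files.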
What about read operations in an LSM tree?
For reads, the LSM tree uses the binary search algorithm (which I'm not going to cover in detail, as it would be out of context here). If we don't run compaction, it will eventually impact read performance, because, as you can see, a seek would have to go through multiple levels of SSTs.
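Putting the pieces together, the read path looks roughly like the sketch below. The dict-based memtable and SSTables, and the (min_key, max_key, table) level layout from the earlier sketch, are simplifying assumptions; real engines also consult sparse indexes and bloom filters to skip files quickly, and they handle tombstones, which I omit here:

```python
import bisect

def lsm_get(key, memtable, l0_tables, deeper_levels):
    """Look `key` up newest-first: the memtable, then every L0 table
    (newest first, since L0 files may overlap), then at most one
    candidate file per deeper level, where key ranges don't overlap."""
    if key in memtable:                    # the freshest data lives in memory
        return memtable[key]
    for table in l0_tables:                # ordered newest to oldest
        if key in table:                   # a real engine binary-searches the file
            return table[key]
    for level in deeper_levels:            # L1, L2, ...
        i = bisect.bisect_right([f[0] for f in level], key) - 1
        if i >= 0 and key <= level[i][1] and key in level[i][2]:
            return level[i][2][key]        # only one file per level is checked
    return None
```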
Did I mention the memtable?
A MemTable is an in-memory data structure that holds data before it is flushed to SST files (on disk). It serves both reads and writes: new writes always insert data into the memtable, and reads have to query the memtable before reading from the SST files, because the data in the memtable is newer. Once a memtable is full, it becomes immutable and is replaced by a new memtable. A background thread then flushes the contents of the old memtable into an SST file, after which that memtable can be destroyed.
Which databases use this?
Databases like MyRocks, TiDB, and Cassandra use LSM trees.
I'm keeping this article short and concise, without going into much more detail, as the purpose of this article is to help the reader understand the basics. Writing more content would defeat that purpose.