What’s changed in building InnoDB’s B-tree indexes in MySQL 8

In InnoDB, everything is stored as a B-tree. When I say everything, I mean the primary key (the clustered index) as well as the secondary indexes.

But have we ever thought about how MySQL actually creates an index? Obviously I don’t mean the “create index” or “alter table” syntax. Let’s start with an example of what I mean. Say I have to create an index on column ID of table T1, which has millions of records in it. In prior versions, MySQL used its insert APIs, and index entries were inserted into the B-tree one by one. This is the traditional way to populate a B-tree.

In the old method:

  1. A B-tree cursor (btr0cur.c in the InnoDB source) is opened.
  2. A search is performed to find the correct position for the entry.
  3. If there is space in the page, the entry is inserted into the B-tree page; this is the optimistic insert path.
  4. If there is no space in the page, a pessimistic insert is performed: the B-tree cursor is reopened, the page is split, and nodes are merged as needed.

This is a top-down approach, and it is costly because of the repeated opening of the B-tree cursor, searching, and splitting and merging of B-tree nodes.
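The old one-at-a-time flow can be sketched roughly like this (a toy model only, not InnoDB’s actual page format; the tiny `PAGE_CAPACITY` and the naive split are illustrative assumptions):

```python
import bisect

PAGE_CAPACITY = 4  # toy page size; a real InnoDB page is 16 KB

class Page:
    def __init__(self, keys=None):
        self.keys = keys or []

def top_down_insert(pages, key):
    """Insert one key: search for the target page, then insert,
    splitting the page when it is full (the 'pessimistic' path)."""
    # 1. search: find the page that should hold the key
    i = bisect.bisect_right([p.keys[0] for p in pages], key) - 1
    i = max(i, 0)
    page = pages[i]
    if len(page.keys) < PAGE_CAPACITY:           # optimistic insert
        bisect.insort(page.keys, key)
    else:                                        # pessimistic: split, then insert
        mid = PAGE_CAPACITY // 2
        right = Page(page.keys[mid:])
        page.keys = page.keys[:mid]
        pages.insert(i + 1, right)
        target = page if key < right.keys[0] else right
        bisect.insort(target.keys, key)

pages = [Page([10])]
for k in [30, 20, 50, 40, 25, 35]:
    top_down_insert(pages, k)
print([p.keys for p in pages])   # [[10, 20, 25], [30, 35, 40, 50]]
```

Every single insert pays for a search from the top, and a full page pays again for a split; that is exactly the cost the bottom-up approach below avoids.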

The bottom-up approach:

Let’s divide the new approach into three phases to understand it:

  1. Run phase:

a.  In the run phase, the clustered (primary) index is scanned, and the entries read are added to a sort buffer. Until this point, no sorting has been done. Once this buffer becomes full, all its records are sorted and written to a temporary file (explained further in the second phase). So you can tune the buffer size according to your data to speed up index creation. In addition, max_sort_length may also need adjusting whenever we change the sort buffer, since it caps how many bytes of each value are used when sorting.
A word of caution on naming here: sort_buffer_size (default 262144 bytes) is the server-level sort buffer used for ORDER BY/GROUP BY. When dealing with the InnoDB engine and creating a secondary index online, it is innodb_sort_buffer_size (default 1048576 bytes) that should be considered.

mysql> show global variables like '%sort_buffer%';
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| innodb_sort_buffer_size | 1048576 |
| myisam_sort_buffer_size | 8388608 |
| sort_buffer_size        | 262144  |
+-------------------------+---------+
3 rows in set (0.01 sec)

mysql>
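The run phase above can be sketched as follows (a simplified model: the buffer is counted in entries rather than bytes, and `spill` stands in for InnoDB writing a sorted run to a temporary merge file):

```python
import tempfile

SORT_BUFFER_ENTRIES = 4  # stand-in for innodb_sort_buffer_size (bytes in reality)

def spill(buf):
    """Sort the buffered entries and write them to a temp 'run' file."""
    f = tempfile.TemporaryFile(mode="w+")
    for e in sorted(buf):                   # sorting happens only at spill time
        f.write(f"{e}\n")
    f.seek(0)
    return f

def run_phase(entries):
    """Scan entries (as read from the clustered index), buffer them
    unsorted, and spill each full buffer to a sorted temporary run."""
    runs, buf = [], []
    for e in entries:
        buf.append(e)                       # unsorted until the buffer fills
        if len(buf) == SORT_BUFFER_ENTRIES:
            runs.append(spill(buf))
            buf = []
    if buf:
        runs.append(spill(buf))
    return runs

runs = run_phase([9, 3, 7, 1, 8, 2, 6, 4, 5])
print(len(runs))                 # 3 temporary run files
print(runs[0].read().split())    # first run, sorted: ['1', '3', '7', '9']
```

A bigger buffer means fewer, longer runs, which is exactly why tuning the sort buffer reduces the merge work in the next phase.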

2. Second phase:

The multiple temporary files generated in the run phase are merged in this phase using a merge-sort algorithm. The amount of merging can be seen in the Sort_merge_passes status variable shown below. It tells how many merge passes have been performed; if this number grows high, it simply means the sort buffer is too small, and it is a good time to tweak it.

mysql> show global status like '%passes%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 0     |
+-------------------+-------+
1 row in set (0.11 sec)
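A minimal sketch of this merge phase (assuming a two-way merge for simplicity; `MERGE_FANIN` and the pass counter are simplifications of what Sort_merge_passes actually tracks):

```python
import heapq

MERGE_FANIN = 2   # how many sorted runs are merged at once

def merge_phase(runs):
    """Repeatedly merge groups of sorted runs until a single fully
    sorted stream remains; each merge of a group counts as one pass."""
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), MERGE_FANIN):
            group = runs[i:i + MERGE_FANIN]
            merged.append(list(heapq.merge(*group)))  # k-way merge of sorted inputs
            passes += 1
        runs = merged
    return runs[0], passes

sorted_keys, passes = merge_phase([[1, 4], [2, 9], [3, 5], [6, 7]])
print(sorted_keys)   # [1, 2, 3, 4, 5, 6, 7, 9]
print(passes)        # 3
```

Fewer and longer runs from the run phase mean fewer passes here, which is the link between the sort buffer size and the merge-pass counter.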

3. Third phase:

Once all the files are merged, the sorted records from the second phase are inserted into the B-tree nodes, built bottom-up.
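The bottom-up build of this third phase can be sketched like this (a toy model; `FILL` stands in for the page fill factor, and real pages hold separator keys plus child pointers):

```python
FILL = 3  # keys per page in this toy model

def bulk_build(sorted_keys):
    """Bottom-up build: pack the already-sorted keys into leaf pages
    left to right, then build each parent level from the first key of
    every child page. No searching, no page splits."""
    level = [sorted_keys[i:i + FILL] for i in range(0, len(sorted_keys), FILL)]
    tree = [level]
    while len(level) > 1:
        seps = [page[0] for page in level]            # first key of each child
        level = [seps[i:i + FILL] for i in range(0, len(seps), FILL)]
        tree.append(level)
    return tree  # tree[0] = leaves, tree[-1] = the root level

tree = bulk_build(list(range(1, 10)))
print(tree[0])   # leaves: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tree[1])   # root:   [[1, 4, 7]]
```

Because the input is already sorted, pages are filled once, left to right, which is why this beats the search/split/merge cycle of the old top-down method.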


What’s next?

Until this point, all these operations, i.e. the scan, sort, and merge, were single-threaded; starting from MySQL 8 they are multi-threaded. Think about a situation wherein you are trying to add a secondary index to your existing table using the online DDL approach and need it done faster, or you are trying to count the number of records in a heavy table.

Now, we have three variables:

1. innodb_parallel_read_threads – remember, this applies only to the clustered index (PK) scan.
2. innodb_ddl_threads – remember, this applies only to the secondary-index sort and merge.
3. innodb_ddl_buffer_size – the total buffer size used by the DDL threads.

Take the example below to understand how innodb_parallel_read_threads makes the clustered-index read faster as we increase the number of threads.

mysql> set local innodb_parallel_read_threads=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(1) from fact;
+----------+
| count(1) |
+----------+
| 14531072 |
+----------+
1 row in set (2.12 sec)

mysql> set innodb_parallel_read_threads=4;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(1) from fact;
+----------+
| count(1) |
+----------+
| 14531072 |
+----------+
1 row in set (0.92 sec)

mysql> set innodb_parallel_read_threads=1;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(1) from fact;
+----------+
| count(1) |
+----------+
| 14531072 |
+----------+
1 row in set (2.09 sec)

However, as the runs below show, CHECK TABLE improves only modestly: from about 17.5 seconds at one thread to about 14.2 at four, after which adding more threads makes no real difference.

mysql> show variables like '%parallel%';
+------------------------------+----------+
| Variable_name                | Value    |
+------------------------------+----------+
| innodb_parallel_read_threads | 1        |
| replica_parallel_type        | DATABASE |
| replica_parallel_workers     | 0        |
| slave_parallel_type          | DATABASE |
| slave_parallel_workers       | 0        |
+------------------------------+----------+
5 rows in set (0.01 sec)

mysql> check table fact;
+------------+-------+----------+----------+
| Table      | Op    | Msg_type | Msg_text |
+------------+-------+----------+----------+
| ankit.fact | check | status   | OK       |
+------------+-------+----------+----------+
1 row in set (17.53 sec)

mysql> set innodb_parallel_read_threads=4;
Query OK, 0 rows affected (0.00 sec)

mysql> check table fact;
+------------+-------+----------+----------+
| Table      | Op    | Msg_type | Msg_text |
+------------+-------+----------+----------+
| ankit.fact | check | status   | OK       |
+------------+-------+----------+----------+
1 row in set (14.23 sec)

mysql> set innodb_parallel_read_threads=8;
Query OK, 0 rows affected (0.00 sec)

mysql> check table fact;
+------------+-------+----------+----------+
| Table      | Op    | Msg_type | Msg_text |
+------------+-------+----------+----------+
| ankit.fact | check | status   | OK       |
+------------+-------+----------+----------+
1 row in set (14.48 sec)

mysql> set innodb_parallel_read_threads=12;
Query OK, 0 rows affected (0.00 sec)

mysql> check table fact;
+------------+-------+----------+----------+
| Table      | Op    | Msg_type | Msg_text |
+------------+-------+----------+----------+
| ankit.fact | check | status   | OK       |
+------------+-------+----------+----------+
1 row in set (14.35 sec)

mysql> show variables like '%parallel%';
+------------------------------+----------+
| Variable_name                | Value    |
+------------------------------+----------+
| innodb_parallel_read_threads | 12       |
| replica_parallel_type        | DATABASE |
| replica_parallel_workers     | 0        |
| slave_parallel_type          | DATABASE |
| slave_parallel_workers       | 0        |
+------------------------------+----------+
5 rows in set (0.00 sec)

mysql> set innodb_parallel_read_threads=20;
Query OK, 0 rows affected (0.00 sec)

mysql> check table fact;
+------------+-------+----------+----------+
| Table      | Op    | Msg_type | Msg_text |
+------------+-------+----------+----------+
| ankit.fact | check | status   | OK       |
+------------+-------+----------+----------+
1 row in set (14.34 sec)

mysql> 

That covers the clustered-index scan. Similarly, if you are using 8.0.27 or later, innodb_ddl_threads can be used to speed up the sort and merge operations explained above; it defines the number of threads that perform the sort and merge. Here is the test I performed:

mysql> alter table fact add index idx_dim2(dim2);
Query OK, 0 rows affected (19.41 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> set innodb_ddl_buffer_size=10485760;
Query OK, 0 rows affected (0.00 sec)

mysql> set innodb_ddl_buffer_size=104857600;
Query OK, 0 rows affected (0.00 sec)

mysql> set innodb_ddl_threads=10;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table fact drop index idx_dim2;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table fact add index idx_dim2(dim2);
Query OK, 0 rows affected (16.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> set innodb_ddl_threads=12;
Query OK, 0 rows affected (0.00 sec)

mysql> set innodb_parallel_read_threads=12;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table fact drop index idx_dim2;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table fact add index idx_dim2(dim2);
Query OK, 0 rows affected (15.89 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> set innodb_ddl_buffer_size=1048576000;
Query OK, 0 rows affected (0.00 sec)

mysql> set innodb_parallel_read_threads=16;
Query OK, 0 rows affected (0.00 sec)

mysql> set innodb_ddl_threads=16;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table fact drop index idx_dim2;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table fact add index idx_dim2(dim2);
Query OK, 0 rows affected (17.73 sec)
Records: 0  Duplicates: 0  Warnings: 0

With the various combinations, you can see how the performance varies: going from the defaults (19.41 s) to 10 DDL threads and a 100 MB buffer (16.03 s) helped, but pushing to 16 threads and a ~1 GB buffer made it slower again (17.73 s).


What else needs to be considered when creating an index in MySQL?

Now, the question is how much space per page we can reserve for future index growth. For this we have the innodb_fill_factor option, which reserves space in each B-tree page during a sorted index build. For example, if we set this value to 70, then 30 percent of each page’s space is reserved for index growth.

This setting helps us manage page space on the server. Sometimes, when data becomes fragmented, rebuilding with a suitable fill factor can help in reclaiming it.
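As a rough back-of-the-envelope sketch (assuming the default 16 KB page and ignoring page header and record overhead, so this is only an estimate):

```python
PAGE_SIZE = 16 * 1024        # default InnoDB page size in bytes

def page_budget(fill_factor):
    """Approximate bytes used vs. bytes reserved per page during a
    sorted index build, for a given innodb_fill_factor value."""
    used = PAGE_SIZE * fill_factor // 100
    reserved = PAGE_SIZE - used
    return used, reserved

used, reserved = page_budget(70)
print(used, reserved)   # 11468 4916
```

So at innodb_fill_factor=70, roughly 4.9 KB of every 16 KB page stays free for future inserts into the index.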

Conclusion

This article walks through the core process behind how InnoDB creates an index and how things have improved over time in MySQL 8. I used two versions of MySQL to show the results here: 8.0.26 and 8.0.32, both on macOS.


MySQL Innodb – Internal Transaction flow

The picture below is just a representation of the flow. I created this figure by referring to the picture in the Percona docs, but I don’t follow that picture anymore (I don’t mean it was wrong) and have created my own. You can refer to my video; I posted the link in the next paragraph.

(Figure: InnoDB internal transaction flow)

I keep updating this document with changes and corrections. Last updated on 1st December 2021.

I spoke on InnoDB transaction flow in more detail. Please check here :

I hope you have read my earlier posts as well; if not, you can read them too (mentioned at the end). Also, while working on these features, I have found many bugs, which I reported to Oracle. One of the bugs I reported is https://bugs.mysql.com/bug.php?id=89158. I will try to cover such limitations in my next blogs.

In this blog, I am covering the transaction flow and how understanding it can improve DB performance. It has been written after reading various MySQL documentation pages and related snippets of code, and joining the blocks together so they make sense.

At the time of writing, there is no single document from Oracle that defines this flow end to end.

So, before going further, it is good to understand the following terminology:

  1. Buffer pool – the cache area holding cached data. It is the main memory area of MySQL/InnoDB, and we usually size it according to our business requirements.
  2. Log buffer – the buffer for the redo log files, which holds the data coming from the MTR buffer.
  3. redo logs – iblogfile
  4. undo logs – undo files
  5. tablespaces – ibdata
  6. data files – ibd
  7. check points
  8. fuzzy check point
  9. sharp check point
  10. flush
  11. overflow
  12. commit
  13. binary logs
  14. sync binlog
  15. innodb adaptive flushing.
  16. MTR transaction.
  17. MTR buffer
  18. Flush list
  19. Flushing methods
  20. double write buffers
  21. Change buffers
  22. list – double linked list

Let us try to understand this in the following way :

  1. Select – When a user executes a SELECT query, data comes from the tablespace and gets loaded into the buffer pool. At this point, the complete data from the tablespace (I mean the .ibd file) does not get loaded into the buffer pool; only the data matching the criteria does (much also depends on which read-ahead algorithm is used). By matching criteria I mean only the required pages get loaded, and the root of the index is always in cache for an open table. Here the number of read I/O threads, the type of read (linear or random), and the threshold value matter, which I may try to cover in a future post as part of read optimisation.

2.  DML statements – Suppose we receive an “insert into table_name”. It goes to the log buffer and the buffer pool simultaneously. From the log buffer, it goes to the log files (the redo logs, known as ib_logfile0/1). The journey of this transaction from log buffer to log file depends on innodb_flush_log_at_trx_commit. If it is 1, then at each transaction commit the transaction is written and flushed to the redo log. With this value we ensure the chance of losing a committed transaction in a crash is minimal, so it is the safest value, but I/O is higher because of the write at each commit. To flush this data to disk, innodb_flush_log_at_timeout (in seconds) also plays an important role: by default it is 1 second, and every second the log is flushed to disk. Please note that by “flushed to disk” I don’t mean data gets written to the data file; rather, I mean the ib_logfile, using the configured flush method. I will try to cover these two parameters in a future post, but you can refer to the MySQL documentation on them. So, try different values for these parameters and choose what suits your MySQL server.

At the start of this, I mentioned that the data goes into the buffer pool as well. The buffer pool loads the page from the tablespace, and the previous version of the data (to ensure MVCC) is kept in the undo tablespace. By the previous version, I mean the historical data (for UPDATE/DELETE, so that we can roll back while keeping performance high). The undo pages also get loaded into the buffer pool so that MVCC is followed. Once the transaction commits, the dirty pages are eventually flushed from the buffer pool to the data files (which now host the committed version of the data instead of the undo version), and this marks a checkpoint. By data files, I mean where your data resides, i.e. the *.ibd files. This process of flushing the dirty pages is known as a fuzzy checkpoint.
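The on-demand page loading described in point 1 can be sketched as a tiny LRU cache (a toy model; InnoDB’s real buffer pool uses a midpoint-insertion LRU and many other refinements):

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: pages are read from 'disk' only on demand, and
    the least recently used page is evicted when the pool is full."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # page_id -> page data (the .ibd file)
        self.pages = OrderedDict()   # cached pages, in LRU order
        self.reads = 0               # physical reads from 'disk'

    def fetch(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)    # cache hit: mark as recent
            return self.pages[page_id]
        self.reads += 1                        # miss: read from the tablespace
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)     # evict the LRU page
        self.pages[page_id] = self.disk[page_id]
        return self.pages[page_id]

disk = {i: f"page-{i}" for i in range(100)}
bp = BufferPool(capacity=3, disk=disk)
for pid in [1, 2, 1, 3, 4, 1]:
    bp.fetch(pid)
print(bp.reads, list(bp.pages))   # 4 [3, 4, 1]
```

Note that only 4 of the 100 “pages” were ever read from disk: only the requested pages enter the pool, which is the point made above about SELECT not loading the whole .ibd file.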

In reality, writing to disk gets more complex as we proceed, because checkpointing, overflow, and commit follow different threads. I will try to explain them in a short and descriptive manner:

  1. Checkpoint: at various intervals, InnoDB marks a checkpoint and flushes dirty pages out of the buffer pool, preventing the buffers from filling up.
  2. Overflow: this is the rare case when the buffers fill up and InnoDB needs to make space. If the log buffer is full, it overwrites the least recently used data; if the data buffer is full, it compares the LSN of the data buffer with the log buffer, and if it is greater, InnoDB writes out the log records first.
  3. Commit: a commit by itself does not mean dirty pages are written to disk. We need to remember that data is written to the redo log first, before the data files, so that even if we face a crash, the data survives in the redo files and can be recovered from them. This process is known as WAL (write-ahead logging).
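The WAL idea in point 3 can be sketched in a few lines (a toy model: a list stands in for the redo log, and a dict for the buffer pool):

```python
class WalDemo:
    """Minimal write-ahead-logging sketch: every change is appended to
    the log BEFORE the in-memory page is touched; recovery replays the
    log, which is why a crash before pages reach the data file loses
    nothing that was logged."""
    def __init__(self):
        self.log = []        # stands in for the redo log (ib_logfile*)
        self.pages = {}      # stands in for the buffer pool

    def update(self, key, value):
        self.log.append((key, value))   # 1. write ahead to the log
        self.pages[key] = value         # 2. only then change the page

    def recover(self):
        pages = {}
        for key, value in self.log:     # replay the redo records in order
            pages[key] = value
        return pages

db = WalDemo()
db.update("row1", "a")
db.update("row2", "b")
db.update("row1", "c")
db.pages.clear()                        # simulate a crash: buffer pool gone
print(db.recover())                     # {'row1': 'c', 'row2': 'b'}
```

Appending to a log is cheap sequential I/O, which is why writing the log at commit is affordable while flushing random dirty pages at every commit would not be.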

Earlier blogs.

Function based Indexing

Is mysql acid compliant ?

Making index creation faster in MySQL

Please feel free to correct me if I have stated anything wrong, as this is simply my understanding.

About Ankit Kapoor

Ankit Kapoor is a database geek and enjoys playing with open source databases like MySQL and PG.
Connect here on linkedin

Function-based indexing in MySQL 5.7

I have rarely seen devs/DBAs use this concept when their query is stuck in a full table scan or an index scan. I decided to write up my own experience here and share it with the outer world. It might help someone who is struggling with their query performance.

In many cases, you will have experienced slowness in your SELECT queries even after having an index on the column in the WHERE clause, when you use a built-in function like SUBSTRING, LIKE, MAX, MIN, etc. on it. Please note I’m talking about built-in functions only, not user-defined functions.

A simplified version of the table (show create table list):

create table list (id int, name char(20), primary key (id), key idx_name (name)) engine=InnoDB;

Now insert at least a million records so that we can easily see the performance impact.

Now, if your query is like:

select id, name from list where name = 'oggy';

this query will use the index idx_name and give you output in less than a second.

But if your query is like:

select id, name from list where name like '%abc%';

then with more than a million records this query will perform an FTS (full table scan) on the name column. The index will not be used in spite of name having a secondary index on it, and the query may take a long time depending on the number of records in the table.

The fact is that applying a function to a column (or using a leading-wildcard LIKE) prevents index use, even if the involved column has an index on it, and results in a full scan.

Prior to MySQL 5.7, MySQL didn’t provide a function-based indexing feature or anything similar to it.

In 5.7, we have the virtual columns feature, known as generated columns: a VIRTUAL column is computed at read time, thus consuming no storage space.

Let’s return to our earlier example, where the query was doing a full scan. Now I create a virtual column:

alter table list add column name_virt char(10) generated always as (substring(name,3,9)) virtual;

So instead of evaluating the expression at the application level, the DB takes care of it. Moreover, we can create an index on these virtual columns. So let’s create an index on the newly created column (I am making use of online DDL here; you can choose your own approach):

create index idx_name_virt on list(name_virt) algorithm=inplace lock=none;

Once the index has been added, run the same kind of select query, now filtering on name_virt, and you will observe that instead of consuming more than 4-5 minutes (suppose), your query yields a result in less than a second.
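The idea can be sketched outside SQL as well (a toy model in Python; the row data and the lookup value are made up for illustration):

```python
# Toy model of why an index on `name` cannot serve substring(name,3,9),
# while an index on the precomputed value can.
rows = [(i, f"name{i:05d}") for i in range(100_000)]

# Index on the raw column: name -> id. This cannot serve a predicate on
# a *function* of name, since in general the sort order of name tells
# us nothing about the sort order of substring(name, 3, 9).
idx_name = {name: i for i, name in rows}

# The "generated column" index: substring(name,3,9) -> ids.
idx_virt = {}
for i, name in rows:
    idx_virt.setdefault(name[2:11], []).append(i)  # 1-based (3,9) ~ Python [2:11]

# Query: rows where substring(name,3,9) = 'me00042'
scan = [i for i, name in rows if name[2:11] == "me00042"]  # full table scan
hit = idx_virt["me00042"]                                  # index lookup
print(scan == hit, hit)   # True [42]
```

The full scan touches all 100,000 rows, while the precomputed-key lookup is a single probe; that is the whole trick behind indexing a generated column.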

If you want to store these values in the table, then you can use the STORED keyword while adding the column; by default the column remains VIRTUAL. In my experience, queries that were taking more than 4 seconds now take only milliseconds.

Please be informed that performance depends on how heavy an evaluation we are doing. If the expression is heavy, performance may suffer because of the extra work involved, so we need to decide as per our requirements.

Below is the generated-column clause from the MySQL docs:

col_name data_type [GENERATED ALWAYS] AS (expression) [VIRTUAL | STORED] [NOT NULL | NULL] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT 'string']


Is MySQL ACID compliant ?

I understand it’s a bit of an old topic, but people still have doubts. I have published an answer on this topic before, but now I feel like making my answer more connected to people and more thought through.

There is already a lot of debate on whether MySQL completely follows the ACID properties, and everyone has their own opinion. The MySQL docs also don’t convey complete information, except for the glossary.

MySQL with the InnoDB engine closely follows the ACID properties.

Let me try to cover, one by one, how MySQL with InnoDB follows these properties:

  1. Atomicity says that if any part of a transaction fails, the whole transaction is either rolled back or committed as a unit. In my experience, the InnoDB engine does crash recovery in case of an OS reboot, power failure, etc., and rolls back whenever a crash occurs in the middle of a transaction. So if an uncommitted transaction failed for any reason, i.e. never made it to disk, InnoDB rolls it back. Hence I can say it is atomic. The redo log plays an important role here in replaying committed changes during recovery, while the undo log is what allows incomplete transactions to be rolled back.
  2. Consistency says that the data must always remain in a consistent state: the DB must behave as per the defined rules. In simpler words, only valid data is allowed in, and transactions must complete fully. The various constraints must be honoured and must not change during the course of a transaction. InnoDB supports constraints like foreign keys and primary keys, and doesn’t allow invalid values to be inserted.
  3. Isolation says that integrity should be maintained and data should not become inconsistent due to concurrent transactions. InnoDB provides row-level locking, which prevents (if properly handled) any other process from acquiring a lock on a resource already in use by another process. InnoDB supports all the isolation levels, i.e. SERIALIZABLE, REPEATABLE READ, READ COMMITTED, and READ UNCOMMITTED, so we can set it as per our requirement.
  4. Durability: as per the MySQL docs, “the durability aspect of the ACID model involves MySQL software features interacting with your particular hardware configuration. Because of the many possibilities depending on the capabilities of your CPU, network, and storage devices, this aspect is the most complicated to provide concrete guidelines for. (And those guidelines might take the form of buy ‘new hardware’.)” In addition, InnoDB maintains a doublewrite buffer, which ensures that in case of a crash mid-write, pages are not corrupted and can be recovered after the DB comes up.

Hope this helps. Post a comment for any queries.


How to make CREATE INDEX and DROP INDEX faster on a heavy table, without locking

Previously, whenever we needed to create an index or alter a heavy InnoDB table’s definition in production, we always ended up with a lot of wasted time and downtime. This is because MySQL would acquire an exclusive lock on the table until the alteration completed and would not permit other DML operations to execute on the same table. Also, while doing this, MySQL performs a “copying to tmp table” operation, which is really time-consuming and a nightmare for all of us, because it is not feasible to execute in production.

Prior to MySQL 5.6, we had plugins to achieve the same, but with the overhead of activating them.

With current versions, we have the capability to avoid the “copy to tmp table” operation using the ALGORITHM clause, and we can avoid locking using LOCK=NONE.

So now we can use:

CREATE INDEX index_name ON table_name (column_name) ALGORITHM=INPLACE LOCK=NONE;

Let me explain. Today I created a B-tree index on an InnoDB table that is 15 GB on disk with 30 million records.

With a plain ALTER or CREATE/DROP INDEX, index creation/drop was taking 18-20 minutes and also blocking other DML operations from executing.

After supplying the arguments mentioned above, CREATE INDEX took 2 minutes 23 seconds, and DROP took 0.11 seconds. A huge difference.

This is because the work is done in place (a drop only changes the table’s metadata), and the table remains accessible to other DML operations throughout.

Let me know if this is useful for you, along with your time and CPU stats.

Regards

Ankit