This is how vector indexes are faster than a B-tree: they do not perform a comparison against every node.

What type of databases are well suited for LLMs? Part 1

This article is a continuation of my previous articles on the journey from relational to distributed databases and on the LSM tree. It will often use the terms below; I will explain what each one actually means as it comes up. The article also assumes that you know about LLMs (like ChatGPT):

  • Large language models
  • TiDB (a distributed database with vector support)
  • High-dimensional data
  • Vector
  • Semantic search
  • Knowledge retrieval
  • RAG (Retrieval-Augmented Generation)
  • Euclidean distance
  • Cosine similarity
  • ANN (approximate nearest neighbour search)
  • High-dimensional indexes like HNSW
  • Methods to perform similarity searches

Let us try to understand this through a use case I was discussing with my colleagues a few days ago. I then thought of putting it down in this article in very simple language so that everyone can understand it.

If I ask what the attributes of your favourite car are, what will you answer? Let's also allocate a random number to every attribute instead of a name:

Let's think about a white Honda City. What attributes does my Honda City have?

  1. Engine HP: 1500
  2. Car colour: 0001 (white)
  3. Car length: 75
  4. Car width: 40
  5. Number of gears: 6
  6. Cruise feature: 1
  7. Maximum speed: 400 (km/h)
  8. Pickup speed: 60
  9. Seat colour: 010
  10. Ambience light: 101
  11. Airbags: 6
  12. Speakers: 4
  13. Tweeters: 2

    How did I select these attributes and allocate these numbers?

How I selected these attributes and on what basis I allocated these numbers is a separate discussion, related to training models. For now, you can treat it as random; there is no rule that only specific numbers can be assigned to a specific attribute.

Going back to the topic, if I combine all these numbers, it looks like a list of numbers :

{1500,0001,75,40,6,1,400,60,010,101,6,4,2}

This becomes a vector representation of my Honda City, and every attribute refers to a dimension. If I keep adding more attributes of the car, it becomes high-dimensional data, i.e. it has a large number of attributes.

Do I need a separate data type for this ?

Yes. Just like varchar, we have a VECTOR data type. See below. Please note that the database I used here is TiDB.

mysql> show create table cars;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                         |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| cars  | CREATE TABLE `cars` (
  `name` char(10) DEFAULT NULL,
  `car_vector` vector DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
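If the number of dimensions is known up front, TiDB also lets you declare it on the column so that vectors of the wrong length are rejected. Below is a hedged sketch using the 13 car attributes above; the table and column names are only for illustration.

mysql> CREATE TABLE cars_fixed (
           name       CHAR(10),
           car_vector VECTOR(13)   -- fixed at 13 dimensions, one per attribute
       );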

Is this data type available in traditional databases ?

Storing this array (or list of numbers, or vector) in a traditional database like Postgres/MySQL as a varchar is possible, but then if I try to find similar products I can't perform a similarity search. An obvious question is whether the LIKE operator could help, but that performs text-based matching on a specific pattern and can't return similar data.

Let's go into more detail on implementing it in a traditional database and understand why it won't work.

Let’s add one more car. Say, you have a Red Hyundai Verna whose vector representation can be –

verna – {1500,0011,75,40,6,1,300,50,100,110,4,3,1}

and my city was – {1500,0001,75,40,6,1,400,60,010,101,6,4,2}

If we compute the Euclidean distance (to search for similar products), we will see that both items are quite close and can therefore be called similar. I can use multiple mathematical functions to do this, but the real problem is SCALE: you cannot run many such operations across millions of records.

mysql> insert into cars values ('city_sedan','[1500,1,75,40,6,1,400,60,10,101,6,4,2]');
Query OK, 1 row affected (0.02 sec)

mysql> insert into cars values ('vern_sedan','[1500,1,79,40,6,1,450,60,10,101,3,2,1]');
Query OK, 1 row affected (0.01 sec)

That's where a vector database comes in. Can we use TiDB, which is a distributed database and supports the vector data type along with HNSW, ANN, Euclidean distance and cosine similarity?

Vector databases calculate similarity using multiple methods, including but not limited to the ones below:

  1. Distance search on the same 2D/3D or multi-dimensional axes.
  2. Magnitude/angle-based search (not only subtraction of numbers), i.e. cosine search.

Let's try to perform a search using Euclidean distance in TiDB, which works on the fundamental below:

distance(A, B) = sqrt( (A1 - B1)^2 + (A2 - B2)^2 + ... + (An - Bn)^2 )
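As a quick sanity check of the formula, we can compare a manual calculation with TiDB's VEC_L2_DISTANCE on a tiny 2-dimensional example. This assumes, as in the queries below, that a vector can be written as a string literal like '[3,4]'.

mysql> SELECT SQRT(POW(3-0,2) + POW(4-0,2))    AS manual_distance,
              VEC_L2_DISTANCE('[3,4]','[0,0]') AS builtin_distance;

Both expressions should return 5.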

Think about a situation wherein you need to find a car similar to a given vector input. I have decided to give, as input, the vector of a high-end, high-speed car:

mysql> SELECT name,VEC_L2_DISTANCE(car_vector,'[5000,10,19,400,60,10,330,600,100,1001,30,20,10]') from cars;
+------------+--------------------------------------------------------------------------------+
| name       | VEC_L2_DISTANCE(car_vector,'[5000,10,19,400,60,10,330,600,100,1001,30,20,10]') |
+------------+--------------------------------------------------------------------------------+
| city_sedan |                                                             3674.4128782704865 |
| vern_sedan |                                                             3675.8008651176956 |
+------------+--------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

Above is the distance of the input from the available vectors. Now we can filter on it:

mysql> SELECT name,VEC_L2_DISTANCE(car_vector,'[5000,10,19,400,60,10,330,600,100,1001,30,20,10]') from cars where VEC_L2_DISTANCE(car_vector,'[5000,10,19,400,60,10,330,600,100,1001,30,20,10]') < 100;
Empty set (0.00 sec)

This means we have no car within a distance of 100. The value 100 is specific to the need, i.e. the threshold at which we want to set up our similarity search.
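In practice, a similarity search usually asks for the k closest rows rather than using a fixed threshold. The same function can be combined with ORDER BY and LIMIT; nothing new is assumed here beyond the VEC_L2_DISTANCE function already used above.

mysql> SELECT name,
              VEC_L2_DISTANCE(car_vector,'[5000,10,19,400,60,10,330,600,100,1001,30,20,10]') AS dist
         FROM cars
        ORDER BY dist
        LIMIT 3;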

Not every database that supports the vector data type can be called a fully-fledged vector database.

For example, MySQL 9 supports VECTOR as a data type, but it still needs time to mature. It cannot yet be called a complete vector database: it does not scale well to millions and billions of records, and the absence of the required vector indexes makes it less efficient for searches over billions of rows. Moreover, the similarity method MySQL supports is Euclidean distance, but there is no cosine support (which is good for use cases such as DNA search).

So, what have we learnt until now?

  1. What is similarity search?
  2. How are these searches implemented?
  3. Why are traditional databases not efficient for this purpose, even though they support it?
  4. What is a vector, and what are the methods to search it?
  5. Indexes for vector data types

Can’t we use B-tree indexes on these data types ?

We have all worked with B-trees, but they are not well suited for similarity search at such a large scale. This is where HNSW comes in: it is blazing fast because the number of comparisons is very small, and it does not look for an exact match but for similar items, by organizing the vectors into layers.
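In TiDB, such an index is created on top of a distance function over the vector column. The DDL below is only an approximate sketch; the exact syntax and prerequisites (for example, a declared dimension and TiFlash replicas) vary by TiDB version, so please check the TiDB documentation before relying on it.

mysql> CREATE TABLE cars_indexed (
           id         INT PRIMARY KEY,
           car_vector VECTOR(13),
           -- the vector index is HNSW-based under the hood
           VECTOR INDEX idx_car_vector ((VEC_COSINE_DISTANCE(car_vector)))
       );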

So, going back to our question: what databases are well suited for LLMs? An LLM (like ChatGPT or DeepSeek) uses RAG to fetch additional knowledge, this knowledge needs to be stored somewhere, and that store can be a vector database.

In the second part of this article, I will do the implementation on TiDB to see how it actually works.

Some very basics of LSM tree….

Basics

This article is a continuation of my previous article on my journey from relational to distributed databases. I also talked about probabilistic data structures here.

While discussing distributed databases with one of my colleagues, I realised that we often miss focusing on the basics, so I decided to publish this article in its simplest form. It is a long-pending article that I wrote a few years ago but never published.

Let’s talk about LSM tree.

In this type of storage, data keeps being appended to the existing files, i.e. existing data is not overwritten in place (like we do with B-trees). Data is first written to memory (known as memtables), only up to a defined threshold, say 10 MB, and once it reaches that limit the data is flushed.

LSM uses the concept of a log-structured file system (LFS), which just appends the changes (update/insert/delete), i.e. it won't replace or change the existing record, making it a good option for write-heavy database workloads.

Log-structured means that I keep writing log records one after the other, without modifying the related previous data.

Merge refers to the algorithm that merges and manages this log data.

Tree means different levels of data, i.e. storing the data in a hierarchical order. Think of a computer hard disk in which we have partitioned the data into different levels: the upper levels (having less data) are faster to access, and the deeper, lower levels (containing more data) are slower to access.

LSM works on different levels, which means it maintains data at different levels. Let's try to understand this with the points below.

The first stage of a write, and what follows

The first stage is the memory stage, i.e. the memtable or memory area (explained in a later section). Here data needs to be kept in sorted order as per the primary key defined in the table.

L0 – the level at the storage layer that contains the data flushed from the memtable into stable files, in sorted order. This level contains various SSTables (sorted string tables, known as SSTs), and at this level we may see duplicate keys (for example a,b,c in the s1 table and a,b,c,d in the s2 table, where a,b,c are duplicates). This duplication occurs because of the plain dump received from the memtable; any duplicate data is deleted at a later stage as a background task.

The SSTable files at Level 1 and above are organized to contain only non-overlapping keys by design.

Have you also heard the word COMPACTION? Let's understand what that is.

COMPACTION is a process wherein SSTs from level 0 are further combined to form a smaller number of SSTs by removing overlapping keys, deleted data, and data which has been changed. Obviously, this process consumes CPU.

What about read operations in an LSM tree?

For reads, LSM uses a binary search algorithm (which I'm not going to cover in detail, as it would be out of context). If we don't run compaction, it will eventually impact read performance, because the seek will have to go through multiple levels of SSTs.

Did I mention the memtable?

A MemTable is an in-memory data structure holding data before it is flushed to SST files (on disk). It serves both reads and writes: new writes always insert data into the memtable, and reads have to query the memtable before reading from SST files, because data in the memtable is newer. Once a memtable is full, it becomes immutable and is replaced by a new memtable. A background thread flushes the content of the memtable into an SST file, after which the memtable can be destroyed.

Which databases use this?

Databases like MyRocks, TiDB and Cassandra use LSM trees.

I'm keeping this article short and concise without going into much more detail, as its purpose is to help the reader understand the basics; writing more content would defeat that purpose.

Journey from Relational to Distributed databases

It has been a while since I wrote about the practical database experiences I gained over this last year. I have used the word “Journey” because things change when you are accustomed to relational databases (MySQL, PG) but start adopting distributed databases (like TiDB, ScyllaDB), regardless of the reason. We need a basic understanding while doing this migration. However, this article won't cover the details of these two, because there are already a lot of articles out there that help in understanding the differences and use cases of both.

You already know relational databases

binary digits here represent the physical storage

So you already know a few things about it :

  • The entire data is stored on a single machine unless you are using sharding.
  • You replicate data using binary logs, WAL (redo), archiving, or tools that can read these files.
  • Machines do not talk to each other if one in a cluster goes down.
  • There is no way of tracking consistency across nodes (except by relying on semi-sync replication).
  • Data is stored in a B-tree storage format.

You might know about distributed databases

You might know about these pointers :

  • Data is distributed across multiple machines rather than residing on a single machine.
  • In the above figure, tables A1 and A11 are subparts of the same table A, wherein A1 holds some data and A11 holds the rest. The same holds for the other tables B and C.
  • Every table has at least 2 copies on different machines. This is called the replication factor.
  • This is needed in case one machine goes down.
  • Machines talk to each other using consensus algorithms. Some widely known ones are Paxos and Raft.
  • There is no concept of a single leader with followers; instead, there are multiple leaders and multiple followers, since chunks of data reside on several machines.
  • Consistency has to be maintained using quorum and majority decisions.
  • Data is stored in an LSM tree format.

How has the journey from relational databases been so far?

It started with Cassandra, but since Cassandra is NoSQL and this article is focused on relational databases, I will talk specifically about relational ones like MySQL; hence I am considering TiDB here. The logs below are from my testing machine.

ankitkapoor@Ankits-MacBook-Air bin % ./mysql -h 127.0.0.1 --port 56210 -u root -p --local-infile=1
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3556769810
Server version: 8.0.11-TiDB-v8.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 8.0 compatible

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

First, in order to understand how data is distributed, I needed to access INFORMATION_SCHEMA and METRICS_SCHEMA.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| mysql              |
| sys                |
| test               |
+--------------------+

6 rows in set (0.00 sec)

mysql>

This article is not going to cover the architecture.

Let us see, how data is distributed internally :

mysql> select * from information_schema.TIKV_REGION_PEERS;
+-----------+---------+----------+------------+-----------+--------+--------------+
| REGION_ID | PEER_ID | STORE_ID | IS_LEARNER | IS_LEADER | STATUS | DOWN_SECONDS |
+-----------+---------+----------+------------+-----------+--------+--------------+
|      1003 |    1004 |        1 |          0 |         1 | NORMAL |         NULL |
|      1009 |    1010 |        1 |          0 |         1 | NORMAL |         NULL |
|        20 |      21 |        1 |          0 |         1 | NORMAL |         NULL |
|        48 |      49 |        1 |          0 |         1 | NORMAL |         NULL |
+-----------+---------+----------+------------+-----------+--------+--------------+

Here, the leader is the primary copy, and the peers are the replicas which hold copies of the data.

Lets see the data distribution at the table level :

mysql> select * from TABLE_STORAGE_STATS where TABLE_SCHEMA='test'\G;
*************************** 1. row ***************************
      TABLE_SCHEMA: test
        TABLE_NAME: customer
          TABLE_ID: 110
        PEER_COUNT: 1
      REGION_COUNT: 1
EMPTY_REGION_COUNT: 1
        TABLE_SIZE: 1
        TABLE_KEYS: 0
*************************** 2. row ***************************
      TABLE_SCHEMA: test
        TABLE_NAME: tes
          TABLE_ID: 119
        PEER_COUNT: 1
      REGION_COUNT: 1
EMPTY_REGION_COUNT: 1
        TABLE_SIZE: 1
        TABLE_KEYS: 0
*************************** 3. row ***************************
      TABLE_SCHEMA: test
        TABLE_NAME: before
          TABLE_ID: 131
        PEER_COUNT: 1
      REGION_COUNT: 1
EMPTY_REGION_COUNT: 1
        TABLE_SIZE: 1
        TABLE_KEYS: 165
*************************** 4. row ***************************
      TABLE_SCHEMA: test
        TABLE_NAME: after
          TABLE_ID: 135
        PEER_COUNT: 1
      REGION_COUNT: 1
EMPTY_REGION_COUNT: 1
        TABLE_SIZE: 1
        TABLE_KEYS: 0
4 rows in set (0.00 sec)

ERROR: 
No query specified

In the above output, TABLE_ID is the unique id associated with every table, and PEER_COUNT shows its number of copies. Since this is my testing machine, it shows only one. Regions are the logical blocks of data.
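TiDB can also show the regions of a specific table directly with SHOW TABLE ... REGIONS, which is a standard TiDB statement; here it is applied to the same test table.

mysql> SHOW TABLE test.customer REGIONS;

The output lists each region's id, start/end key, leader store and peers, which is another way to see how the rows of one table are spread across the cluster.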

Let's check whether we can find out more about how data is being distributed (although not completely) with the help of placement policies.

mysql> select * from tables where TABLE_SCHEMA='test'\G;
*************************** 1. row ***************************
             TABLE_CATALOG: def
              TABLE_SCHEMA: test
                TABLE_NAME: customer
                TABLE_TYPE: BASE TABLE
                    ENGINE: InnoDB
                   VERSION: 10
                ROW_FORMAT: Compact
                TABLE_ROWS: 0
            AVG_ROW_LENGTH: 0
               DATA_LENGTH: 0
           MAX_DATA_LENGTH: 0
              INDEX_LENGTH: 0
                 DATA_FREE: 0
            AUTO_INCREMENT: 0
               CREATE_TIME: 2024-12-20 19:23:38
               UPDATE_TIME: NULL
                CHECK_TIME: NULL
           TABLE_COLLATION: latin1_bin
                  CHECKSUM: NULL
            CREATE_OPTIONS: 
             TABLE_COMMENT: 
             TIDB_TABLE_ID: 110
 TIDB_ROW_ID_SHARDING_INFO: NOT_SHARDED(PK_IS_HANDLE)
              TIDB_PK_TYPE: CLUSTERED
TIDB_PLACEMENT_POLICY_NAME: NULL

Until now, I have tried to keep it simple: how data distribution works and how we can actually see it. In the next article, we will cover other aspects of distributed systems.

Note – These are my own thoughts, and my employer has no role in them.

Who said there is nothing to do when migrating your character set from MySQL 5.7 to 8?

As you know, MySQL 5.7 reaches its end of life this month, and many businesses are upgrading to 8 (except a few, believe me, it's true). Many cloud vendors are providing extra support and time to their customers so that they can move to 8. One of the major changes in 8 is the default character set and collation: previously the default was latin1, and utf8 was an alias of utf8mb3, but now the default is utf8mb4. There are plenty of articles from many database service providers that say no data or table structural changes are required. Although that is true to some extent, there are some major changes which, if missed, can cause your application to behave weirdly.

Let's go back to basics first!

What is a character set ?

A character is a letter, let's say A, B or C. Now I want to encode these letters, i.e. A can be represented by ! and B can be represented by $. If I combine the letters and their encodings, it can be called a character set.

There are a total of 41 character sets in 8.0.33. I won't go into the details of every character set, as that is out of scope, but if you are interested in reading more about them, see the manual. That link also explains how utf8mb4 differs from utf8mb3 and the other character sets, and how many bytes it takes to store a single character.

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.33    |
+-----------+
1 row in set (0.00 sec)
mysql> select count(1) from information_schema.CHARACTER_SETS;
+----------+
| count(1) |
+----------+
|       41 |
+----------+
1 row in set (0.00 sec)

What is a collation ?

Above, I mentioned that if I replace A by !, then I am imposing a rule. This rule is called a collation. This is a very simple type of collation; collations can be much more complex, depending on the type of rules one has created.

In total, there are 286 collations in 8.0.33. There can be multiple collations for one character set, and each character set has one default collation.

mysql> select count(*) from COLLATIONS;
+----------+
| count(*) |
+----------+
|      286 |
+----------+
1 row in set (0.00 sec)
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.33    |
+-----------+
1 row in set (0.00 sec)

Each collation has a property called PAD_ATTRIBUTE, i.e. whether trailing spaces are ignored when strings are compared (PAD SPACE) or treated as additional characters (NO PAD). It may sound confusing now, but I will clarify it further, as this is the main topic of this article.
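We can check this attribute for any collation directly from INFORMATION_SCHEMA; the PAD_ATTRIBUTE column is available from MySQL 8.0 onwards.

mysql> SELECT COLLATION_NAME, PAD_ATTRIBUTE
         FROM INFORMATION_SCHEMA.COLLATIONS
        WHERE COLLATION_NAME IN ('utf8mb3_general_ci','utf8mb4_general_ci','utf8mb4_0900_ai_ci');

The general_ci collations report PAD SPACE (trailing spaces are ignored in comparisons), while the newer utf8mb4_0900_* collations report NO PAD.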

How does the pad attribute make a difference while upgrading?

In many cases, if we were using latin1 or utf8 (an alias of utf8mb3) as the character set of our tables in 5.7, most probably we were using latin1_swedish_ci or utf8mb3_general_ci as the collation, because these are the defaults and we rarely change them unless we have specific business needs. Now let's talk about a case wherein we have a table with the character set utf8mb3.

mysql> show create table check_mb3\G;
*************************** 1. row ***************************
       Table: check_mb3
Create Table: CREATE TABLE `check_mb3` (
  `id` int DEFAULT NULL,
  `name` char(10) DEFAULT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
1 row in set (0.01 sec)
mysql> insert into check_mb3 values(1,'ankit');
Query OK, 1 row affected (0.00 sec)
mysql> insert into check_mb3 values(1,'ankit ');
ERROR 1062 (23000): Duplicate entry 'ankit' for key 'check_mb3.name'
mysql> insert into check_mb3 values(2,'guga');
Query OK, 1 row affected (0.00 sec)
mysql> insert into check_mb3 values(3,'guga ');
ERROR 1062 (23000): Duplicate entry 'guga' for key 'check_mb3.name'
mysql> insert into check_mb3 values(4,'victor');
Query OK, 1 row affected (0.00 sec)
mysql> insert into check_mb3 values(5,'victor ');
ERROR 1062 (23000): Duplicate entry 'victor' for key 'check_mb3.name'

Until now everything is fine: we are not able to insert the same string even if it has trailing spaces. Now let's try to migrate our table to 8 without worrying about pad spaces.

mysql> alter table check_mb3 CHARACTER SET UTF8mb4;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> alter table check_mb3 modify name char(10) CHARACTER SET utf8mb4 collate utf8mb4_0900_ai_ci;
Query OK, 3 rows affected (0.03 sec)
Records: 3  Duplicates: 0  Warnings
mysql> show create table check_mb3\G;
*************************** 1. row ***************************
       Table: check_mb3
Create Table: CREATE TABLE `check_mb3` (
  `id` int DEFAULT NULL,
  `name` char(10) CHARACTER SET utf8mb3 DEFAULT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql> insert into check_mb3 values(4,'victor ');
ERROR 1062 (23000): Duplicate entry 'victor' for key 'check_mb3.name'
mysql> select * from check_mb3;
+------+--------+
| id   | name   |
+------+--------+
|    1 | ankit  |
|    2 | guga   |
|    3 | victor |
+------+--------+
3 rows in set (0.00 sec)

OOPS!! We didn't expect this, but we are still fine in this case because we are using the char data type. Now let's think of a use case wherein our application expects every string to end with a trailing space, and we try to insert data with a trailing space:

mysql> insert into check_mb3 values(7,'ankittf84 ');
Query OK, 1 row affected (0.01 sec)
mysql> select * from check_mb3 where name='ankittf84 ';
Empty set (0.00 sec)

Here our application starts behaving weirdly. However, if we do the same with a utf8mb3 collation, the same read returns the data:

mysql> show create table check_mb3_again;
+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table           | Create Table                                                                                                                                                                                                                               |
+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| check_mb3_again | CREATE TABLE `check_mb3_again` (
  `id` int DEFAULT NULL,
  `name` char(10) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> insert into check_mb3_again values (1,'ankit ');
Query OK, 1 row affected (0.00 sec)
mysql> select * from check_mb3_again where name='ankit ';
+------+-------+
| id   | name  |
+------+-------+
|    1 | ankit |
+------+-------+
1 row in set (0.00 sec)

But the major question is: how will writes be impacted after the migration?

Let's take another case, with the table below:

mysql>  CREATE TABLE check_mb3_again_withvarchar (name varchar(10)) character set utf8mb3 collate utf8mb3_general_ci ;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> alter table check_mb3_again_withvarchar modify name varchar(10) unique key;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> insert into check_mb3_again_withvarchar values('ankit');
Query OK, 1 row affected (0.01 sec)
mysql> insert into check_mb3_again_withvarchar values('ankit ');
ERROR 1062 (23000): Duplicate entry 'ankit ' for key 'check_mb3_again_withvarchar.name'

So, as expected, even if the application or web layer sends the same string with a trailing space, it will be rejected. Now it's time to migrate the table to 8 using an mb4 collation.

mysql> alter table check_mb3_again_withvarchar charset utf8mb4;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> alter table check_mb3_again_withvarchar modify name varchar(10) CHARACTER SET utf8mb4 collate utf8mb4_0900_ai_ci;
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql> show create table check_mb3_again_withvarchar;
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                       | Create Table                                                                                                                                                                                                                     |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| check_mb3_again_withvarchar | CREATE TABLE `check_mb3_again_withvarchar` (
  `name` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

OK, so we have migrated the table successfully; let's continue with our writes:

mysql> insert into check_mb3_again_withvarchar values('ankit ');
Query OK, 1 row affected (0.00 sec)
mysql> select * from check_mb3_again_withvarchar;
+--------+
| name   |
+--------+
| ankit  |
| ankit  |
+--------+
2 rows in set (0.00 sec)

OOPS!! So we fall into a trap wherein the application didn't realise it and a duplicate record (of varchar data type) was inserted successfully.

How to solve this ?

As I mentioned before, there are multiple collations supported for one character set, and we can select the one we need. By “need” I mean several factors, like the type of characters we need and whether we need to compare strings with or without trailing spaces.

Considering the specific case mentioned in this topic, if we are looking to opt for utf8mb4 but with PAD SPACE behaviour, we can choose a suitable collation from the collations table. We can also design our own collation, which is not part of this topic.
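A hedged sketch of that approach: list the utf8mb4 collations that keep the PAD SPACE behaviour and switch the column to one of them (utf8mb4_general_ci is one such option). Note that the duplicate row inserted above would have to be removed first; otherwise the ALTER would fail on the unique key.

mysql> SELECT COLLATION_NAME
         FROM INFORMATION_SCHEMA.COLLATIONS
        WHERE CHARACTER_SET_NAME = 'utf8mb4'
          AND PAD_ATTRIBUTE = 'PAD SPACE';

mysql> ALTER TABLE check_mb3_again_withvarchar
       MODIFY name VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;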



So, what is important ?

It is extremely important to understand the behaviour of your data and how write operations actually write it before planning your migration, because such issues look invisible in the beginning, but after a few months or years they mess up your data. So yes, deciding on the character set and collation is extremely important for your data.

Meet me here on linkedin.

Note – These are my own thoughts and do not represent the thoughts of my employer in any way.



About the future of system variables in MySQL

Yesterday I was working with the performance schema to troubleshoot a replication issue, and my eyes stopped on two additional tables available from 8.0 onwards, which I felt are quite useful in some cases; these two tables are mentioned below. They are not the main topic of this article, but I will give a high-level introduction to them.

persisted_variables
variables_info

What's old? You might already know this.


For those who don't know about persisted variables: they let you persist “modified global variables” even after a mysqld restart, i.e. you don't need to go to the my.cnf file and make changes in it. Let's try to understand this with an example:

mysql> SET PERSIST max_connections = 200;

Query OK, 0 rows affected (0.01 sec)

mysql> show global variables like '%max_connection%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| max_connections        | 200   |
| mysqlx_max_connections | 100   |
+------------------------+-------+

Checking mysqld-auto.cnf (which resides in the data directory and has a JSON-based format):

cat mysqld-auto.cnf

{"Version": 2, "mysql_dynamic_parse_early_variables": {"max_connections": {"Value": "200", "Metadata": {"Host": "localhost", "User": "root", "Timestamp": 1685521797329463}}}}

After this, if we restart our mysqld, you will notice that your changes are persisted now.
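The same information is also exposed through the performance_schema table mentioned earlier, and a persisted setting can be removed again with RESET PERSIST; both are standard 8.0 features.

mysql> SELECT * FROM performance_schema.persisted_variables;

mysql> RESET PERSIST max_connections;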

What about the performance schema tables I mentioned above?

Talking about the variables_info table: if we would like to see which user changed the value of a variable and at what time, so that we can dig into issues around max_connections, purge threads, buffer pool size, auditing and so on, this is the table to look at:

mysql> select * from performance_Schema.variables_info where variable_name like '%max_connections%'\G;
*************************** 1. row ***************************
  VARIABLE_NAME: max_connections
VARIABLE_SOURCE: DYNAMIC
  VARIABLE_PATH: 
      MIN_VALUE: 1
      MAX_VALUE: 100000
       SET_TIME: 2023-05-31 14:18:46.893137
       SET_USER: root
       SET_HOST: localhost
*************************** 2. row ***************************
  VARIABLE_NAME: mysqlx_max_connections
VARIABLE_SOURCE: COMPILED
  VARIABLE_PATH: 
      MIN_VALUE: 1
      MAX_VALUE: 65535
       SET_TIME: NULL
       SET_USER: NULL
       SET_HOST: NULL
2 rows in set (0.01 sec)

What’s new in 8.0.29 ?

Coming to our main topic: starting with 8.0.29, we are able to secure sensitive variables, which will store data like passwords, keyrings, etc., and even apply access-level security to them. At this moment there are no such variables, but we should remember that to use them we need to enable the component keyring_file, which will help us secure these variables once they are introduced. In this blog, I have explained how to install components.

mysql> install component 'file://component_keyring_file';
Query OK, 0 rows affected (0.02 sec)

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>  select * from component;
+--------------+--------------------+-------------------------------+
| component_id | component_group_id | component_urn                 |
+--------------+--------------------+-------------------------------+
|            5 |                  1 | file://component_keyring_file |
+--------------+--------------------+-------------------------------+
1 row in set (0.00 sec)

mysql> show global variables like '%keyring%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| keyring_operations | ON    |
+--------------------+-------+

Talking about how we can encrypt these variables and control access to them, we have two options:

Are these useful ?

Once there are sensitive variables and we are using them, we need to be extra cautious so that this data is not visible to any user. Additionally, the master keyring needs to be rotated occasionally to make this more secure. Secondly, the related setting should be kept in the OFF state so that mysqld won't restart if the data cannot be encrypted. Still, it is too early to comment on anything.


Leadership principles at simplifieDB

At simplifieDB, we follow the principles below in our daily job routine.

  1. If you write your problems down very clearly, they are half solved.
  2. Be pellucid about your work. Keep digging.
  3. Customer obsession is not greater than your family obsession. Maintain balance.
  4. The success of our customers is the success of our business.
  5. Perfection is not always required. Be first.
  6. Enjoy your work and keep things simple.
  7. Don't worry too much. Together we will find the solution.
  8. Always listen to your colleagues' ideas and solutions. Don't just reject them.
  9. Always share what you learnt today. This will help you and everyone grow.
  10. Don't disagree with a solution just because of your ego. Remember the target, not the ego.


Database Architecture that empowered CCTNS. Episode 1.

I am covering this topic across multiple articles and will keep each one short so that readers won't get bored.


Note – These thoughts are in no way linked with any of my employers, influenced by them, or linked with any government. This article DOES NOT contain any confidential information; it is purely my own technical write-up. Its one and only purpose is to help those who are keen to learn more about database architectures and the possible scenarios one can face while designing them.

Before I start the technical discussion, I must tell you about CCTNS. CCTNS stands for Crime and Criminal Tracking Network and Systems, an Indian government initiative started back in 2009 by the Ministry of Home Affairs (MHA). Under this project, our main aim was to set up the database architecture for more than 15,000 police stations across the country in such a way that we could achieve the points mentioned below (but not limited to them).

How and which database did we opt for?

Back at the end of 2014, there were not many DBaaS solutions, and databases like Oracle and SQL Server were at their peak. We really wanted to avoid them because of how expensive their solutions are, and preferred open-source databases like MySQL and others. With my experience, I was asked to evaluate them based on how efficiently and simply they store data. During the evaluation, I chose MySQL's InnoDB storage engine (instead of MyISAM) because of its ACID nature (although there are various debates about whether InnoDB fully supports ACID), since we were also dealing with transactional data, granular-level locking, foreign key constraints, etc.

While studying it, I realized that InnoDB stores data in primary key order and that secondary indexes are also linked to the primary key, i.e. non-PK index rows act as pointers to the primary keys; even if you do not create a PK on your table, InnoDB itself creates a 6-byte PK (in the absence of a unique key). Hence, it becomes quite easy to navigate to a specific page while reading data, since pages use linked lists to reach the adjacent or required page, and the data is clustered by the primary key. I am not going into the details of the InnoDB storage architecture, as that is out of scope. Moreover, the data directory layout of MySQL is quite easy to navigate and understand. Considering these thoughts along with many other points, and after evaluating the other open-source databases, we decided to move ahead with MySQL; the available version was MySQL 5.6.

The complexity of the database servers and the challenges we had

Since there were more than 15,000 police stations across the nation (and counting), we needed to set up a strong replicated environment. As all of these police stations were storing various FIRs and related updates at various intervals, we needed to ensure that:

1. The writes and updates won't be affected by spikes in the number of connections.

2. The database should be available all of the time to accept connections.

3. Storage should not be a blocker, since we were storing BLOB and text data extensively.

4. Since the data needs to be replicated instantly and stale data on the read nodes should be avoided, we required a solution that could be satisfied by a combination of semi-sync and async replication.

5. A warehouse server that can connect with all database servers and keep replicating from them, since this is required for various management purposes by the Government of India.


PS stands for police station; every district has multiple police stations (and the number is increasing).
There are multiple districts in a city, which further multiplies the number of police stations.
There are multiple cities in a state (except union territories).
There are 28 states and 8 union territories in my beloved INDIA.

Here are the questions we had when we saw the abundant amount of data:

1. Whether to have one MySQL instance for all police stations belonging to a city, or whether we should implement sharding with a unique police station number as the shard key. Please note that in MySQL a schema is a synonym for a database, which is not the case in databases like Oracle, and one instance means one mysqld process.

2. If we go for the first option, there would be the overhead of maintaining a large number of databases, although there is no hard limit on the maximum number of databases in MySQL unless there is a restriction at the OS level, i.e. the file descriptors. Read about it here.

3. If we go for the second option, there is the overhead of maintaining many servers, which we really wanted to avoid from the start.

Let's see what all came to our minds:


In the subsequent parts of this series, I will showcase the architecture in the form of a diagram and show how we overcame these challenges (and more) to create such a large database architecture. I will also cover what other limitations we had in MySQL 5.6, how we overcame them without moving to 5.7, how we did upgrades later on and, most importantly, how we made our replication more mature.


Bloom Filters

Long ago, I presented a talk on MyRocks and its use cases. In this blog, I am discussing bloom filters in a nutshell and how they are used. I must say that this blog has been influenced by my mentor Yves Trudeau, who has already published a very good description of bloom filters.

Bloom filters are not related only to MyRocks/RocksDB; they are also used in databases like TiDB (a distributed database), Cassandra, Redis and OceanBase. In other words, they are widely used in databases which use the LSM tree (log-structured merge tree) concept. Bloom filters keep data in the cache or in memory to avoid seeking data from storage. To understand bloom filters, we need to understand the following terms:

  1. Probabilistic data structure.
  2. Space efficient.
  3. The hash function and its types.
  4. Bit array

Probabilistic data structure – It means that the data may or may not reside in the storage/set; in other words, there is no guarantee that the required data exists. Take a use case wherein you have an ample amount of data and you need to find a particular record. Searching for this record can take some time, but if we can cheaply estimate the probability that the required data exists, we can save time and assume we are heading in the correct direction.

Hash function – Hashing is a concept which transforms a random-sized input (a string, for example) into a fixed-size output. These hash values have various use cases: one is in security, which we also associate with encryption; an additional use case is searching for data. If I talk about MySQL, I have various hash functions available, for example md5 and sha256.

Bit array – As the name suggests, it is an array of bits, i.e. 0/1. So if I say I need an array of 8 bits, I mean a combination of eight 0s and 1s. For a bloom filter, after calculating the hash of some input, we need a numeric representation of that output so that we can mark its entry in the array. For example, if it maps to 24, we mark index 24 as 1 in the bit array. We will see further on how a bloom filter actually does this.

00000000
76543210

An array of 8 bits, with the index values shown underneath.

Space efficient – It means using the minimum amount of memory to store the data/keys/sets. Since a bloom filter uses a bit array or bitmap, we can say it is a space-efficient method.

How does a bloom filter work?

At its most basic, a bloom filter creates a hash of the element/string, transforms it into a decimal representation, and then marks it in the array.
For example, take the string “contra” and create its hash using any of the available functions. Here I have used md5:

mysql> select md5('contra');
+----------------------------------+
| md5('contra')                    |
+----------------------------------+
| a83f0f76c2afad4f5d7260824430b798 |
+----------------------------------+
1 row in set (0.00 sec)

In this example, I will create a bloom filter of size 32 bits, i.e. 4 bytes. Let's try to transform the hash using mathematical operators:

mysql> select mid('a83f0f76c2afad4f5d7260824430b798',4,10);
+----------------------------------------------+
| mid('a83f0f76c2afad4f5d7260824430b798',4,10) |
+----------------------------------------------+
| f0f76c2afa                                   |
+----------------------------------------------+
1 row in set (0.00 sec)
mysql> select conv('f0f76c2afa',16,10);
+--------------------------+
| conv('f0f76c2afa',16,10) |
+--------------------------+
| 1034943212282            |
+--------------------------+
1 row in set (0.00 sec)
mysql> select mod(1034943212282,32);
+-----------------------+
| mod(1034943212282,32) |
+-----------------------+
|                    26 |
+-----------------------+
1 row in set (0.00 sec)
mysql> 

Set the bit at index 26 in the bitmap to 1.


How does a bloom filter help in locating an entry?

In the case of a read, if a client needs to read this entry, the database will first use the bloom filter by hashing “contra” and checking whether bit 26 is 1.
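The whole lookup can be collapsed into one expression using the same MySQL functions as above; the filter size 32 and the key 'contra' are just the values from this example.

mysql> SELECT MOD(CONV(MID(MD5('contra'),4,10),16,10),32) AS bit_index;

If the bit at the returned index is 0, the key is definitely not in the set; if it is 1, the key may be present (or it may be a false positive, as described next).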

In another case, it is also possible that some other key sets the same bit, i.e. the 26th, to 1. In that case, if a client is looking for “contra” and we never actually added “contra” to the set, the filter will still tell us the data exists, but when we search for it, it won't be there. This is called a false positive.


How can we make a bloom filter more useful?

In my example, I have used 4 bytes and set only one bit per key. To make bloom filters more practical, we can set 2 bits per key instead of only one and increase the size of the bloom filter from 4 to 8 bytes. But there are caveats to increasing the size of bloom filters, which I am not covering in this article.

How do different database products use bloom filters?

Talking about RocksDB, it creates a bloom filter for every SST file. More can be read here.
Talking about Redis, we can add keys into bloom filters.

This blog does not cover how bloom filters work inside each of these databases.

Conclusion:
The bloom filter is an important concept in LSM-tree-based databases and can be useful to quickly determine whether a key may exist. In this article, I have tried to cover the concept of bloom filters. However, one should use them carefully, and allocating more memory won't always help.

Disclaimer

This is my personal blog and my employer is not involved in it; the thoughts are my own.


Hash partitioning and its role in distributed systems – 1

Have you ever divided a single physical book into sub-books because the book was too heavy and it was not easy to quickly navigate to a specific page?

Similarly, partitioning a table is like dividing one physical table into various sub-tables (on the same storage and in the same data directory) to get better storage management, better performance (in some cases), and better archival policies. We can partition a table in various ways. First, we have to choose a partitioning key, or partitioning keys, i.e. a composite partition key in which multiple columns are involved. By partitioning key I mean the column or columns on the basis of which we divide the table.

Let's say we have the popular table employees_range below. This table is divided into two sub-parts, i.e. p0 and p1. If you insert a record with an id less than 100, that record will go into the p0 partition. This is the simplest form of partitioning we can have in our database server and is known as range partitioning. However, this type of partitioning has already been covered by multiple articles and is not the topic of this one.

Whenever you are creating a partition, the table structure matters a lot. For example, if you have a primary key and a unique key in the table and you are planning to partition it, then the partitioning key must be present in all of those keys, i.e. in the unique key as well as the primary key.

This blog is focused on hash partitioning. If you understand this, you can also apply the same understanding to distributed systems, where a database is sharded into several shards, and understand how data is written to the respective shards.

Hash partition :

create table t1 (col1 int not null,col2 date not null,col3 int not null,col4 int not null,unique key(col1,col2,col3)) partition by hash(col3) partitions 4;

This will divide the table into 4 different sub-tables; the number 4 after PARTITIONS is the number of partitions we need. In hash partitioning we don't define the path of the data explicitly (by path of the data I mean in which partition a particular insert will land), as we do in range partitioning. A hash is a function in computer science/math which transforms an arbitrary input into a fixed-size output.

Now hashing can be of different types.

MySQL, by default, uses modulo hashing (linear hashing is different), i.e. x % y, where x is the value of the partitioning key and y is the number of partitions. For example, 9 % 5 means dividing 9 by 5, which gives a remainder of 4, so that is the hash. Similarly, in the above example, hashing will be based on the col3 values. Let's try to insert some values:

mysql> insert into t1 values (1,NOW(),300,1);
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> insert into t1 values (1,NOW(),191,1);
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> insert into t1 values (1,NOW(),0,1);
Query OK, 1 row affected, 1 warning (0.00 sec)



mysql> select * from t1 partition(p3);
+------+------------+------+------+
| col1 | col2       | col3 | col4 |
+------+------------+------+------+
|    1 | 2024-04-01 |  191 |    1 |
+------+------------+------+------+
1 row in set (0.00 sec)

mysql> select * from t1 partition(p0);
+------+------------+------+------+
| col1 | col2       | col3 | col4 |
+------+------------+------+------+
|    1 | 2024-04-01 |    0 |    1 |
|    1 | 2024-04-01 |  300 |    1 |
+------+------------+------+------+
2 rows in set (0.00 sec)


300 % 4 = 0, so that row goes into the first partition (p0).
191 % 4 = 3, so this one goes into partition p3.
0 % 4 = 0, so this one goes into the first partition (p0).
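We can verify these remainders directly in SQL; the divisor 4 is just the partition count from the DDL above.

mysql> SELECT 300 % 4 AS p_for_300, 191 % 4 AS p_for_191, 0 % 4 AS p_for_0;

This returns 0, 3 and 0, matching the partitions p0, p3 and p0 observed above.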

This is the simplest form of partitioning you will encounter. The same concept can also be applied to multi-tenant setups, where databases or tables have been sharded across different servers. See the diagram below.


Choosing the hashing definition can surely impact write query speed; if the hashing is complex, it will affect performance. Let's try to use a more complex hashing algorithm in the next section.


Mutex & MetaData lock handling with Performance and Sys Schema in MySQL

The thought of writing this blog came to my mind when I was struggling with high CPU and waiting queries in my live production MySQL environment. One case I can easily recall is a night when a physical backup of my database started using MEB/PXB (MySQL Enterprise Backup and Percona XtraBackup), which takes an exclusive lock in the initial and final phases of the backup for a certain amount of time. During this time all the queries go into a waiting state, and when I checked show processlist, it said “waiting for meta data lock”. One more case I can recall is the contention among various threads while writing into the binary log cache and the binary log; such locks are called mutex locks and can be caught in various tables of the performance schema in MySQL. Imagine a real situation where your live database is running with a high number of connections, your load is CPU-bound, your queries are being slowed down by locks already acquired by other transactions, and you want to make sure the business is not impacted; understanding these types of locks will help you get out of the weird situation you can be in.

In a highly concurrent environment where threads compete with each other for resource access, locking needs to be properly implemented across all the databases in an instance and across objects including tables, data, caches, buffers, files, etc. In MySQL, locks can be at the table level as well as at the row level (multi-granular locking) and at the memory level. These locks need to be implemented in order to maintain data consistency and to reduce contention. Some locks are user-managed (like shared and exclusive locks) and some are system-managed. Among the system-managed ones we have mutual exclusion, or mutex.

A mutex also works as an exclusive lock: only one thread can access a particular memory resource or file (for example, the binary log) at a time, so that other threads requesting the same resource do not interrupt its task. Once the thread completes its task, it releases the lock, and the thread(s) waiting in the queue take over. Hence the main thread acquires the mutex first, performs the required task, and then releases the lock. One example of a mutex is around the binary log: with sync_binlog > 1, there is always one leader thread responsible for flushing the binlog cache to disk after n group commits, and the other threads are followers. The leader thread takes the mutex lock and releases it once the data has been flushed to disk.

mysql_mutex_lock(&LOCK_log);

    DEBUG_SYNC(leader->thd, "commit_after_get_LOCK_log");
    mysql_mutex_lock(&LOCK_group_commit_queue);
    current= group_commit_queue;
    group_commit_queue= NULL;
    mysql_mutex_unlock(&LOCK_group_commit_queue);

//// I have taken this snippet of mutex from Percona mysql code from git hub.

show engine innodb mutex;

I have copied this line from mysql document page i.e.

“SHOW ENGINE INNODB MUTEX displays InnoDB mutex and rw-lock statistics”

Sample output showing all rwlocks; for mutex locks, only the mutex will be shown.

Apart from this, mutex status can be monitored from the performance schema as well. In a later section, you will see which INSTRUMENT is responsible for enabling mutex instrumentation and which table is responsible for mutex monitoring.

for (ulint i = 0; i < n_sync_obj; i++) {
        rw_lock_create(hash_table_locks_key, table->sync_obj.rw_locks + i,
                       level);
      }
////  This is a snippet of hash0hash.cc where rwlock is created. 

/// In the MySQL docs it is mentioned that for a mutex it reports only the mutex name, but for an rwlock it mentions the file name as well as the line number. I believe they should do the same for mutexes, as it would really help a lot.

For mutexes, we have 223 instruments for different purposes, and the corresponding table to check for mutex locks is performance_schema.mutex_instances.

Related instruments for mutexes can be found in the setup_instruments table under wait/synch/mutex/ and can be enabled dynamically. I am trying to reproduce the mutex locking scenario, which I will add once I have some data. For now, it is enough to understand that the table “events_waits_current” in the performance schema can help in analysing mutex locks.
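A minimal sketch of that setup: enable the mutex wait instruments and the wait consumers, then look at what each thread is currently waiting on. All the object names below are the stock performance_schema ones.

mysql> UPDATE performance_schema.setup_instruments
          SET ENABLED = 'YES', TIMED = 'YES'
        WHERE NAME LIKE 'wait/synch/mutex/%';

mysql> UPDATE performance_schema.setup_consumers
          SET ENABLED = 'YES'
        WHERE NAME LIKE 'events_waits%';

mysql> SELECT THREAD_ID, EVENT_NAME, SOURCE, TIMER_WAIT
         FROM performance_schema.events_waits_current
        WHERE EVENT_NAME LIKE 'wait/synch/mutex/%';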

Metadata locks

What is a metadata lock?

A metadata lock is a lock on the table structure: if a thread is inside a transaction (i.e. between begin and commit) and another thread wants to alter the structure of the same table, the requesting thread goes into the metadata lock wait state. Such scenarios can really hamper database performance in a production environment. Think of a scenario where your application has started a transaction:

Session 1 (screenshot)
Session 2 (screenshot)
Session 3 (output of show processlist)

With the help of show processlist, it is visible that the requesting thread is waiting for the metadata lock to be released by the other thread. But the problem is that it is not easy to locate which thread is holding the lock; not even SHOW ENGINE INNODB STATUS really helps. Performance schema and sys schema will be helpful here. Let us see how:

PSI_stage_info MDL_key::m_namespace_to_wait_state_name[NAMESPACE_END] = {
    {0, "Waiting for global read lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for backup lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for tablespace metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for schema metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for table metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for stored function metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for stored procedure metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for trigger metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for event metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for commit lock", 0, PSI_DOCUMENT_ME},
    {0, "User lock", 0, PSI_DOCUMENT_ME}, /* Be compatible with old status. */
    {0, "Waiting for locking service lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for spatial reference system lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for acl cache lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for column statistics lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for resource groups metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for foreign key metadata lock", 0, PSI_DOCUMENT_ME},
    {0, "Waiting for check constraint metadata lock", 0, PSI_DOCUMENT_ME}};


//// These are the metadata lock thread states, which can be found in mdl.cc

The corresponding instrument for metadata locks on waiting queries, which is enabled by default in the setup_instruments table, is:

wait/lock/metadata/sql/mdl

There are other instruments as well for metadata locks, and the corresponding table is performance_schema.metadata_locks.

We can find the thread which has acquired the metadata lock for the above transaction:

SELECT * FROM metadata_locks;
This only shows the current rows; entries are purged once the waits are resolved.

The column LOCK_STATUS tells us whether the lock has been granted or is in a pending state. OWNER_THREAD_ID 97 is in PENDING state, and the corresponding lock is held by OWNER_THREAD_ID 80, which holds a SHARED_READ lock on database ankit and table t1_metadata. Taking advantage of the threads table in performance_schema can further help us dig deeper, by selecting the rows for threads 80 and 97.

As you can see here, thread_id 97 with processlist id 33 (the connection id from processlist) is waiting for the metadata lock held by thread_id 80.

Hence 80 is the thread which has been holding the lock for 474 seconds, and the corresponding thread has been waiting for 456 seconds. To dig even deeper, it is beneficial to use sys.session.

output from sys.session

Combining all the above information, sys.schema_table_lock_waits can finally provide the solution to overcome this locking:

schema_table_lock_waits can tell us what action to take in order to resolve the above metadata lock issue, by referring to the column sql_kill_blocking_query.
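A minimal query along those lines; the column names are from the stock sys schema in MySQL 5.7/8.0.

mysql> SELECT object_schema, object_name,
              waiting_thread_id, blocking_thread_id,
              sql_kill_blocking_query
         FROM sys.schema_table_lock_waits;

Running the statement returned in sql_kill_blocking_query (a KILL of the blocking connection) releases the metadata lock, at the cost of rolling back whatever the blocking session was doing.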


Some basic MySQL concepts and variables should be known while handling this in a real environment:

1. lock_wait_timeout
2. Type of isolation
3. query_time

Understanding how locking works in MySQL is a key concept behind database performance.
