Function-based indexing in MySQL 5.7

I have rarely seen dev/DBA using this concept when their query is busy in FTS or index scanning. I have decided to write my own experience here and share it with the outer world. It might help someone who is struggling with their query performance.

In many cases, you have experienced slowness in your select queries even instead after having had an index on the column present in where clause and you are using an in-built function like substring, like, max, min, etc. Please note I’m talking about inbuilt functions only not user-defined functions.

show create table list:

Create table (id int, name char (20)),primary key (Id) ,Key idx_name(name) engine=Innodb;

Now insert at least 1 million records so that we can easily understand the performance impact :

Now if your query is like :

select id,name from list where id=123 and name =’oggy’;

This query will use index idx_name and will give you output in less than a second.

But if your query is like ;

select id, name from list where id=123 and name like ‘%abc%’;

With more than one million records this query will perform FTS (full table scanning) on the name column and will not use the index in spite name column has a secondary index on it and this may take a lot of time depending on the number of records in the table.

Fact is function never use indexes, even if involved column has index created on it and always result in FTS.

Prior to MySQL 5.7, MySQL doesn’t provide any functional-based indexing feature or something similar to it.

In 5.7, we have virtual columns feature which is known as generated columns and can be built at the time of reading data, thus resulting in no consumption of storage space.

Let’s begin with our earlier example where our query was doing FTS. Now if i create a virtual column :

alter table list add name_virt char (10) as substring(name,3,9);

So instead of evaluating at the application level, DB will take care of this. Moreover, we can create an index on these virtual columns. So, if I create an index on this newly created column( I am trying making use of the online DDL, you can choose your own) :

create index idx_name_virt on list(name_virt) algorithm=inplace lock=none;

Once the index has been added, please run the same select query and you will observe that instead of consuming more than 4-5 minutes (suppose), your query will yield a result in less than a second.

If you want to store these values in the table, then you can use the STORAGE keyword while adding a column. But by default, it remains virtual. In my experience, my queries that were taking more than 4 seconds are now taking millisecond only.

Please be informed that performance depends on how heavy evaluation we are doing. If we are doing heavy evaluation then performance may get impacted because of several IO. So we need to decide as per our requirements.

Below is the clause of virtual column from mysql docs:

col_name data_type [GENERATED ALWAYS] AS (expression) [VIRTUAL | STORED] [NOT NULL | NULL] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT string‘]

About Ankit Kapoor

Ankit Kapoor is a database geek and enjoy playing with open source databases like MySQL and PG.
Connect with me on linkedin

Is MySQL ACID compliant ?

I understand it’s a bit of an old topic but people still have doubts. However, I have published an answer on this topic here, but now I feel to make my answer more connected with people with more improved thought.

There is already a lot of debate on this whether MySQL completely follows ACID properties, and everyone has their own opinion. MySQL docs also don’t convey complete information except for the glossary here.

MySQL with InnoDB engines closely follows the ACID property.

Let me try to put all properties one by one how MySQL with Innodb follows these properties :

  1. Atomicity says that either rollback or commit in case any part of a transaction fails. As per my experience, InnoDB engines do crash recovery in case of OS reboot, power failure, etc or it rolls back whenever any crash occurs in mid of transaction. So in case, if any uncommitted transaction got failed due to any reason, I.e. not flushed (reached) to disk yet, InnoDB roll back the transaction . Hence I can say it is ATOMIC. Obviously, the redo log plays an important role here in re-executing the failed transactions.
  2. Consistency says that the data must always remain in a consistent state. DB must behave as per requirement. In simpler words only required data is allowed to insert and must complete. Also, the various constraints must be followed and should not get changed during the course of a transaction. InnoDB has all the constraints like Foreign keys, primary key etc. and doesn’t allow unauthorized values to be inserted.
  3. Isolation says that integrity should be maintained and data should not come in inconsistency due to concurrent transaction. .Innodb provides several row level locking which avoids (if properly handled) any other process to acquire lock on resource which is already in use by other process. Innodb follow all isolation i.e. serialized, repeatable read,read committed and read uncommitted. So we can set it as per our requirement.
  4. Durability : As per MySQL DOC the durability aspect of the ACID model involves MySQL software features interacting with your particular hardware configuration. Because of the many possibilities depending on the capabilities of your CPU, network, and storage devices, this aspect is the most complicated to provide concrete guidelines for. (And those guidelines might take the form of buy “new hardware”.) . For more clarification, innodb maintains double write buffers which ensures that in case of crash , data will not be lost and can be recovered from it after DB comes up.

Hope this help. Post your comment for any queries.

About Ankit Kapoor

Ankit Kapoor is a database geek and enjoy playing with open source databases like MySQL and PG.
Connect with me on linkedin

How to make create index and drop index faster with heavy table without locking

Previously ,whenever we needed to create index or alter a heavy Innodb table’s definition in production , we always ended up with lot of time wastage and down time. This is because MySQL usually acquire exclusive lock on this table till alteration get completed and don’t permit other DML operation to execute on same table.Also while doing this mysql do “copying to tmp table” operation which is really lot of time consuming and night mare for all of us because it is not possible to execute same in production.

However , prior to MySQL 5.6 we have plugins to achieve same but a overhead to activate it.

With current version , we have capability to avoid “copy to tmp table” operation using arguments like ALGORITHM and we can avoid locking using LOCK=NONE.

So now we can use:

CREATE INDEX index_name ON table_name (column_name) ALGORITHM=IN PLACE LOCK=NONE;

Let me explain you. Today I have created a b tree index on innodb table with size on disk 15 GB and have record size 30 millions.

On simple alter or create index/drop index , index creation/drop was taking 18-20 minutes and also blocking other DML operation to execute .

After supplying above mentioned argument create index took 2 min 23 seconds and for drop , it took 0.11 seconds. It was a huge difference.

This is because only meta data of this table has been changed and is accessible by other DML operation too.

Let me know , if it is useful for you with time stats and cpu stats.

Regards

Ankit