Database Architecture that empowered CCTNS. Episode 1.

I am covering this article in multiple articles and will keep everyone short so that readers won’t feel bored.


Note – These thoughts are nowhere linked with any of my employers or not even being influenced by them or not even linked with any governments. It DOES NOT contain any confidential information. It is purely my own technical write-up. The only and only purpose of this article is for those who are keen to learn more about database architectures and the possible scenarios one can face while designing.

Before I start the technical discussion, I must tell you about the CCTNS. CCTNS stands for Crime and Criminal tracking network and system, which is an Indian government initiative back in 2009 by the Ministry of home affairs (MHA). Under this project, our main aim is to set up the database architecture for more than 15,000+ police stations across the country in such a way where we can achieve some of the below-mentioned points (but not limited to).

How and which database we opted for?

Back in 2014 end, there were not many DBAAS solutions, and databases like Oracle, and SQL servers were at their peak we really wanted to avoid them because of the expensive solutions they provide and rather preferred to choose open-source databases like MySQL and other open-source databases. With my experience, I have been asked to evaluate it, based on how efficiently and simply are they storing the data. During the evaluation, I have taken MySQL’s InnoDB storage engine (instead of MyISAM) because of its ACID nature ( although there are various debates about whether InnoDB supports ACID) since we were also dealing with the transactional data, granular level locking, foreign key constraints, etc.

While studying I realized that Innodb stores data in the Primary key format and the secondary index is also linked with the primary key only i.e. non-pk rows acting as the pointer to the primary keys and even if you are not creating any PK in your table, InnoDB itself creates 6 bytes of PK ( in the absence of unique key). Hence, it becomes quite easy to navigate to the specific page while reading the data since pages are using link lists to navigate to the adjacent or required page and also data is clustered with the primary key. I am not going into the details of the InnoDB storage architecture as it is out of topic. Moreover, it is quite easy to navigate within the data directory of MySQL since it was quite easy to understand. Considering these thoughts along with many other points after evaluating the other open source databases, we have decided to move with MySQL and the available version was MySQL 5.6.

The complexity of the database servers and the challenges we had

Since there were more than 15,000+ police stations across the nation and counting, we needed to set up a strong replicated environment. As all of these police stations were storing various FIRs and related updates to it at various intervals, we need to make ensure that :

1. The writes and updates won’t be affected by the spike in the number of connections.

2. Database should be available all of the time to accept connections.

3. Storage should not be a blocker since we were storing BLOB and text data extensively

4. Since the data needs to be replicated instantly and stale data on the read nodes should be avoided, we required a solution that can be satisfied by the combination of semi and async replication.

5. A warehouse server that can connect with all database servers and thus keep on replicating from it since this is required for various management purposes by the Government of India.


PS is a police station in every district and every district has multiple police stations ( and increasing ).
There are multiple districts in a city which further multiples the number of police stations.
There are multiple cities in a state (except union territory)
There are 28 states and 8 union territories in my beloved INDIA.

Here are the questions we had when we saw an abundance amount of data:

1. Whether to have one MySQL instance of all police stations belonging to a city or should we implement the sharding considering a unique police station number as the shard key. Please note that in MySQL, Schema is a synonym to the Database which is not in the case of databases like Oracle and one instance means one mysqld process.

2. If we go for the primary solution, then the overhead of maintaining multiple numbers of databases would be there. Although there is no hard limit on the maximum number of databases in MySQL unless there is a restriction at the OS level i.e. the file descriptors. Read it here.

3. If we go for the secondary one, then the overhead of maintaining servers which we really wanted to avoid since starting.

Let’s see what all comes into our mind :


In the next subsequent series, I will showcase the architecture in form of a diagram and how we overcome these challenges ( not limited to) to create such a large database Architecture. I will also cover what other limitations we had in MySQL 5.6 and how we overcome those without moving to 5.7 and how we did upgrades later on and most importantly how we made our replication more mature.

Latest blogs

Some very basics of LSM tree….

This article is in continuation of my previous article on my journey from relational to distributed database. Also I talked about probabilistic data structure here. While discussing distributed databases with one of my colleague, I realised that we often miss focusing on basics and then I decided to publish this article in most simplest form.…

Keep reading