Database Index (Secondary Key)

Database indexes can help to fasten up the database query.

Anology

Consider a book of 1000 pages with 10 chapters, each chapter 100 pages.

If you wanna find the word "Alchemist" — without index page, you would have to scan through the entire book/chapters i.e 1000 pages

With an index page you know where to go. To look up for a chapter you look at the index page and jump to that page.

However, on top of 1000 pages, you need around 10 pages to show the index, similar to how database index works.

Technical

When assigning more index to our database, we essentially are creating another sorted database just contain the key. Note that our database is essentially just disk storage.

Considering as below:

Field name       Data type      Size on disk
id (Primary key) Unsigned INT   4 bytes
firstName        Char(50)       50 bytes
lastName         Char(50)       50 bytes
emailAddress     Char(100)      100 bytes

Supposed we have these following fields and our query are mainly findById() and findByFirstName().

For findById it's easy because it's incremental, we can simply perform Binary Search with $O(logn)$ time complexity.

However for findByFirstName we need to traverse all the blocks in order to find the record that we need. Imagine we have 5,000,000 records that stored in 1,000,000 disk blocks (1 disk block contains 5 records)

Assigning an index

Now if we assign an index to firstName, we essentially creating another table with the follow values

Field name       Data type      Size on disk
firstName        Char(50)       50 bytes
(record pointer) Special        4 bytes

The record pointer points to the row of the original table.

This table will have firstName sorted. As a result, we can perform Binary Search again and reduce to $O(logn)$.

However, the database will need to create 1 more table and it takes up more disk space. As a result too many indices is not good.

[!note]
Creating an index will affect write because we need to build the B+ tree.

You can create multiple indices per database