What is Indexing in Databases?
Indexing is a technique used in databases to speed up the retrieval of rows from a table. It works similarly to an index in a book — instead of scanning every page (or every row), the database can quickly jump to the right location using the index.
Why is Indexing Important?
In large systems, tables can grow to millions of rows. Without indexing, searching or filtering data using WHERE clauses would require a full scan of the table, making queries very slow. Indexes help in improving the performance of:
- Search operations using WHERE clauses
- Sorting operations (ORDER BY)
- Join operations between tables
Example: Indexing in a Users Table
Imagine we have a users
table with 10 million records. We often search for users by their email address:
SELECT * FROM users WHERE email = 'alice@example.com';
Without an index on the email
column, the database scans all 10 million rows — a slow and costly operation.
By creating an index:
CREATE INDEX idx_users_email ON users(email);
Now, the database can quickly jump to the row matching the email, reducing search time drastically.
💡 Question: Does indexing always improve performance?
Answer: No. While indexes speed up read operations, they add overhead during insert, update, and delete operations because the index also needs to be updated. You should only index columns that are frequently searched or sorted.
Types of Indexes
- Single-column Index: Indexes one column (e.g., email).
- Composite Index: Indexes multiple columns (e.g., first_name and last_name).
- Unique Index: Ensures all values in the column are unique (e.g., username).
- Full-text Index: Useful for search features like article keywords or blog titles.
When Not to Use Indexes
Don’t add indexes to every column. Avoid indexing:
- Columns that are rarely queried
- Columns with high update frequency
- Columns with low cardinality (e.g., gender: 'M'/'F')
What is Denormalization?
Denormalization is the process of introducing redundancy into a database by combining tables or storing duplicated data to improve read performance.
In normalized databases, data is split across multiple tables to remove redundancy (e.g., separate tables for users and orders).
Example: Normalized vs Denormalized Design
In a normalized model:
Table: users
- user_id
- name
Table: orders
- order_id
- user_id
- order_amount
To get a user's name and their orders:
SELECT users.name, orders.order_amount
FROM users
JOIN orders ON users.user_id = orders.user_id;
This join operation is efficient with small datasets, but in high-traffic systems, frequent joins become a performance bottleneck.
In a denormalized model, we may store user names directly in the orders table:
Table: orders
- order_id
- user_id
- user_name
- order_amount
Now we don’t need to perform a join to get the user name for an order — we trade storage space for speed.
💡 Question: Isn't storing the same data in multiple places risky?
Answer: Yes. Denormalization leads to data duplication. If a user's name changes, we have to update it in multiple places. However, for read-heavy systems, the performance benefits often outweigh the complexity of maintaining consistency.
When to Denormalize
- Read-heavy applications (e.g., social networks, reporting dashboards)
- Systems where joins are becoming a bottleneck
- Scenarios where data consistency can be managed or is less critical
Real-World Analogy
Think of a library database:
- Normalized: Book table stores author_id, and a separate Author table stores author details. Good for clean storage.
- Denormalized: Book table stores full author name. Faster retrieval without joins.
💡 Question: Should I denormalize all my data for speed?
Answer: No. Denormalization should be a deliberate choice after identifying performance issues in normalized models. Not all systems benefit from denormalization.
Key Takeaways
- Indexing is used to improve the speed of data retrieval in databases.
- Use indexes selectively — they come with trade-offs during writes.
- Denormalization reduces joins and improves read performance at the cost of data redundancy.
- Both indexing and denormalization are performance tuning tools used in system design.