Indexing is a technique used in databases to speed up the retrieval of rows from a table. It works similarly to an index in a book — instead of scanning every page (or every row), the database can quickly jump to the right location using the index.
In large systems, tables can grow to millions of rows. Without indexing, searching or filtering data using WHERE clauses would require a full scan of the table, making queries very slow. Indexes help improve the performance of:
- SELECT queries that filter rows with WHERE clauses
- JOIN operations on the joined columns
- ORDER BY and GROUP BY operations on the indexed columns
Imagine we have a users table with 10 million records. We often search for users by their email address:
SELECT * FROM users WHERE email = 'alice@example.com';
Without an index on the email column, the database scans all 10 million rows — a slow and costly operation.
By creating an index:
CREATE INDEX idx_users_email ON users(email);
Now, the database can quickly jump to the row matching the email, reducing search time drastically.
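To confirm that the index is actually being used, you can ask the database for its query plan. Here is a minimal sketch assuming PostgreSQL; the exact EXPLAIN syntax and output differ between databases:

-- Before creating idx_users_email this reports a sequential scan over all rows;
-- afterwards it should report an index scan on idx_users_email.
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alice@example.com';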
Question: Does adding an index always improve performance?
Answer: No. While indexes speed up read operations, they add overhead during insert, update, and delete operations because the index also needs to be updated. You should only index columns that are frequently searched or sorted.
Don’t add indexes to every column. Avoid indexing:
- columns that are rarely used in WHERE, JOIN, or ORDER BY clauses
- columns that are updated very frequently, since every write must also update the index
- columns with very few distinct values (such as boolean flags), where an index rarely helps
Denormalization is the process of introducing redundancy into a database by combining tables or storing duplicated data to improve read performance.
In normalized databases, data is split across multiple tables to remove redundancy (e.g., separate tables for users and orders).
In a normalized model:
Table: users
- user_id
- name
Table: orders
- order_id
- user_id
- order_amount
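As a rough sketch, the normalized model above could be created like this (the column types and constraints are assumptions; only the table and column names come from the example):

CREATE TABLE users (
    user_id      BIGINT PRIMARY KEY,
    name         TEXT NOT NULL
);

CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    user_id      BIGINT NOT NULL REFERENCES users(user_id),  -- each order points back to its user
    order_amount NUMERIC(10, 2) NOT NULL
);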
To get a user's name and their orders:
SELECT users.name, orders.order_amount
FROM users
JOIN orders ON users.user_id = orders.user_id;
This join operation is efficient with small datasets, but in high-traffic systems, frequent joins become a performance bottleneck.
In a denormalized model, we may store user names directly in the orders table:
Table: orders
- order_id
- user_id
- user_name
- order_amount
Now we don’t need to perform a join to get the user name for an order — we trade storage space for speed.
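As a sketch, the denormalized orders table and a join-free read could look like this (the types and the example user_id value are assumptions):

CREATE TABLE orders (
    order_id     BIGINT PRIMARY KEY,
    user_id      BIGINT NOT NULL,
    user_name    TEXT NOT NULL,              -- duplicated copy of users.name
    order_amount NUMERIC(10, 2) NOT NULL
);

-- The user's name comes straight from the orders table; no join required.
SELECT user_name, order_amount
FROM orders
WHERE user_id = 42;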
Question: Does denormalization introduce any trade-offs?
Answer: Yes. Denormalization leads to data duplication. If a user's name changes, we have to update it in multiple places. However, for read-heavy systems, the performance benefits often outweigh the complexity of maintaining consistency.
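For example, renaming a user now means updating both the source of truth and every duplicated copy. A sketch, using arbitrary example values and wrapping both updates in one transaction to keep the copies consistent:

BEGIN;
-- Update the authoritative record...
UPDATE users  SET name = 'Alice Smith' WHERE user_id = 42;
-- ...and the duplicated copy stored with each of the user's orders.
UPDATE orders SET user_name = 'Alice Smith' WHERE user_id = 42;
COMMIT;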
Think of a library database: a normalized design keeps books and members in separate tables and joins them whenever you want to see who borrowed which book, while a denormalized design might store the book title directly in each loan record so a member's borrowing history can be shown without a join.
Question: Should every system use denormalization?
Answer: No. Denormalization should be a deliberate choice after identifying performance issues in normalized models. Not all systems benefit from denormalization.