System Design CourseSystem Design Course1

Indexing and Denormalization



What is Indexing in Databases?

Indexing is a technique used in databases to speed up the retrieval of rows from a table. It works similarly to an index in a book — instead of scanning every page (or every row), the database can quickly jump to the right location using the index.

Why is Indexing Important?

In large systems, tables can grow to millions of rows. Without indexing, searching or filtering data using WHERE clauses would require a full scan of the table, making queries very slow. Indexes help in improving the performance of:

Example: Indexing in a Users Table

Imagine we have a users table with 10 million records. We often search for users by their email address:

SELECT * FROM users WHERE email = 'alice@example.com';

Without an index on the email column, the database scans all 10 million rows — a slow and costly operation.

By creating an index:

CREATE INDEX idx_users_email ON users(email);

Now, the database can quickly jump to the row matching the email, reducing search time drastically.

💡 Question: Does indexing always improve performance?

Answer: No. While indexes speed up read operations, they add overhead during insert, update, and delete operations because the index also needs to be updated. You should only index columns that are frequently searched or sorted.

Types of Indexes

When Not to Use Indexes

Don’t add indexes to every column. Avoid indexing:

What is Denormalization?

Denormalization is the process of introducing redundancy into a database by combining tables or storing duplicated data to improve read performance.

In normalized databases, data is split across multiple tables to remove redundancy (e.g., separate tables for users and orders).

Example: Normalized vs Denormalized Design

In a normalized model:

Table: users
- user_id
- name

Table: orders
- order_id
- user_id
- order_amount

To get a user's name and their orders:

SELECT users.name, orders.order_amount 
FROM users 
JOIN orders ON users.user_id = orders.user_id;

This join operation is efficient with small datasets, but in high-traffic systems, frequent joins become a performance bottleneck.

In a denormalized model, we may store user names directly in the orders table:

Table: orders
- order_id
- user_id
- user_name
- order_amount

Now we don’t need to perform a join to get the user name for an order — we trade storage space for speed.

💡 Question: Isn't storing the same data in multiple places risky?

Answer: Yes. Denormalization leads to data duplication. If a user's name changes, we have to update it in multiple places. However, for read-heavy systems, the performance benefits often outweigh the complexity of maintaining consistency.

When to Denormalize

Real-World Analogy

Think of a library database:

💡 Question: Should I denormalize all my data for speed?

Answer: No. Denormalization should be a deliberate choice after identifying performance issues in normalized models. Not all systems benefit from denormalization.

Key Takeaways



Welcome to ProgramGuru

Sign up to start your journey with us

Support ProgramGuru.org

Mention your name, and programguru.org in the message. Your name shall be displayed in the sponsers list.

PayPal

UPI

PhonePe QR

MALLIKARJUNA M