Introduction
A URL shortener is a service that converts long URLs into shorter, more manageable links. For example, a long link like:
https://www.example.com/articles/designing-scalable-systems-for-beginners
can be shortened to something like:
https://short.ly/abc123
Real-World Use Case
Services like Bit.ly, TinyURL, and Google’s former goo.gl are popular examples. These tools are used extensively in social media, messaging apps, and email campaigns where space and aesthetics matter.
High-Level Requirements
Functional Requirements
- Given a long URL, generate a unique short URL.
- Redirect users from short URL to the original long URL.
- Track usage statistics (click count, timestamps).
Non-Functional Requirements
- High availability
- Low latency redirection
- Scalability to billions of URLs
Basic Workflow
- User submits a long URL.
- System generates a unique short code (like abc123).
- Short URL is created: short.ly/abc123.
- When accessed, the system maps the short code to the long URL and redirects the user.
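The workflow above can be sketched with an in-memory store. This is only a minimal illustration, not a production design: the `InMemoryShortener` class, its method names, and the random 6-character code are assumptions made for this example, and a dictionary stands in for the database.

```python
import secrets
import string
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits  # 62 URL-safe characters


class InMemoryShortener:
    """Toy shortener: a dict stands in for the database (assumption)."""

    def __init__(self, domain: str = "short.ly", code_length: int = 6) -> None:
        self.domain = domain
        self.code_length = code_length
        self._urls: Dict[str, str] = {}

    def _new_code(self) -> str:
        # Draw random codes until an unused one is found.
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(self.code_length))
            if code not in self._urls:
                return code

    def shorten(self, long_url: str) -> str:
        # Steps 1-3: accept the long URL, generate a code, build the short URL.
        code = self._new_code()
        self._urls[code] = long_url
        return f"https://{self.domain}/{code}"

    def resolve(self, code: str) -> Optional[str]:
        # Step 4: map the code back to the long URL (None if unknown).
        return self._urls.get(code)
```

A caller would pass the result of `resolve` to whatever issues the HTTP redirect.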
How Do We Generate the Short URL?
We need a way to convert long URLs into short, unique codes. There are a few approaches:
Approach 1: Hashing
We can apply a hashing algorithm (like MD5 or SHA-256) on the long URL and take the first 6 or 7 characters as the short code.
Example:
Input URL: https://example.com/my-awesome-article
Hash (illustrative value; a real SHA-256 digest is 64 hex characters): de9f2c7fd25e1b3afad3e85a0bd17d9b7e1000d7
Short Code: de9f2c
But what if two URLs produce the same short code?
This is called a hash collision. To handle this, we can:
- Check the database for an existing short code.
- If it already exists for a different URL, append a counter to the input and hash again, or use a different hash algorithm.
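Here is a minimal sketch of the hashing approach with the counter-based retry described above. The function name `hash_short_code`, the `store` dictionary standing in for the database, and the `#` separator before the counter are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def hash_short_code(long_url: str, store: Dict[str, str], length: int = 7) -> str:
    """Derive a short code from SHA-256, hashing again with a counter on collision."""
    attempt = 0
    while True:
        # First attempt hashes the URL alone; retries append a counter.
        material = long_url if attempt == 0 else f"{long_url}#{attempt}"
        code = hashlib.sha256(material.encode("utf-8")).hexdigest()[:length]
        existing = store.get(code)
        if existing is None:
            store[code] = long_url   # free slot: claim it
            return code
        if existing == long_url:
            return code              # same URL was shortened before
        attempt += 1                 # genuine collision: try the next counter
```

Note that a hex digest uses only 16 symbols, so 7 characters give 16^7 = 268,435,456 possible codes. Collisions therefore become likely long before the service reaches billions of URLs, which is one reason to consider the next approach.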
Approach 2: Auto-Increment ID Encoding
Another way is to maintain a counter that auto-increments with each URL and encode its value in base62. For example, with the alphabet a-z, A-Z, 0-9 (so "a" is the digit 0), 1 becomes "b", 2 becomes "c", 61 becomes "9", and 62 becomes "ba".
Example:
ID: 125
Base62 Encoding: cb
Short URL: short.ly/cb
Question: Why use Base62 encoding?
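Base62 uses the 62 characters [A-Za-z0-9], all of which are safe in a URL path without escaping, so large numbers can be represented compactly: 6 characters cover 62^6 (about 56.8 billion) IDs, and 7 characters cover about 3.5 trillion. This keeps URLs short and readable.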
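Base62 encoding is just repeated division by 62. The sketch below assumes the alphabet order a-z, A-Z, 0-9, chosen because it reproduces the example in which ID 125 encodes to "cb"; other services may order the alphabet differently. The function names are illustrative.

```python
import string

# Alphabet order is an assumption chosen to match the 125 -> "cb" example:
# a-z take values 0-25, A-Z take 26-51, 0-9 take 52-61.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Decode a base62 string back to the integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number
```

For 125, the first division gives quotient 2 and remainder 1 ("b"), and the second gives remainder 2 ("c"), so the code reads "cb".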
Database Schema Design
Let’s design a simple schema:
| Field | Type | Description |
|---|---|---|
| id | BIGINT (Auto-increment) | Primary key for internal tracking |
| short_code | VARCHAR(10) | Shortened URL token (unique and indexed, since every redirect looks it up) |
| original_url | TEXT | Original long URL |
| created_at | DATETIME | When the short URL was generated |
| clicks | INT | Number of times the short URL was used |
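To make the schema concrete, here is a runnable sketch using SQLite through Python's built-in `sqlite3` module. SQLite is chosen only so the example runs anywhere. The table name `urls` comes from the lookup query used for redirection, while the `UNIQUE` constraint, the defaults, and the helper names `create_schema` and `insert_url` are assumptions added for illustration. SQLite spells the auto-increment key as `INTEGER PRIMARY KEY AUTOINCREMENT` rather than `BIGINT`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    short_code   VARCHAR(10) NOT NULL UNIQUE,   -- UNIQUE also creates an index
    original_url TEXT        NOT NULL,
    created_at   DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    clicks       INTEGER     NOT NULL DEFAULT 0
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def insert_url(conn: sqlite3.Connection, short_code: str, original_url: str) -> int:
    """Insert a mapping and return the generated row id."""
    cursor = conn.execute(
        "INSERT INTO urls (short_code, original_url) VALUES (?, ?)",
        (short_code, original_url),
    )
    conn.commit()
    return cursor.lastrowid
```

With the unique constraint in place, a duplicate short code is rejected by the database itself, which gives collision detection a reliable backstop.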
Redirection Logic
When a user hits the short URL (e.g., short.ly/cb):
- Extract the short code: cb
- Query database: SELECT original_url FROM urls WHERE short_code = 'cb'
- Redirect user to original_url
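These three steps can be sketched as a single function that turns a request path into an HTTP status and headers. The function name `redirect_response` and the choice of a 302 status are assumptions for this example. It uses `sqlite3` so it can run standalone, and it expects a `urls` table with the `short_code`, `original_url`, and `clicks` columns from the schema.

```python
import sqlite3
from typing import Dict, Tuple


def redirect_response(conn: sqlite3.Connection, path: str) -> Tuple[int, Dict[str, str]]:
    """Map a request path such as '/cb' to an HTTP status and headers."""
    short_code = path.strip("/")
    row = conn.execute(
        "SELECT original_url FROM urls WHERE short_code = ?",  # parameterized query
        (short_code,),
    ).fetchone()
    if row is None:
        return 404, {}
    # Count the click; assumes a 'clicks' column as in the schema.
    conn.execute(
        "UPDATE urls SET clicks = clicks + 1 WHERE short_code = ?",
        (short_code,),
    )
    conn.commit()
    # 302 is not cached by default, so each click reaches the service and is counted.
    # 301 lets browsers cache the redirect, which is faster but hides repeat clicks.
    return 302, {"Location": row[0]}
```

Passing the short code as a query parameter, rather than formatting it into the SQL string, avoids SQL injection through the URL path.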
Handling High Traffic
Question: What if millions of people access short URLs at once?
To handle large traffic, we can implement:
- Read replicas of the database
- CDN or edge caching of redirect responses
- In-memory caching (e.g., Redis)
Caching Strategy
Short URLs are accessed frequently. To speed things up, cache the most recent or most popular short codes using Redis.
Example:
Key: short:cb
Value: https://example.com/my-awesome-article
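One common way to wire this up is the cache-aside pattern: check the cache first, fall back to the database on a miss, then fill the cache. The sketch below uses a plain dictionary as a stand-in for Redis so it runs without a server. `CachedResolver` and `db_lookup` are hypothetical names, and the `short:` key prefix follows the example above.

```python
from typing import Callable, Dict, Optional


class CachedResolver:
    """Cache-aside lookup: try the cache, fall back to the database, then fill the cache."""

    def __init__(self, cache: Dict[str, str],
                 db_lookup: Callable[[str], Optional[str]]) -> None:
        self.cache = cache          # stand-in for Redis (assumption)
        self.db_lookup = db_lookup  # hypothetical function that queries the database
        self.hits = 0
        self.misses = 0

    def resolve(self, short_code: str) -> Optional[str]:
        key = f"short:{short_code}"
        cached = self.cache.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        url = self.db_lookup(short_code)
        if url is not None:
            # With a real Redis client, a SET with an expiry would go here.
            self.cache[key] = url
        return url
```

Unknown codes are not cached in this sketch, so repeated requests for a missing code still reach the database. Whether to cache such misses is a separate design decision.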
Rate Limiting
To prevent abuse (e.g., spamming long URLs), we can limit how many URLs a single user/IP can shorten in a given timeframe.
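As an illustration, here is a minimal fixed-window rate limiter kept in process memory. The class name, the injectable `clock` parameter, and the limits in the usage note are assumptions. A real deployment with several servers would usually keep these counters in a shared store instead.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client id -> (window index, requests seen in that window)
        self._counters: DefaultDict[str, Tuple[int, int]] = defaultdict(lambda: (0, 0))

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counters[client_id]
        if seen_window != window:
            count = 0  # a new window started: reset the count
        if count >= self.limit:
            self._counters[client_id] = (window, count)
            return False
        self._counters[client_id] = (window, count + 1)
        return True
```

For example, `FixedWindowRateLimiter(limit=10, window_seconds=60)` would let each user or IP shorten up to 10 URLs per minute. Fixed windows are simple but allow short bursts at window boundaries; sliding windows or token buckets smooth that out.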
Analytics (Optional Feature)
- Total number of clicks
- Geo-location of users
- Browser/device types
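One way to support these metrics is to record an event per click and aggregate afterwards. The sketch below is a simplified assumption of how that could look: the `ClickEvent` and `ClickSummary` types are hypothetical, and deriving the country from an IP address or the device from a User-Agent header is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    timestamp: datetime
    country: str  # assumed to come from a geo-IP lookup (not shown)
    device: str   # assumed to come from the User-Agent header (not shown)


@dataclass
class ClickSummary:
    total_clicks: int = 0
    by_country: Counter = field(default_factory=Counter)
    by_device: Counter = field(default_factory=Counter)


def summarize(events: Iterable[ClickEvent], short_code: str) -> ClickSummary:
    """Aggregate the click events recorded for one short code."""
    summary = ClickSummary()
    for event in events:
        if event.short_code != short_code:
            continue
        summary.total_clicks += 1
        summary.by_country[event.country] += 1
        summary.by_device[event.device] += 1
    return summary
```

Storing raw events instead of only a counter costs more space, but it lets you answer new questions later, such as clicks per day.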
Questions to Strengthen Understanding
Q1: What if someone tries to shorten the same URL again?
Answer: You can either return the same short URL (by checking if it already exists), or create a new one based on user context.
Q2: What if a user submits an invalid or malicious URL?
Answer: Validate the URL format (for example, accept only http and https links with a host) and implement security checks such as domain blocklists/allowlists.
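A basic version of these checks can be written with Python's standard `urllib.parse` module. This is a sketch under stated assumptions: the blocked domain, the length cap, and the function name `is_acceptable_url` are made up for the example, and real services typically add further checks such as reputation lookups.

```python
from urllib.parse import urlsplit

BLOCKED_DOMAINS = {"malware.example"}  # hypothetical blocklist entry
ALLOWED_SCHEMES = {"http", "https"}
MAX_URL_LENGTH = 2048                  # arbitrary cap chosen for the example


def is_acceptable_url(url: str) -> bool:
    """Return True if the URL is well-formed and not on the blocklist."""
    if len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    # Reject non-web schemes (javascript:, ftp:, ...) and URLs without a host.
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    host = parts.hostname.lower()
    # Reject a blocked domain and any of its subdomains.
    return not any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
```

Matching on the exact host or a dot-prefixed suffix avoids wrongly blocking unrelated domains that merely end with the same letters.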
Q3: What if two users want to shorten the same URL?
Answer: That’s okay. Either assign a shared short code or generate user-specific codes based on your business need.
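The two policies from Q1 and Q3 differ only in the key used to look for an existing mapping. The sketch below makes that explicit. `DedupingShortener`, the `per_user` flag, and the hex-formatted counter used as a placeholder code are assumptions for illustration.

```python
from itertools import count
from typing import Dict, Optional, Tuple


class DedupingShortener:
    """Reuse a code when the same URL (optionally per user) is shortened again."""

    def __init__(self, per_user: bool = False) -> None:
        self.per_user = per_user
        self._ids = count(1)  # stand-in for an auto-increment column
        self._codes: Dict[Tuple[Optional[str], str], str] = {}

    def shorten(self, long_url: str, user_id: Optional[str] = None) -> str:
        # Shared policy: key on the URL alone. Per-user policy: key on (user, URL).
        key = (user_id if self.per_user else None, long_url)
        if key not in self._codes:
            self._codes[key] = format(next(self._ids), "x")  # placeholder encoding
        return self._codes[key]
```

With `per_user=False`, two users shortening the same URL get the same code. With `per_user=True`, each gets their own, which keeps their click statistics separate.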
Conclusion
Designing a URL shortener might look simple, but it introduces many system design concepts like hashing, database indexing, base conversion, caching, rate limiting, and horizontal scaling. By mastering this, you lay the groundwork for solving more complex system design problems.