Alien Dictionary Using Topological Sort - Visualization

Problem Statement

The Alien Dictionary problem asks: Given a sorted dictionary of an alien language containing N words and K unique starting characters (from a known alphabet), determine the correct order of characters in the alien language.

The words are sorted lexicographically according to the rules of the alien language. By comparing adjacent words, we can derive the ordering of some characters. Our goal is to derive a valid total character order that satisfies all such pairwise orderings.

Note: Multiple valid orders may exist for the same input.

Examples

Words K Possible Order Description
["baa", "abcd", "abca", "cab", "cad"] 4 b → d → a → c Valid topological ordering derived from character precedence
["caa", "aaa", "aab"] 3 c → a → b ‘c’ comes before ‘a’, and ‘a’ before ‘b’
["abc", "ab"] 3 Invalid input Second word is a prefix of the first, contradicting lexicographical order
[] 0 "" Empty dictionary, no order possible
["x"] 1 x Only one character, trivially ordered

Solution

Understanding the Problem

We are given a list of words that are sorted lexicographically according to the rules of an unknown language. Our task is to determine the correct order of characters in this alien language based on the order of words. The key idea is that the character order must satisfy the relative positions of characters as implied by the word list.

This is a classic case of extracting a topological ordering of characters from the word relationships.

Step-by-Step Solution with Example

step 1: Understand what the input means

Suppose the input is a list of words like [ "baa", "abcd", "abca", "cab", "cad" ]. These words are sorted in an alien language. From this, we need to find the character order of that alien alphabet.

step 2: Build a graph of characters

Each unique character will be a node in a graph. By comparing adjacent words, we extract ordering rules. For example:

  • Compare "baa" and "abcd" → 'b' vs 'a' → since 'b' ≠ 'a', we know 'b' comes before 'a' → add edge b → a.
  • Compare "abcd" and "abca" → 'd' vs 'a' → 'd' comes before 'a' → add edge d → a.
  • Compare "abca" and "cab" → 'a' vs 'c' → 'a' comes before 'c' → add edge a → c.
  • Compare "cab" and "cad" → 'b' vs 'd' → 'b' comes before 'd' → add edge b → d.

step 3: Prepare in-degrees for topological sort

We calculate the in-degree (number of incoming edges) for each node (character). This helps identify nodes with no dependencies — these can appear first in the alien order.

step 4: Topological Sorting using Kahn’s Algorithm

Start with characters having in-degree 0. Add them to a queue and begin removing them one by one, appending to the result string and decreasing the in-degree of their neighbors. If a neighbor's in-degree becomes 0, add it to the queue.

step 5: Check for cycle

If at the end of this process, our result contains fewer characters than expected, then there is a cycle (conflicting character ordering), and we cannot determine a valid order.

step 6: Return the result

The result string formed by topological sort will represent the correct character order in the alien language if no cycle exists.

Edge Cases

  • Prefix issue: If a word is a prefix of a longer word and comes later (e.g., “apple” before “app”), this is invalid. We should detect this and return an empty result.
  • Disconnected characters: If a character appears in the word list but no ordering info is derived from it, we still need to include it in the result.
  • Duplicate words: Should be ignored as they give no additional info.
  • Single word input: We can return any order of the unique characters in that word.

Finally

This problem is a combination of graph building and topological sorting. By carefully comparing adjacent words and constructing a graph of character dependencies, we can extract a valid character order for the alien language. Kahn’s algorithm is particularly useful because it also helps detect cycles (invalid inputs). Always consider edge cases like prefix conflicts and disconnected components to build a robust solution.

Algorithm Steps

  1. Initialize an adjacency list and an in-degree map for all K characters.
  2. Compare each pair of adjacent words to extract precedence rules (edges).
  3. Build the graph and update the in-degrees of each character.
  4. Use a queue to perform BFS (Kahn’s algorithm):
    1. Push characters with in-degree 0 into the queue.
    2. Pop from queue, add to result string.
    3. Decrease in-degrees of neighbors, and push those with 0 in-degree.
  5. If the result string has length K, return it as the answer. Otherwise, input had a cycle or inconsistency.

Code

C
C++
Python
Java
JS
Go
Rust
Kotlin
Swift
TS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX 26

void alienOrder(char* words[], int n, int k) {
    int adj[MAX][MAX] = {0};
    int indegree[MAX] = {0};

    for (int i = 0; i < n - 1; i++) {
        char* w1 = words[i];
        char* w2 = words[i + 1];
        int len = strlen(w1) < strlen(w2) ? strlen(w1) : strlen(w2);
        for (int j = 0; j < len; j++) {
            if (w1[j] != w2[j]) {
                int u = w1[j] - 'a';
                int v = w2[j] - 'a';
                if (!adj[u][v]) {
                    adj[u][v] = 1;
                    indegree[v]++;
                }
                break;
            }
        }
    }

    char queue[MAX];
    int front = 0, rear = 0;
    for (int i = 0; i < k; i++) {
        if (indegree[i] == 0) queue[rear++] = i;
    }

    while (front < rear) {
        int u = queue[front++];
        printf("%c", u + 'a');
        for (int v = 0; v < k; v++) {
            if (adj[u][v]) {
                if (--indegree[v] == 0) queue[rear++] = v;
            }
        }
    }
    printf("\n");
}

int main() {
    char* words[] = {"caa", "aaa", "aab"};
    int n = 3;
    int k = 3;
    printf("Order: ");
    alienOrder(words, n, k);
    return 0;
}

Time Complexity

CaseTime ComplexityExplanation
Best CaseO(N + K)We must iterate through all N characters to build the graph and process up to K nodes during topological sort. No skipping possible.
Average CaseO(N + K)Building edges between characters takes O(N) time and topological sorting over K nodes also takes O(K).
Worst CaseO(N + K)Even if all characters are interconnected, we must build the graph using all N words and sort up to K characters.

Space Complexity

O(K + E)

Explanation: We store an adjacency list for up to K characters and E edges (from character precedence rules), and also track in-degree for K nodes.


Comments

💬 Please keep your comment relevant and respectful. Avoid spamming, offensive language, or posting promotional/backlink content.
All comments are subject to moderation before being published.


Loading comments...