Group Anagram

Question

Given an array of strings strs, group all anagrams together into sublists. You may return the output in any order.

An anagram is a string that contains the exact same characters as another string, but the order of the characters can be different.

Example 1:

Input: strs = ["act","pots","tops","cat","stop","hat"]

Output: [["hat"],["act", "cat"],["stop", "pots", "tops"]]

Example 2:

Input: strs = ["x"]

Output: [["x"]]

Example 3:

Input: strs = [""]

Output: [[""]]

Solution

While it's easy to do a dict[sorted(word)].append(word) but it will be $O(m * nlogn)$. Where m is the number of strs and n is the longest string.

However in this case we can do $O(m \times n)$

Intuition

The main thing here is that what's the most efficient way to hash beside sorting since sorting is unnecessary

Since we only have 26 letters, we can create an id system where it takes the frequency to build id

Say we have an array of frequency of 26 letters corresponding to the frequency from a -> z

[0,0,0,0,.....,0]
 a,b,c,d,......,z

The word abd would have the id [1,1,0,1,0.....0]

[1,1,0,1,0,0,....0]
 a,b,c,d,e,f,....z

This would be the same thing with the word adb since they are anagram.

As a result, we can use the list here as the id. However, in python you cannot use the list as the key, but you can use Tuple as the key. We just need to convert from tuple to list

Implementation

from collections import defaultdict, Counter

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        result = defaultdict(list)

        for w in strs:
            result[self.hash(w)].append(w)
        
        return result.values()
    
    def hash(self, word):
        frequency = [0] * 26

        for c in word:
            frequency[ord(c) - ord('a')] += 1
        
        return tuple(frequency)

Time and space complexity

  • Time complexity: $O(m \times n)$
  • Space complexity: $(m)$

Where $m$ is the longest word and $n$ is the number of words.