Thursday, July 17, 2008

Searching a string in a set of strings

Given a string, search it in a set of strings (say among 1000s of string). What data structure would you use to store those 1000 strings and get the results fastest?

Solution 1 : Simple, but not best solution
Create a hashtable, and store all the strings in that hashtable. For the given string, search in that hashtable. In the worst case, if we get same hashcode for all the strings, then we have to compare with all the strings.

Solution 2 : Best solution
Create a Trie structure, and search for the string in that trie. Depending upon how we construct the trie, the complexity varies from O(n) to O(n*m) where n is the no.of characters in the given string, and m is the no.of different characters that we have in all the strings.

In the trie, for each node, if we create children for all the possible characters, then the complexity will be O(n), where n is the no.of characters in the given string. The memory requirement will be high in this case.

If we don't create children for all the possible characters, and create children only for the strings that exist in our set, then at each level, we have to search among the children to find the string. So, the complexity would be O(n*m) where n is the no.of characters in the given string, and m is the no.of different characters that we have in all the strings. In this case, we will not use extra memory.

No comments:

Post a Comment