More efficient pattern matching
It was mentioned in a previous lesson that a hashing function could be used in string pattern matching due to its “sliding window” property. This is the idea of the Rabin-Karp algorithm. In the resources below we’ve included links to Wikipedia and TopCoder articles about it. It is relatively easy to implement and offers an improvement over the previous algorithm but it’s worst-case running time is still O(Lt * Lp)
.
That is why it’s worth considering some more advanced algorithms. A very nice algorithm worth learning and implementing is the Knuth-Morris-Pratt algorithm. The algorithm first constructs a table with some data based on the pattern with time complexity O(Lp)
. Then for the search of the pattern within the text it uses the data from the table and this takes O(Lt)
. These are worst-case running times. We encourage you to learn this algorithm and you will also have the chance to apply it in one of the practice problems in this section. The links about it from Wikipedia and Topcoder, found below, can be useful.
Some other techniques worth mentioning are the Aho–Corasick algorithm, suffix trees and suffix arrays.
Resources
- TopCoder on Rabin-Karp and Knuth-Morris-Pratt Algorithms
- Wikipedia on Rabin-Karp
- Wikipedia on Knuth-Morris-Pratt
- Wikipedia on Aho–Corasick algorithm
- Wikipedia on suffix arrays
- Wikipedia on suffix trees