Rosalind - Speeding Up Motif Finding

rosalind

Problem:

Please find the problem here.

Solution:

The z array generated in the advanced version of KMP can be used to recover the standard failure array.

This analysis look a bit weird because this description is written way after the code is written. I have a significant debt of documenting my code.

Recall that the z array tell us the length of the z-box at the beginning of the z-box, but the failure array, wants us to report the length of a matching string at the end, therefore, we can use a simple walk to solve this problem.

Whenever we encounter a z-box while we are not already inside one, we set the state to be (1, z[i]), the idea is that this is the first character in the z-box, and we know where the z-box end.

Suppose we see another z-box start while inside a z-box, and suppose that z-box extend beyond the furthest known z-box end, then we can record at that point how many characters is already consumed, and the end of the new z-box there.

The rest is simple. If we see a case where we are inside a z-box, then just report the character consumed so far as the failure array value, and if we are also not at the end of the z-box, advance the state forward.

Code: