Rosalind - Genome Assembly as Shortest Superstring
rosalindProblem:
Please find the problem here.
Solution:
The condition that there exists a unique path in the overlap graph is super important. Once we have the overlap graph, the topological sort would be the right answer, because the path would guarantee there is a single answer.
Unlike the earlier problem Overlap Graphs, we do not know how long is the overlap. Therefore we need to be smart. We could try the overlap value, but that would lead to a quadratic algorithm. My key idea by modifying the Knuth–Morris–Pratt algorithm, we can return the alignment when the string search just ends. That way we know what is the maximum overlap value.
Of course, that means I need to implement the algorithm, and I did.
Code: