rosalind
Rosalind - Genome Assembly as Shortest Superstring
rosalindProblem: Please find the problem here. Solution: The condition that there exists a unique path in the overlap graph is super important. Once we have the overlap graph, the topological sort would be the right answer, because the path would guarantee there is a single answer. Unlike the earlier problem Overlap Graphs, we do not know how long is the overlap. Therefore we need to be smart. We could try the overlap value, but that would lead to a quadratic algorithm.
Rosalind - Completing a Tree
rosalindProblem: Please find the problem here. Solution: By iteratively eliminating leaves, it is easy to see that a tree of \( n \) nodes has \( n - 1 \) edges. Therefore the solution is simply computing \( (n - 1) \) - number of edges, which is also equal to \( n \) - number of lines. Remember the first line is the number of nodes :) Code:
Rosalind - Partial Permutations
rosalindProblem: Please find the problem here. Solution: This is simply the permutation coefficient. All we need to do is to compute it. Code:
Rosalind - Longest Increasing Subsequence
rosalindProblem: Please find the problem here. Solution: This solution is based on dynamic programming, in particular, patience sorting. The code should explain in details how its work. Code:
Rosalind - Enumerating k-mers Lexicographically
rosalindProblem: Please find the problem here. Solution: The key to do this problem is to interpret these k-mers as an integer with the number of alphabets as a base. That way enumerating them lexicographically is simply looping. Code:
Rosalind - RNA Splicing
rosalindProblem: Please find the problem here. Solution: I am lazy with this one. I could try my hands on the rope data structure, but I didn’t. I simply reconstructed the string again and again on the splicing. To implement this properly with the rope data structure is going to be challenging. In particular, I need to implement my own version of fast substring search on the rope data structure. Code:
Rosalind - Locating Restriction Sites
rosalindProblem: Please find the problem here. Solution: A restriction site is basically a palindrome (with the complement twist). I modified the Manacher’s algorithm to solve this problem. First, when trying to expand to find palindrome half-length, I apply the complement rule. Second, I skipped the ‘#’ sign thing because we knew all palindromes will be of even length. Code:
Rosalind - Calculating Protein Mass
rosalindProblem: Please find the problem here. Solution: This is simply mapping from protein to their mass and then sum it up Code:
Rosalind - Finding a Protein Motif
rosalindProblem: Please find the problem here. Solution: I attempted to create a solution that processes each character exactly once, and I succeed, here is the state machine that works. The diagram should be self-evident - the code is a just a faithful implementation of the diagram. Code:
Rosalind - Enumerating Gene Orders
rosalindProblem: Please find the problem here. Solution: A simple recursive solution is to pick one out of the list, and compute the permutation of the rest. The problem is, how do we represent ’the rest’? My approach is that we make sure the suffix of the list is ’the rest’, and therefore we can simply use an integer to represent the rest. Code: