rosalind

Rosalind - Ordering Strings of Varying Length Lexicographically

rosalind

Problem: Please find the problem here. Solution: I wanted to implement a solution that is analogous to an odometer. The key thing that we wanted to understand is how we can advance from the current word to the next one. Suppose we have the next function, then we can just repeatedly use that function to produce the next one. The next function would probably not work on the last element, so that will naturally provide us with a stopping condition as well.

Rosalind - k-Mer Composition

rosalind

Problem: Please find the problem here. Solution: To produce the array, slide a window of width 4 from the leftmost to the rightmost of the string. When processing a window, we quickly compute the k-mer index as a base 4 number. Code:

Rosalind - Maximum Matchings and RNA Secondary Structures

rosalind

Problem: Please find the problem here. Solution: Suppose we have 3 A and 2 U, the first U can choose from 3 A, the second choose from 2 A, therefore, the number of matches for these 3 A and 2 U is 3 x 2 = 6. In general, the answer is the falling factorial. Code:

Rosalind - Counting Phylogenetic Ancestors

rosalind

Problem: Please find the problem here. http://rosalind.info/problems/inod/ Solution: We knew that a rooted binary tree with n leaves has \( n - 1 \) internal nodes. An unrooted binary tree is simply a rooted binary tree with an extra leaf attached to the root. Therefore, the number of leaves increased by \( 1 \) but the number of internal nodes left unchanged. So an unrooted binary tree with \( n + 1 \) leaves has \( n - 1 \) internal nodes.

Rosalind - Sorting by Reversals

rosalind

Problem: Please find the problem here. Solution: As we have already pre-computed the shortest path in the previous problem, all we have to present the shortest path. Code:

Rosalind - Reversal Distance

rosalind

Problem: Please find the problem here. Solution: This is a tough problem, I handled it using a few techniques. The problem hinted on brute force, in order to make sure it runs within the allocated time, it is easy to just pre-compute the solutions. The set of all solutions is daunting (10!) squared. I make one observation to make it simple - the symbol themselves does not matter, we could rename all the symbols consistently and we will get exactly the same reversal distance.

Rosalind - Creating a Distance Matrix

rosalind

Problem: Please find the problem here. Solution: This is just calculating the pairwise hamming distance as required. Code:

Rosalind - Speeding Up Motif Finding

rosalind

Problem: Please find the problem here. Solution: The z array generated in the advanced version of KMP can be used to recover the standard failure array. This analysis look a bit weird because this description is written way after the code is written. I have a significant debt of documenting my code. Recall that the z array tell us the length of the z-box at the beginning of the z-box, but the failure array, wants us to report the length of a matching string at the end, therefore, we can use a simple walk to solve this problem.

Rosalind - Perfect Matchings and RNA Secondary Structures

rosalind

Problem: Please find the problem here. Solution: Denote the number of ‘A’ to be x. Fixing a U, there are x choices. Once an A is used, it cannot be used anymore. Therefore, there are x! way to bond the A and U pairs. Similarly, we have y! way to bond the C and G pairs (where y is the number of ‘C’). Therefore the total number of ways to bond these pairs is x!

Rosalind - Open Reading Frames

rosalind

Problem: Please find the problem here. Solution: There are two parts to this problem. We need to find all the start and stop codons, and we need to pair them up so that we know where to start and stop decoding. The former is performed using the Aho Corasick algorithm, by using a trie, we can match all codons in a single pass. Imagine we are walking from left to right, whenever we see a stop codon, we want all the start codon before it that is after the last stop codon.