rosalind
Rosalind - Transitions and Transversions
Problem: Please find the problem here. Solution: This is just a counting exercise.
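As a minimal sketch of the counting (the function name and the choice to return the ratio are assumptions for illustration, not the post's original code), we can walk both strings in parallel and classify each mismatch:

```python
def transition_transversion_ratio(s1, s2):
    """Return transitions / transversions for two equal-length DNA strings."""
    purines = {"A", "G"}
    transitions = 0
    transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # A mismatch that stays within purines or within pyrimidines is a transition.
        if (a in purines) == (b in purines):
            transitions += 1
        else:
            transversions += 1
    # Assumes at least one transversion; otherwise this raises ZeroDivisionError.
    return transitions / transversions
```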
Rosalind - Enumerating Oriented Gene Orderings
Problem: Please find the problem here. Solution: Like the previous problem, we use a recursive routine to generate all permutations. For each permutation, a second recursion gives each element a sign.
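The two recursions could look roughly like this. It is only a sketch: the names `permutations`, `signings` and `signed_permutations` are made up for illustration, and the output order is not meant to match any particular reference answer.

```python
def permutations(items):
    """Recursively yield every ordering of items."""
    if not items:
        yield []
        return
    for i, x in enumerate(items):
        for rest in permutations(items[:i] + items[i + 1:]):
            yield [x] + rest


def signings(perm):
    """Recursively yield every way to give each element a + or - sign."""
    if not perm:
        yield []
        return
    for rest in signings(perm[1:]):
        yield [perm[0]] + rest
        yield [-perm[0]] + rest


def signed_permutations(n):
    """All signed permutations of 1..n; there are n! * 2**n of them."""
    return [s for p in permutations(list(range(1, n + 1))) for s in signings(p)]
```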
Rosalind - Interleaving Two Motifs
Problem: Please find the problem here. Solution: Again, using the matrix generated by the Levenshtein edit distance algorithm, we can compute the shortest common supersequence.
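Here is one way the traceback might go. Note the assumption: this sketch uses the variant of the matrix in which substitutions are not allowed (only insertions and deletions), so that a match always contributes a single shared character. The function name is hypothetical.

```python
def shortest_common_supersequence(s, t):
    """Return one shortest string that contains both s and t as subsequences."""
    m, n = len(s), len(t)
    # d[i][j] = insertions + deletions turning s[:i] into t[:j] (no substitutions).
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1])
    # Walk back from the corner, emitting a shared character only once.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1])
            i -= 1
            j -= 1
        elif d[i - 1][j] <= d[i][j - 1]:
            out.append(s[i - 1])
            i -= 1
        else:
            out.append(t[j - 1])
            j -= 1
    # Any leftover prefix belongs at the front of the result.
    out.extend(reversed(s[:i]))
    out.extend(reversed(t[:j]))
    return "".join(reversed(out))
```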
Rosalind - Edit Distance Alignment
Problem: Please find the problem here. Solution: Again, using the matrix generated by the Levenshtein edit distance algorithm, we can compute the alignment.
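A sketch of how the alignment can be read off the matrix: fill the table as usual, then walk back from the bottom-right corner and record which move produced each cell. The function name and the use of `-` as the gap symbol are assumptions.

```python
def edit_alignment(s, t):
    """Return (distance, aligned_s, aligned_t) for one optimal alignment."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Traceback: at each cell, follow any move that explains its value.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a.append(s[i - 1])  # gap in the second string: deletion
            b.append("-")
            i -= 1
        else:
            a.append("-")  # gap in the first string: insertion
            b.append(t[j - 1])
            j -= 1
    return d[m][n], "".join(reversed(a)), "".join(reversed(b))
```

The number of columns in which the two aligned strings differ equals the edit distance, which is a handy sanity check.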
Rosalind - Edit Distance
Problem: Please find the problem here. Solution: After implementing the Levenshtein edit distance algorithm, it is time to use it :)
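For reference, a minimal sketch of the standard dynamic-programming formulation (the function name is an assumption; this is the textbook algorithm, not necessarily the post's original code):

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    m, n = len(s), len(t)
    # d[i][j] = distance between the prefixes s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]
```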
Rosalind - Finding a Shared Spliced Motif
Problem: Please find the problem here. Solution: This is the classic longest common subsequence problem, which can be solved using the Levenshtein edit distance algorithm. If we disallow replacing characters, then the edit can be visualized as an alignment of the two strings, for example AACCTTGG and ACACTGTGA, with gaps. When the first string has a gap, this is an insertion operation.
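To make the idea concrete, here is a sketch that fills the no-substitution matrix and collects the matched characters during the traceback. The function name is hypothetical, and when several longest subsequences exist it simply returns one of them.

```python
def longest_common_subsequence(s, t):
    """Return one longest common subsequence of s and t."""
    m, n = len(s), len(t)
    # d[i][j] = insertions + deletions turning s[:i] into t[:j] (no substitutions).
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1])
    # Traceback: matched columns of the alignment spell the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1])
            i -= 1
            j -= 1
        elif d[i - 1][j] <= d[i][j - 1]:
            i -= 1  # gap in the second string: deletion
        else:
            j -= 1  # gap in the first string: insertion
    return "".join(reversed(out))
```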
Rosalind - Introduction to Set Operations
Problem: Please find the problem here. Solution: This is rather straightforward. By storing the input sets in hash tables, we can perform these set operations quickly.
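In Python the built-in `set` type is already a hash table, so a sketch is very short. The function name and the order of the returned tuple (union, intersection, two differences, two complements with respect to \( \{1, \dots, n\} \)) are assumptions.

```python
def set_operations(n, a, b):
    """Return (A|B, A&B, A-B, B-A, complement of A, complement of B) within {1..n}."""
    universe = set(range(1, n + 1))
    return (
        a | b,         # union
        a & b,         # intersection
        a - b,         # elements only in A
        b - a,         # elements only in B
        universe - a,  # complement of A
        universe - b,  # complement of B
    )
```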
Rosalind - Finding a Spliced Motif
Problem: Please find the problem here. Solution: To find the corresponding indexes, we initialize two pointers, one at the beginning of each string. We advance the sequence pointer one character at a time, and advance the subsequence pointer only when there is a match. This way we find an index for every character of the subsequence.
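The two-pointer walk can be sketched as follows. The function name is made up, and the 1-based positions are an assumption about the expected output format.

```python
def subsequence_indices(s, t):
    """Return 1-based positions in s at which t occurs as a subsequence."""
    positions = []
    j = 0  # pointer into the subsequence t
    for i, ch in enumerate(s):  # i is the pointer into the sequence s
        if j < len(t) and ch == t[j]:
            positions.append(i + 1)
            j += 1
    # If t is not a subsequence of s, fewer than len(t) positions are returned.
    return positions
```

Because the walk is greedy, it reports the earliest possible position for each character; other valid index lists may exist.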
Rosalind - Counting Subsets
Problem: Please find the problem here. Solution: The number of subsets of a set of size \( n \) is \( 2^n \). We compute this power with the repeated squaring algorithm. Each product is reduced modulo \( 1000000 \) immediately after the multiplication, which keeps the numbers small and the multiplications quick.
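A minimal sketch of repeated squaring with the reduction applied after every multiplication (the function name `power_mod` is an assumption; Python's built-in three-argument `pow` does the same job):

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent % modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # This bit of the exponent is set: fold the current power in.
            result = result * base % modulus
        # Square for the next bit, reducing right away to keep numbers small.
        base = base * base % modulus
        exponent >>= 1
    return result
```

For a set of size \( n \), the answer would then be `power_mod(2, n, 1000000)`.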
Rosalind - Introduction to Random Strings
Problem: Please find the problem here. Solution: As we were told in the description, each of ‘A’ and ‘T’ has probability \( \frac{1-x}{2} \) and each of ‘G’ and ‘C’ has probability \( \frac{x}{2} \). Assuming independence (sadly the problem does not mention that), the probability of the whole string is the product of these probabilities. To make it numerically stable, instead of computing the product and then taking the logarithm, it is better to take the logarithm of each factor and sum them up.
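As a sketch of the log-sum approach (the function name is hypothetical, and the choice of base-10 logarithms is an assumption), with \( x \) strictly between 0 and 1:

```python
import math


def log10_probability(s, gc):
    """Log10 probability of DNA string s under GC-content gc (0 < gc < 1)."""
    log_gc = math.log10(gc / 2)        # log-probability of a single G or C
    log_at = math.log10((1 - gc) / 2)  # log-probability of a single A or T
    # Summing logarithms avoids underflow from multiplying many small factors.
    return sum(log_gc if ch in "GC" else log_at for ch in s)
```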