HomeVolumesContestsSectionsForumsUsersPrintHelpAbout

Volumes > Array and Matrices > problem:


51042 - The most frequent k-mer

Guest
• Review clarifications (1)

Volume problems

• 50526 - Gold Market
• 50872 - Top M Grades
• 50868 - Sort Frequencies
• 50405 - Accounts Receivable
• 50377 - kth Permutation
• 50908 - Buy 1 Get 1
• 50906 - The Smallest Pair
• 51043 - Genome Sequencing
• 51042 - The most frequent k-mer
• 50657 - Permutations and Combinat...
• 51246 - Swap largest word, reverse ...
• 51069 - Last Digit of a Fibonacci Nu...
• 50659 - Covariance Matrix
• 50977 - Gaussian Elimination
• Word Puzzle
• 50748 - Gold Store
• 50658 - The Message

Feedback

If you notice incorrect translations in Contester, please let author know.

Time limit 2000/4000/4000/4000 ms. Memory limit 65000/65000/65000/65000 Kb.

The most frequent k-mer

Adapted from Genome sequencing by Phillip E. C. Compeau, and prepared by Ibrahim Mesecan
Question: You are working on a genome project. And you are given gene fragments. And, you need to find the most frequent k-mer (DNA sequence or a DNA word wkth the length k). When producing k-mers, you take k chars from the beginning of the string and you proceed one char at a time. For example, if you are given the DNA sequence CGTCGTCGT and k is 3; the first 3-mer is CGT, the second 3-mer is GTC. And TCG, CGT, GTC and TCG are the following four 3-mers. Write a program that reads a DNA sequence, and returns the number of occurences of the most frequent k-mer.

Input specification: You will be given two integers in the beginning,

  • the number of DNA fragments (n): an integer not greater than 2,000,
  • the length of k-mer (k): an integer between 1 and 20
Then, each of the following n lines will have gene fragments. where gene fragments contain only the letters (G, A, C or T and no space or any other char). Every line is not longer than 200 letters. Assume that the DNA fragments given in multiple line forms just one continuous DNA sequence.

Output specification: Show one integer (the number of appearance of the most frequent k-mer).

Sample Input I
2 3
CGTC
GTCGT
Sample Input II
3 3
CGCCTAGC
CAGCCGCCCGGAGCG
CCATGGCCTAGCC
Sample Output I
3
Sample Output II
7

Explanation for sample input 1: DNA sequence is given in two lines of DNA fragments: CGTCGTCGC. In which there are three k-mers:

  • GTC appears 2 times
  • TCG appears 2 times
  • CGT appears 3 times
So, CGT is the most frequent 3-mer.


Для отправки решений необходимо выполнить вход.

www.contester.ru