LeetCode: 187. Repeated DNA Sequences

钟罗敏, 2022-12-05 (31 views)


Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Approach

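Each of the four nucleotides can be encoded in exactly two bits, so a sequence of ten nucleotides takes 20 bits and fits in a single 32-bit unsigned integer. Slide a 10-letter window over the string, keep the 20-bit code of the current window, and check whether that code has already been seen; any window whose code has occurred before is a repeated sequence.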
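To make the encoding concrete, here is a minimal sketch of packing one 10-letter window into 20 bits. The helper names nucleotideCode and encodeWindow are hypothetical (they do not appear in the original solution), and the input is assumed to contain only A, C, G, and T. The bit assignment follows the table used in the accepted code: A=00, C=01, T=10, G=11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: maps one nucleotide to its 2-bit code
// (A=00, C=01, T=10, G=11, the same table as the accepted code).
inline std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'A': return 0b00;
        case 'C': return 0b01;
        case 'T': return 0b10;
        default:  return 0b11;  // 'G'; assumes the input holds only A/C/G/T
    }
}

// Hypothetical helper: packs the 10 letters starting at `pos` into the
// low 20 bits of a 32-bit unsigned integer, first letter in the highest bits.
inline std::uint32_t encodeWindow(const std::string& s, std::size_t pos = 0) {
    assert(pos + 10 <= s.size());
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        code = (code << 2) | nucleotideCode(s[pos + i]);
    }
    return code;
}
```

Under this table, "AAAAACCCCC" encodes to binary 0101010101 (decimal 341): the five leading A letters contribute zeros and each C contributes 01. Two windows are equal as strings exactly when their codes are equal, so comparing integers replaces comparing 10-character substrings.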

Accepted code

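#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        // Assign a 2-bit code to each nucleotide.
        map<char, unsigned int> nucleotide2Idx = {{'A', 0b00}, {'C', 0b01}, {'T', 0b10}, {'G', 0b11}};
        // Mask for the significant bits: ten nucleotides occupy the low 20 bits.
        unsigned int low20Mask = 0x000fffff;

        size_t beg = 0, end = 0;        // the current window is s[beg, end)
        unsigned int curSeq = 0;        // 20-bit code of the current window
        set<unsigned int> occurredSeq;  // codes of the windows seen so far
        set<string> ans;                // repeated sequences, deduplicated

        for (size_t i = 0; i < s.size(); ++i)
        {
            // Shift in the new nucleotide and drop the one that left the window.
            curSeq = ((curSeq << 2) | nucleotide2Idx[s[i]]);
            curSeq = curSeq & low20Mask;
            ++end;

            // Wait until the window holds a full 10 letters.
            if (end - beg < 10)
            {
                continue;
            }
            // Keep the window at exactly 10 letters.
            if (end - beg > 10) ++beg;

            if (occurredSeq.find(curSeq) == occurredSeq.end())
            {
                occurredSeq.insert(curSeq);
            }
            else
            {
                // This code was seen before, so the window is a repeat.
                ans.insert(s.substr(beg, 10));
            }
        }

        return vector<string>(ans.begin(), ans.end());
    }
};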
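As a variation (not the author's code), the same rolling 20-bit code can be paired with a hash map of occurrence counts, so that each repeated sequence is reported exactly once, at the moment its count reaches two. This is only a sketch: the free function findRepeatedDna and the constants kLen and kMask are hypothetical names, and the input is again assumed to contain only A, C, G, and T. Unlike the std::set version above, results come back in order of their second occurrence rather than sorted.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical free function; a variation on the post's Solution class.
std::vector<std::string> findRepeatedDna(const std::string& s) {
    constexpr std::size_t kLen = 10;          // window length in letters
    constexpr std::uint32_t kMask = 0xFFFFF;  // keeps the low 20 bits
    std::vector<std::string> result;
    if (s.size() < kLen) return result;

    std::unordered_map<std::uint32_t, int> seen;  // window code -> occurrences
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t bits = 0;
        switch (s[i]) {
            case 'A': bits = 0b00; break;
            case 'C': bits = 0b01; break;
            case 'T': bits = 0b10; break;
            default:  bits = 0b11; break;  // 'G'; assumes only A/C/G/T
        }
        // Shift in the new letter; the mask drops the letter leaving the window.
        code = ((code << 2) | bits) & kMask;
        if (i + 1 < kLen) continue;  // window not yet full
        // Report a sequence only when it is seen for the second time.
        if (++seen[code] == 2) {
            result.push_back(s.substr(i + 1 - kLen, kLen));
        }
    }
    return result;
}
```

Called on the example input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", this returns "AAAAACCCCC" followed by "CCCCCAAAAA", matching the expected output. Replacing the ordered sets with a hash map gives expected constant-time lookups per window, at the cost of losing the sorted output order.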

