滑动窗口算法延伸:Rabin Karp 字符匹配算法 :: labuladong的算法小抄 #981
Replies: 27 comments 11 replies
-
纯纯滑动窗口思路Java代码如下: public static List<String> findRepeatedDnaSequences(String s) {
int L = 10;
Set<String> res = new HashSet();
List<String> Fres = new ArrayList<>();
HashSet<String> seen = new HashSet<>();
List<Character> window = new ArrayList<>();
int left = 0;int right = 0;
while (right < s.length()) {
window.add(s.charAt(right));
right++;
if (right - left == L) {
String windowStr = "";
for (int i = 0 ; i < window.size();i++) {
windowStr += window.get(i);
}
if (seen.contains(windowStr)) {
// System.out.println("找到一个重复子串: "+ windowStr);
res.add(windowStr);
}else {
seen.add(windowStr);
}
window.remove(0);
left++;
}
}
Fres.addAll(res);
return Fres;
} |
Beta Was this translation helpful? Give feedback.
-
func findRepeatedDnaSequences(s string) []string {
arr := make([]byte, len(s))
for i, v := range s {
switch string(v) {
case "A":
arr[i] = 0
case "C":
arr[i] = 1
case "G":
arr[i] = 2
case "T":
arr[i] = 3
}
}
left, right := 0, 0
var res []string
resMap := map[string]struct{}{}
seek := map[int]bool{}
tmp := 0
for right < len(arr) {
tmp = addNewStr(tmp, arr[right])
right++
for right-left >= 10 {
if _, exist := seek[tmp]; exist {
resMap[s[left:right]] = struct{}{}
} else {
seek[tmp] = true
}
left++
}
}
for key := range resMap {
res = append(res, key)
}
return res
}
func addNewStr(num int, b byte) int {
num = num << 2
num = num & 0xfffff
num = num | int(b)
return num
} go语言解决重复的DNA序列问题,感谢大佬提供的思路 |
Beta Was this translation helpful? Give feedback.
-
取了个巧,刚好这个问题只需要求重复的10个序列,而总共只有4种碱基,利用2个bit位刚好能够表示,总共只需要20个bit位,也就是int32其实就够了 |
Beta Was this translation helpful? Give feedback.
-
我看官方题解下面也有人在讨论DNA这个题,如本书作者所说(下图1),O(L)的时间复杂度来自于生成子串,可是加入了hash后还是存在生成子串的过程吧(下图2)?所以时间复杂度不是应该还是要考虑L? |
Beta Was this translation helpful? Give feedback.
-
TypeScript function findRepeatedDnaSequences(s: string): string[] {
const ret: Set<string> = new Set();
const n = s.length;
const map = {
'A': 0,
'G': 1,
'C': 2,
'T': 3,
};
const nums: number[] = new Array(n);
// 将字符串转数字
for(let i = 0; i < n; i++) {
nums[i] = map[s[i]];
};
// 数字长度
const L = 10;
// 4 进制
const R = 4;
const RL = Math.pow(R, L - 1);
// 左右窗口边界索引
let left = 0;
let right = 0;
// 记录重复出现的 hash 值
const seenHash = {};
// 窗口内 hash
let windowHash = 0;
while (right < s.length) {
windowHash = windowHash * R + nums[right];
right++;
while (right - left === L) {
if (seenHash[windowHash]) {
ret.add(s.substring(left, right));
} else {
seenHash[windowHash] = true;
}
windowHash = windowHash - nums[left] * RL;
left++;
}
}
return Array.from(ret);
}; |
Beta Was this translation helpful? Give feedback.
-
第一章早就刷完了怎么还没小旗子,原来更新了 |
Beta Was this translation helpful? Give feedback.
-
推导一下公式
|
Beta Was this translation helpful? Give feedback.
-
-1%3 =2 最后rabin-karp算法中的左移处理时,不加Q也是可以的 |
Beta Was this translation helpful? Give feedback.
-
打卡 |
Beta Was this translation helpful? Give feedback.
-
windowHash = ((R * windowHash) % Q + txt.charAt(right)) % Q |
Beta Was this translation helpful? Give feedback.
-
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
int [] nums = new int [s.length()];
for(int i = 0 ; i<s.length();i++){
switch(s.charAt(i)){
case 'A':{nums[i]=0; break;}
case 'C':{nums[i]=1; break;}
case 'G':{nums[i]=2; break;}
case 'T':{nums[i]=3; break;}
}
}
int L = 10 , R = 4, RL = (int)Math.pow(R,L-1);
HashSet<Integer> seen = new HashSet<>();
int window = 0;
HashSet<String> res = new HashSet<>();
int left = 0 , right = 0;
while(right < nums.length){
window = R * window + nums[right];
right++;
while(right-left == L){
if(seen.contains(window)){
res.add(s.substring(left,right));
}else{
seen.add(window);
}
window = window - nums[left]*RL;
left++;
}
}
return new ArrayList<String>(res);
}
}
|
Beta Was this translation helpful? Give feedback.
-
字符串太长 超过long上限了。。。。 |
Beta Was this translation helpful? Give feedback.
-
text bbabbaaaababaaaababaaabbaabaabaabaababbaaabbbabbaaaabbbbaabbbababbbabababaaabaaa
结果是超过long上限了 |
Beta Was this translation helpful? Give feedback.
-
一刷 看到这部分打算暂时跳过了😭 |
Beta Was this translation helpful? Give feedback.
-
总结一下知识,并且提出一个问题:There are two important properties of modulo:
扩大窗口hash_value = hash_value * R + right_digit 缩小窗口hash_value = hash_value - left_digit * R**(L-1)
扩大窗口hash_value % Q = (hash_value * R + right_digit) % Q 缩小窗口hash_value % Q = (hash_value - left_digit * R**(L-1)) % Q 计算这部分 这部分 R ** (L - 1) = R * R * R ... * R = R ** (L - 1) % Q = (R * R * R ... * R) % Q = (R % Q * R^(L - 2) % Q) % Q = (R % Q * (R % Q * R^(L - 3) % Q) % Q ) % Q If L = 3 用计算文中RL的方法:
( ( R % Q) * R) % Q |
Beta Was this translation helpful? Give feedback.
-
20230107 打卡,并提出一个灵魂问题:如果自己面试裸写,是不是还要写一个求素数的逻辑,否则怎么记得住2333 |
Beta Was this translation helpful? Give feedback.
-
这个解法里的window变量是不是可以不用,毕竟子串可以通过substring(left, right)得到 // 滑动窗口代码框架
} |
Beta Was this translation helpful? Give feedback.
-
python 版本 class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
DNA_dict = {"A":0,"C":1,"G":2,"T":3}
result = set()
window = set()
# window_num应该是一个十位长的四进制数,每个数位上都代表着一种核酸类型
# 计算四进制数的十进制结果,通过保留四进制数,而不是十进制数和字符串,是为了节省内存空间
R = 4
L = 10
RL = pow(4,L - 1) # 最高位
window_num = 0
n = len(s)
if n < 10:
return []
else:
# 设定初始的windows
left = 0
right = 9
for i in range(10):
DNA = s[i]
window_num = window_num * R + DNA_dict[DNA]
while right < n:
# 每一次都是sub切片,所以会多次遍历s
if window_num not in window:
window.add(window_num)
else:
result.add(s[left:right+1])
# 减去最高位
window_num = window_num - DNA_dict[s[left]]*RL
left += 1
right += 1
if right != n:
# 向高位移动,再加上最低位
window_num = window_num * R + DNA_dict[s[right]]
return list(result) |
Beta Was this translation helpful? Give feedback.
-
这段是否有点问题,最高位如果为0,计算的结果等于数字本身,而例子中的计算DNA那题实际是存在高位为0的情况的 |
Beta Was this translation helpful? Give feedback.
-
//Rabin 考不考不得知 下面这种解法适合我等菜菜 class Solution {
}; |
Beta Was this translation helpful? Give feedback.
-
最后一份代码里面的for(int i = 1; i <= L - 1; i++),要把等号去掉。 |
Beta Was this translation helpful? Give feedback.
-
优雅的JS代码 var findRepeatedDnaSequences = function(s) {
if(s.length <= 10) return [];
const set = new Set(), ans = [];;
set.add(s.substring(0,10)); //存第一个子串
let l = 1 , r = 10; //从第二个子串开始
while(r<s.length){
const str = s.substring(l++,++r);
set.has(str) ? ans.push(str) : set.add(str);
}
return Array.from(new Set(ans));
}; |
Beta Was this translation helpful? Give feedback.
-
这真的是四分钟能读完的吗 |
Beta Was this translation helpful? Give feedback.
-
滑动窗口算法延伸:Rabin Karp 字符匹配算法 :: labuladong的算法小抄
https://labuladong.github.io/algo/2/19/27/
Beta Was this translation helpful? Give feedback.
All reactions