KMP

发表于 2023-07-26

KMP

KMP是用来处理字符串匹配问题的一种高效方法

例题：找出字符串中第一个匹配项的下标

给你两个字符串 haystack 和 needle ，请你在 haystack 字符串中找出 needle 字符串的第一个匹配项的下标（下标从 0 开始）。如果 needle 不是 haystack 的一部分，则返回 -1 。

示例 1：

输入：haystack = "sadbutsad", needle = "sad"
输出：0
解释："sad" 在下标 0 和 6 处匹配。
第一个匹配项的下标是 0 ，所以返回 0 。

示例 2：

1
2
3

输入：haystack = "leetcode", needle = "leeto"
输出：-1
解释："leeto" 没有在 "leetcode" 中出现，所以返回 -1 。

如果按照原始的暴力方法，算法复杂度将达到O（n*m），而KMP算法可以将复杂度降低到O（n+m）。

KMP算法主要由两步构成

第一步：构造needle字符串的next数组

next数组主要用来记录当前字符串的最长前后缀有几位，比如字符串a b c d a b d 的next数组就是0 0 0 0 1 2 0

当字符串为abcd时，前后缀一点也不一样；当字符串为abcda时，字符串的开头和结尾都是a，所以next数组就是1；同理，abcdab则是对应2…

代码：

private int[] getNext(String s){
	int[] next = new int[s.length()];
	int j = 0;
	next[0]=0;
	//A B C D A B D
	//i初始值为1，否则i j相等
	for (int i = 1;i<s.length();i++){
		//j等于0时，说明之前还没出现过相等的前缀和后缀
		while (j>0&&s.charAt(i)!=s.charAt(j)){
			j = next[j-1];
		}
		//当两个位置的字符相同时，说明是相同的前后缀
		if (s.charAt(i)==s.charAt(j)){
			j++;
		}
		next[i]=j;
	}
	return next;
}

第二步：与haystack进行比较

从头开始比较haystack和needle字符串，相等时j++；不相等时，按照needle的next数组，将needle字符串后移next[j-1]位

例如：haystack为abxabcabcaby，needle为abcaby，当haystack的第9位和needle的第6位进行比较时，发现不相等，这是j将变为next[5]=2，因为当needle可以匹配到第二位时，说明必有ab，所以下次匹配时可以从needle的第三位开始匹配。

public int strStr(String haystack,String needle){
	if (needle.length()==0){
		return 0;
	}
       //得next数组
	int[] next = getNext(needle);

	int j = 0;
	for (int i = 0;i<haystack.length();i++){
		while (j>0&&haystack.charAt(i)!=needle.charAt(j)){
			j = next[j-1];
		}
		if (haystack.charAt(i)==needle.charAt(j)){
			j++;
		}
		if (j==needle.length()){
			return i-j+1;
		}
	}
	return -1;
}