Thursday, March 2, 2017

[LintCode] 594 strStr II 解题报告

Description
Implement strStr function in O(n + m) time.

strStr return the first index of the target string in a source string. The length of the target string is m and the length of the source string is n.
If target does not exist in source, just return -1.


Example
Given source = abcdef, target = bcd, return 1.


思路
O(n + m)的算法有2个: KMP和Rabin-Karp。
还有一个best performance是O(N/M)但是worst case是O(MN)的算法Boyer-Moore。

先写一个Boyer-Moore的,提交也能通过。
每次比的时候是从target的最后一位往前比。比到有mismatch,就把i的位置从source里往后移动,移动多少根据用target来pre-compute的一个表。
具体算法参照Princeton老头的算法课件。老头讲得特别清楚。


Boyer-Moore
Code
public class Solution {
    /**
     * @param source a source string
     * @param target a target string
     * @return an integer as index
     */
    public int strStr2(String source, String target) {
        // Write your code here
        
        if (source == null || target == null) {
            return -1;
        }
        
        int N = source.length();
        int M = target.length();
        
        if (M == 0) {
            return 0;
        }
        
        int R = 256;
        int[] right = new int[R];
        
        for (int c = 0; c < R; c++) {
            right[c] = -1;
        }
        for (int j = 0; j < M; j++) {
            right[target.charAt(j)] = j;
        }
        
        int skip;
        for (int i = 0; i < N - M + 1; i += skip) {
            skip = 0;
            for (int j = M - 1; j >= 0; j--) {
                if (source.charAt(i + j) != target.charAt(j)) {
                    skip = Math.max(1, j - right[source.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) {
                return i;
            }
        }
        
        return -1;
    }
}

Rabin-Karp
Code
public class Solution {
    /**
     * @param source a source string
     * @param target a target string
     * @return an integer as index
     */
     
    public int strStr2(String source, String target) {
        // Write your code here
        if (source == null || target == null) {
            return -1;
        }
        
        int N = source.length();
        int M = target.length();
        
        if (M == 0) {
            return 0;
        }
        
        int R = 31;
        int Q = 997;
        int RM = 1;
        
        for (int i = 0; i < M; i++) {
            RM = RM * R % Q;
        }
        
        int targetHash = 0;
        for (int i = 0; i < M; i++) {
            targetHash = (targetHash * R + target.charAt(i)) % Q;
        }
        
        int sourceHash = 0;
        for (int i = 0; i < N; i++) {
            sourceHash = (sourceHash * R + source.charAt(i)) % Q;
            if (i < M - 1) {
                continue;
            }
            
            if (i >= M) {
                sourceHash = sourceHash - (source.charAt(i - M) * RM) % Q;
                if (sourceHash < 0) {
                    sourceHash += Q;
                }
            }
            
            if (sourceHash == targetHash) {
                if (source.substring(i - M + 1, i + 1).equals(target)) {
                    return i - M + 1;
                }
            }
        }
        
        return -1;
    }
}




KMP有个问题,因为DFA的方法预处理的表的空间为R * M,所以memory会越界。