首页

    博客开始支持标题搜索

    标签:algorithm,go,blog

    功能

    • 支持前缀搜索
    • 前缀搜索支持拼音输入
    • 支持分词搜索

    redis sorted set方式

    将标题分解为前缀形式,保存在sorted set,比如nghttpx tipsnginx pass real ip

    对nghttpx tips

    ZADD 'n' 1 'nghttpx tips'
    ZADD 'ng' 1 'nghttpx tips'
    ZADD 'ngh' 1 'nghttpx tips'
    ZADD 'nght' 1 'nghttpx tips'
    ...
    

    对nginx pass real ip

    ZADD 'n' 1 'nginx pass real ip'
    ZADD 'ng' 1 'nginx pass real ip'
    ZADD 'ngi' 1 'nginx pass real ip'
    ZADD 'ngin' 1 'nginx pass real ip'
    ...
    

    搜索的时候,比如ng

    ZRANGE 'ng' 0 -1
    

    因为key是ng的sorted set是这样保存的

    key value
    ng  ['nghttpx tips','nginx pass real ip']
    

    汉字->拼音

    这里使用go-pinyin,注意转换的拼音要替换原来的汉字

    import (
        "strings"
    
        pinyin "github.com/mozillazg/go-pinyin"
    )
    
    func Convert(str string) []string {
        result := []string{}
        p := pinyin.NewArgs()
        for i := 0; i < len(str); i++ {
            if str[i] == ' ' {
                continue
            }
    
            var key string
            if len(string(str[i])) > 1 && len(string(str[i:])) > 1 {
                pinyin_str := pinyin.Pinyin(string(str[i:i+3]), p)
                key = str[:i] + pinyin_str[0][0]
                str = key + str[i+3:]
            } else {
                key = strings.ToLower(str[:i+1])
            }
            key = strings.Replace(str[:i+1], " ", "", -1)
            result = append(result, key)
        }
        return result
    }
    

    redis客户端用的是radix

    ...
    for title, path := range data {
        prefixList := Convert(title)
        for _, key := range prefixList {
            redisClient.PipeAppend("ZADD", key, 1, title)
        }
        title_pinyin := prefixList[len(prefixList)-1]
        redisClient.PipeAppend("HSET", "title_path", title_pinyin, path)
    }
    redisClient.PipeResp()
    

    这里用了pipeline批量添加到sorted set.在添加的同时,将title和path的映射保存到hash中

    trie方式

    其实前缀搜索很适合使用trie保存各词条,可以看看Tushar Roy老师讲解
    但是这样有一个问题,如果词条很多,不可能将整个trie加载到内存,这就需要将trie持久化,这里使用redis

    根据trie的定义,节点是这样

    type Node struct {
        end bool
        m   map[byte]*Node
    }
    

    m属性的key是字符,value是下一个Node.现在将ABCABGL插入trie

    这时可以将Node看作redis的hash类型

    hash类型的key用uuid标识,field-value相当于上面Node的m属性,field还是字符,value变成下一个hash类型的key(uuid).为了方便,将end标识符也加到field-value

    func Generate(root string, bytes []byte) string {
        var cur string
        size := len(bytes)
        if len(root) == 0 {
            uid := uuid.Must(uuid.NewV4()).String()
            RedisClient.Cmd("HSET", uid, "end", 0)
            root, cur = uid, uid
        } else {
            cur = root
        }
    
        for i := 0; i < size; i++ {
            next, _ := RedisClient.Cmd("HGET", cur, bytes[i]).Str()
            if next != "" {
                cur = next
            } else {
                uid := uuid.Must(uuid.NewV4()).String()
                RedisClient.Cmd("HSET", cur, bytes[i], uid)
                cur = uid
            }
        }
        RedisClient.Cmd("HSET", cur, "end", 1)
        return root
    }
    

    前缀搜索,执行下面代码的FindBySuffix方法。具体的,如搜索AB

    • 找到B的下一个节点(key=uuid3)
    • 开始遍历,遍历的结果(C,GL)和AB组合就是前缀搜索的结果(ABC,ABGL)
    func Tranversal(root string) []string {
        result := []string{}
        var dfs func(head string, str string)
        dfs = func(head string, str string) {
            resp, i := RedisClient.Cmd("HGETALL", head), 0
            elems, _ := resp.Array()
            for i < len(elems) {
                key, _ := elems[i].Str()
                val, _ := elems[i+1].Str()
                if key == "end" {
                    if val == "1" {
                        result = append(result, str)
                    }
                } else {
                    ch, _ := elems[i].Int()
                    dfs(val, str+string(ch))
                }
                i += 2
            }
        }
        dfs(root, "")
        return result
    }
    
    func FindBySuffix(suffix string, root string) []string {
        result := []string{}
        size, cur := len(suffix), root
    
        for i := 0; i < size; i++ {
            next, _ := RedisClient.Cmd("HGET", cur, suffix[i]).Str()
            if next != "" {
                cur = next
            } else {
                return []string{}
            }
        }
        for _, v := range Tranversal(cur) {
            result = append(result, suffix+v)
        }
        return result
    }
    

    分词搜索

    这个和上面前缀搜索-redis sorted set方式类似,分词使用gojieba.具体的,如博客迁移到kubernetes,标题分词是[博客 迁移 到 kubernetes],将每个分词保存到sorted set

    ZADD '博客' 1 '博客迁移到kubernetes'
    ZADD '迁移' 1 '博客迁移到kubernetes'
    ...
    

    搜索和上面一样,如搜索迁移

    ZRANGE '迁移' 0 -1
    

    不定期更新