redis源代码分析4–字典（下）

hash表的两个有意思的处理过程为自动调整hash表的大小，在需要时实现增量rehash，其他如查找，删除等不解释。

我们先看看resize的处理逻辑。

dict.c中有个变量dict_can_resize控制着是否允许自动调整hash表的大小，该变量的改变主要靠如下两个函数：

void dictEnableResize(void) {
    dict_can_resize = 1;
}
void dictDisableResize(void) {
    dict_can_resize = 0;
}

总的说来，在系统运行有后台进程时，不允许自动自动调整大小，这是为了为了使得类linux系统的copy-on-write有更好的性能（没有调整大小，就没有rehash，这样父进程的db没有改变，子进程就不需要真的copy）。在后台子进程退出后，又会允许resize。这里说到的后台子进程主要跟redis的持久化机制有关，在后面的持久化之快照和持久化之aof中会详细介绍其过程。

接下来我们看看自动调整大小的算法。

主要是由dictExpand完成。该函数在将size转成最接近2的幂的数后（比如size=60，realsize会等于64，_dictNextPower复杂度为O(log(Size)) ），将相关参数保存在dict中的ht[1]中，并将其rehashidx置为0（置为-1，表示没有rehash进行），这样就可以开始rehash了。

int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hashtable */
    unsigned long realsize = _dictNextPower(size);
    --
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = _dictAlloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* Initialize all the pointers to NULL */
    memset(n.table, 0, realsize*sizeof(dictEntry*));
    --
    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

而dictExpand主要被rdbLoadObject、_dictExpandIfNeeded、dictResize调用。
rdbLoadObject表示文件中加载一个object，当加载的是一个set对象时，一开始就将其扩展为set中元素的个数，可以大大减少随后rehash的次数（set用ht实现，而ht的初始值DICT_HT_INITIAL_SIZE = 4）

        if (type == REDIS_SET && listlen > DICT_HT_INITIAL_SIZE)
            dictExpand(o->ptr,listlen);

_dictExpandIfNeeded最终会被dictAdd调用，也就是说在每次向ht中增加一个元素时，都会判断。
_dictExpandIfNeeded的判断策略如下：

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?
                                    d->ht[0].size : d->ht[0].used)*2);
    }

dict_can_resize就是一开始提到的用于避免后台正在持久化时rehash的全局变量。除了这个变量，当ht桶的元素的平均个数多于 dict_force_resize_ratio=5时，也会强制进行rehash。扩展策略是现有值的2倍。

如果说_dictExpandIfNeeded会将ht增大的话，那么可以认为dictResize会将ht减小。其目标是将桶的个数减少到接近ht中所含元素的个数。

int dictResize(dict *d)
{
    int minimal;

    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}

跟ht的增大过程是在向dict中增加元素时对应，ht的减小过程主要是在删除元素时进行。而redis中的set和zset都是用ht实现的，因此dictResize主要被sremCommand和spopCommand（当dict用于set时）、zremCommand和 zremrangebyscoreCommand和zremrangebyrankCommand（当dict是zset时）、hashDelete调用。除此之外，serverCron（每隔100ms执行一次）会调用tryResizeHashTables，后者会进一步调用dictResize来判断整个redis哪些db需要调整（每个db又是一个大的ht）。

static int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
---
if (server.bgsavechildpid == -1 && server.bgrewritechildpid == -1) {
        if (!(loops % 10)) tryResizeHashTables();
        if (server.activerehashing) incrementallyRehash();
    }
---
}

static void tryResizeHashTables(void) {
    int j;

    for (j = 0; j < server.dbnum; j++) {
        if (htNeedsResize(server.db[j].dict))
            dictResize(server.db[j].dict);
        if (htNeedsResize(server.db[j].expires))
            dictResize(server.db[j].expires);
    }
}

而所有dictResize调用是否进行，要看htNeedsResize的返回值。htNeedsResize函数最终反映了redis中减小ht的策略。从其代码可以看出，当填充率不到10%时（ REDIS_HT_MINFILL定义为10），会将ht的桶的个数调整到接近ht中元素的个数（参看前面dictResize）。

static int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size && used && size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < REDIS_HT_MINFILL));
}

接下来看下rehash，主要在dictRehash中完成。先看下什么时候进行rehash。

在如上的serverCron中（每100ms执行一次），当没有后台子进程时，会调用incrementallyRehash，最终调用dictRehash。incrementallyRehash会对每个db，rehash大概1ms的时间，这1ms又进一步划分成多步，每步rehash 100个桶。

static void incrementallyRehash(void) {
    int j;

    for (j = 0; j < server.dbnum; j++) {
        if (dictIsRehashing(server.db[j].dict)) {
            dictRehashMilliseconds(server.db[j].dict,1);
            break; /* already used our millisecond for this loop... */
        }
    }
}

int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}

另外在_dictRehashStep，也会调用dictRehash，而_dictRehashStep每次会rehash一个桶从ht[0]到 ht[1](注：以前写的rehash一个值是错误的)，但由于_dictRehashStep是被dictGetRandomKey、dictFind、 dictGenericDelete、dictAdd调用的，因此在每次dict增删查改时都会被调用，这无疑就加快了rehash过程。

我们再来看看rehash过程。dictRehash每次增量rehash n个桶（dictRehash的参数n表示桶的个数），由于在自动调整大小时已设置好了ht[1]的大小，因此rehash的主要过程就是遍历ht[0]的各个桶，取得key，然后将该key按ht[1]的桶的大小重新rehash，并将相应的dictEntry从ht[0]移动到ht[1]。在rehash完后，会将ht[0]指向ht[1]，然后将ht[1]清空。

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            _dictFree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}

前面详细了解了rehash的过程，现在可以看看redis为什么使用两个hash表了（ht[0]与ht[1]）。
通常，设计hash表时，只有一个ht。在进行rehash时，需要先申请更大内存（比如tmp指向该内存）后，然后一次性把ht中所有的数据rehash到tmp中，然后再让ht执行tmp。也就是说，rehash的粒度是整个ht。
采用两个表后，rehash时的粒度最小可降到一个桶（而每个桶中元素的平均个数为前面提到的dict_force_resize_ratio=5）。很显然，rehash被分散处理了。另外，由于rehash只是移动dictEntry从ht[0]到ht[1]，整个使用的内存并没有增加。
最后，简单介绍下在rehash进行时，元素插入、查找、删除的过程（没有rehash时，所有的增删查改都指向ht[0]）。
插入会直接插入到ht[1]中。查找、删除会查找两个表ht[0]、ht[1]。

redis中用到的整数hash、字符串hash算法如下，做个备份：

/* Thomas Wang's 32 bit Mix Function */
unsigned int dictIntHashFunction(unsigned int key)
{
    key += ~(key << 15);
    key ^=  (key >> 10);
    key +=  (key << 3);
    key ^=  (key >> 6);
    key += ~(key << 11);
    key ^=  (key >> 16);
    return key;
}
/* Generic hash function (a popular one from Bernstein).
* I tested a few and this was the best. */
unsigned int dictGenHashFunction(const unsigned char *buf, int len) {
    unsigned int hash = 5381;

    while (len--)
        hash = ((hash << 5) + hash) + (*buf++); /* hash * 33 + c */
    return hash;
}

redis源代码分析4–字典（下）

发表评论取消回复

最近文章

近期评论

分类目录

功能

redis源代码分析4–字典（下）

发表评论 取消回复

最近文章

近期评论

分类目录

友情链接

功能

发表评论取消回复