Redis Dictionary Design - CFANZ Programming Community

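Preface

A dictionary, i.e. a hash table, is a data structure found in many languages, for example HashMap in Java. Dictionaries are used widely in everyday development, and they are a cornerstone of the Redis server: the Redis database implementation, publish/subscribe, and key expiration all rely on them. This article walks through the structure and operations of the Redis dictionary at the source-code level to understand how it works internally.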

Dictionary structure

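struct dict {
    // Hash function, key comparison function, etc. for this dictionary
    dictType *type;

    // Two hash tables; the second is used only during incremental rehashing
    dictEntry **ht_table[2];
    // Number of entries stored in each table
    unsigned long ht_used[2];

    // Index of the next bucket to migrate; -1 when no rehash is in progress
    long rehashidx;

    // Other fields
    int16_t pauserehash;        // > 0 means rehashing is paused
    signed char ht_size_exp[2]; // Table size as an exponent: size = 1 << exp
    void *metadata[];
};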

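type holds the hash function, key comparison function, and related callbacks for the dictionary.
ht_table stores the data. It is an array of two hash tables, the second of which exists to support incremental rehashing. Each element has type dictEntry **, a pointer to a pointer, which can be viewed as a one-dimensional array whose elements are pointers to dictEntry.
rehashidx indicates whether the dictionary is being incrementally rehashed: it is -1 when no rehash is in progress.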
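Because ht_size_exp stores each table's size as an exponent, the bucket count and the index mask can both be derived from it. The following is a minimal sketch of that relationship, not the Redis macros themselves; the names table_size, table_mask, and bucket_index are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: a table holds 2^exp buckets, and exp == -1
 * marks a table that has not been allocated. */
static unsigned long table_size(signed char exp) {
    return exp == -1 ? 0 : 1UL << exp;
}

static unsigned long table_mask(signed char exp) {
    return exp == -1 ? 0 : table_size(exp) - 1;
}

/* Map a 64-bit hash to a bucket index. Masking works as a cheap modulo
 * only because the table size is a power of two. */
static unsigned long bucket_index(uint64_t hash, signed char exp) {
    assert(exp >= 0); /* indexing an unallocated table is meaningless */
    return (unsigned long)(hash & table_mask(exp));
}
```

With exp = 4 the table has 16 buckets and the mask is 15, so a hash of 37 lands in bucket 5.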

dictType

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    int (*keyCompare)(dict *d, const void *key1, const void *key2);
} dictType;

Two functions in dictType matter here:

hashFunction computes the hash value of a key.
keyCompare compares two dictionary keys.
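To make the role of the two callbacks concrete, here is a small sketch of a type descriptor for C-string keys. It is an assumption-laden illustration: demoType is a simplified stand-in for dictType (its keyCompare omits the dict * argument), and the FNV-1a hash is chosen for brevity; it is not the hash function Redis uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for dictType: only the two callbacks discussed above. */
typedef struct demoType {
    uint64_t (*hashFunction)(const void *key);
    int (*keyCompare)(const void *key1, const void *key2);
} demoType;

/* 64-bit FNV-1a over a NUL-terminated string (illustrative choice only). */
static uint64_t demo_str_hash(const void *key) {
    assert(key != NULL);
    const unsigned char *p = key;
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    while (*p) {
        h ^= *p++;
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

/* Returns non-zero when the keys are equal, the convention the lookup code relies on. */
static int demo_str_compare(const void *key1, const void *key2) {
    return strcmp(key1, key2) == 0;
}

static const demoType demo_str_type = { demo_str_hash, demo_str_compare };
```

Keeping these callbacks in a separate descriptor lets one dictionary implementation serve different key types: the table code only ever calls through the function pointers.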

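Dictionary operations

Initialization

The database dictionary is initialized when the Redis server starts. Initialization is simple: allocate memory and assign the relevant fields.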

server.db[j].dict = dictCreate(&dbDictType);

dict *dictCreate(dictType *type)
{
    size_t metasize = type->dictMetadataBytes ? type->dictMetadataBytes() : 0;
    dict *d = zmalloc(sizeof(*d) + metasize);
    if (metasize) {
        memset(dictMetadata(d), 0, metasize);
    }
    _dictInit(d,type);
    return d;
}
int _dictInit(dict *d, dictType *type)
{
    _dictReset(d, 0);
    _dictReset(d, 1);
    d->type = type;
    d->rehashidx = -1;
    d->pauserehash = 0;
    return DICT_OK;
}

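Adding an entry

Adding to a Redis dictionary consists of three parts: finding the position, expanding and rehashing, and adding the data. Most of the logic lies in finding the position and in expanding and rehashing.

Inserting an element

First the hash of the key being inserted is computed, then the hash table is traversed to find the insert position; see the comments for details. Lookup follows similar logic (except that it does not check whether the table needs to expand); see dictEntry *dictFind(dict *d, const void *key).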

void *dictFindPositionForInsert(dict *d, const void *key, dictEntry **existing) {
    unsigned long idx, table;
    dictEntry *he;
    
    // Compute the hash of the key; the database dictionary uses the following function:
    // uint64_t dictSdsHash(const void *key)  
    uint64_t hash = dictHashKey(d, key);
    if (existing) *existing = NULL;
    
    // If a rehash is in progress, perform one rehash step (described later)
    if (dictIsRehashing(d)) _dictRehashStep(d);

    // Check whether the table needs to expand
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return NULL;
        
    // Search both hash tables, because during a rehash an entry may be in either one
    for (table = 0; table <= 1; table++) {
        idx = hash & DICTHT_SIZE_MASK(d->ht_size_exp[table]);
     
        // Fetch the head of the bucket at the index derived from the hash
        he = d->ht_table[table][idx];
        
        // Redis resolves hash collisions by separate chaining; walk the linked list
        while(he) {
            void *he_key = dictGetKey(he);
            if (key == he_key || dictCompareKeys(d, key, he_key)) {
                // The key already exists: report the existing entry to the caller
                if (existing) *existing = he;
                return NULL;
            }
            // Move to the next node in the chain
            he = dictGetNext(he);
        }
        if (!dictIsRehashing(d)) break;
    }
    // The key was not found in the dictionary
    dictEntry **bucket = &d->ht_table[dictIsRehashing(d) ? 1 : 0][idx];
    return bucket;
}

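The call has two possible outcomes:

The first is that an existing element with the same key is found. The function returns NULL, and the found entry is stored in the existing out-parameter.
The second is that no element is found. The function returns the address of the bucket at index idx in the hash table, and the new entry is then inserted at the head of that bucket's chain, on the assumption that recently added data is more likely to be accessed.

Next, the new key is added at the position that was found; see dictInsertAtPosition. The main code is below, and the logic is fairly simple.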
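The two outcomes, and the head insertion that follows the second one, can be sketched with a single fixed-size table. This is a simplified illustration under stated assumptions, not the Redis code: there is no rehashing or expansion, keys are C strings, and demoEntry, demoTable, demo_hash, demo_find_position, and demo_add are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 8 /* power of two, so hash & (DEMO_BUCKETS - 1) is a valid index */

typedef struct demoEntry {
    const char *key;
    struct demoEntry *next;
} demoEntry;

typedef struct demoTable {
    demoEntry *buckets[DEMO_BUCKETS];
    unsigned long used;
} demoTable;

/* Simple multiplicative string hash, for illustration only. */
static unsigned long demo_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns NULL and sets *existing when the key is already present;
 * otherwise returns the address of the bucket to insert into. */
static demoEntry **demo_find_position(demoTable *t, const char *key,
                                      demoEntry **existing) {
    unsigned long idx = demo_hash(key) & (DEMO_BUCKETS - 1);
    if (existing) *existing = NULL;
    for (demoEntry *he = t->buckets[idx]; he; he = he->next) {
        if (strcmp(he->key, key) == 0) {
            if (existing) *existing = he;
            return NULL;
        }
    }
    return &t->buckets[idx];
}

/* Head insertion: the new entry becomes the first node of the chain.
 * Returns 1 on insert, 0 if the key exists or allocation fails. */
static int demo_add(demoTable *t, const char *key) {
    assert(t != NULL && key != NULL);
    demoEntry **bucket = demo_find_position(t, key, NULL);
    if (!bucket) return 0;
    demoEntry *entry = malloc(sizeof(*entry));
    if (!entry) return 0;
    entry->key = key;
    entry->next = *bucket; /* link the old head behind the new entry */
    *bucket = entry;       /* publish the new entry as the bucket head */
    t->used++;
    return 1;
}
```

Returning the bucket's address rather than its index lets the caller write the new head pointer directly, without recomputing which table and slot to use.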

entry = zmalloc(sizeof(*entry) + metasize);
assert(entryIsNormal(entry)); /* Check alignment of allocation */
if (metasize > 0) {
    memset(dictEntryMetadata(entry), 0, metasize);
}
entry->key = key;
entry->next = *bucket;

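Expansion and rehashing

Expanding a Redis dictionary has two parts: growing the hash array and rehashing the elements. The array expansion code is shown below.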

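int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    /* Abridged: validation, overflow checks and error handling are omitted */

    // Round the requested size up to the next power of two, stored as an exponent
    signed char new_ht_size_exp = _dictNextExp(size);
    size_t newsize = 1ul<<new_ht_size_exp;

    // Allocate zero-initialized memory for the new array
    dictEntry **new_ht_table = zcalloc(newsize*sizeof(dictEntry*));

    // Install the new array as the second table and mark the dictionary as rehashing
    d->ht_size_exp[1] = new_ht_size_exp;
    d->ht_used[1] = 0;
    d->ht_table[1] = new_ht_table;
    d->rehashidx = 0;
    return DICT_OK;
}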
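The new array length is always a power of two, which is what the exponent returned by _dictNextExp encodes. A minimal sketch of that rounding is shown here; demo_next_exp and DEMO_INITIAL_EXP are hypothetical names, and the minimum exponent of 2 (four buckets) is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_INITIAL_EXP 2 /* assumed minimum table size: 1 << 2 = 4 buckets */

/* Return the smallest exponent e >= DEMO_INITIAL_EXP with (1 << e) >= size. */
static signed char demo_next_exp(unsigned long size) {
    signed char e = DEMO_INITIAL_EXP;
    /* Guard against shifting past the width of unsigned long. */
    if (size >= LONG_MAX) return (signed char)(8 * sizeof(long) - 1);
    while ((1UL << e) < size) e++;
    assert((1UL << e) >= size); /* postcondition of the loop */
    return e;
}
```

For example, a requested size of 5 yields exponent 3 (8 buckets), while 1024 yields exactly 10.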

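So that expansion does not disrupt request processing, the dictionary is not rehashed in a single step. Instead, elements are moved a little at a time, piggybacking on other dictionary operations. The following code appears frequently.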

if (dictIsRehashing(d)) _dictRehashStep(d);

static void _dictRehashStep(dict *d) {
    // The argument 1 means each triggered step migrates the elements of only one bucket of the array
    if (d->pauserehash == 0) dictRehash(d,1);
}

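dictRehash(d,1) contains the main rehashing logic. The rehashidx field records the index of the bucket to migrate in this step, and every element in that bucket is moved. Once all elements have been moved, the newly allocated hash table becomes the table in use.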

de = d->ht_table[0][d->rehashidx];
/* Move all the keys in this bucket from the old to the new hash HT */
while(de) {
    
}
/* Check if we already rehashed the whole table... */
if (d->ht_used[0] == 0) {
    zfree(d->ht_table[0]);
    /* Copy the new ht onto the old one */
    d->ht_table[0] = d->ht_table[1];
    d->ht_used[0] = d->ht_used[1];
    _dictReset(d, 1);
 }
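The incremental migration can be sketched in isolation as follows. This is a simplified model rather than the Redis implementation: keys are integers with an identity hash, sizes are stored directly instead of as exponents, there is no limit on how many empty buckets a step may skip, and the rhEntry, rhDict, rh_start, and rh_rehash names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct rhEntry {
    uint64_t key;            /* integer keys keep the sketch short */
    struct rhEntry *next;
} rhEntry;

typedef struct rhDict {
    rhEntry **table[2];      /* table[0]: current, table[1]: rehash target */
    unsigned long size[2];   /* bucket counts, always powers of two */
    unsigned long used[2];   /* entries stored in each table */
    long rehashidx;          /* -1 when not rehashing */
} rhDict;

/* Identity hash, for illustration only. */
static uint64_t rh_hash(uint64_t key) { return key; }

/* Allocate the target table and enter the rehashing state. Returns 1 on success. */
static int rh_start(rhDict *d, unsigned long newsize) {
    d->table[1] = calloc(newsize, sizeof(rhEntry *));
    if (!d->table[1]) return 0;
    d->size[1] = newsize;
    d->used[1] = 0;
    d->rehashidx = 0;
    return 1;
}

/* Migrate at most n non-empty buckets. Returns 1 if work remains, 0 when done. */
static int rh_rehash(rhDict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n-- && d->used[0] != 0) {
        assert(d->size[0] > (unsigned long)d->rehashidx);
        /* Skip empty buckets; used[0] != 0 guarantees a non-empty one exists. */
        while (d->table[0][d->rehashidx] == NULL) d->rehashidx++;
        rhEntry *de = d->table[0][d->rehashidx];
        while (de) {
            rhEntry *next = de->next;
            unsigned long idx = rh_hash(de->key) & (d->size[1] - 1);
            de->next = d->table[1][idx]; /* head-insert into the new table */
            d->table[1][idx] = de;
            d->used[0]--;
            d->used[1]++;
            de = next;
        }
        d->table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->used[0] == 0) {
        /* Everything has moved: the new table replaces the old one. */
        free(d->table[0]);
        d->table[0] = d->table[1];
        d->size[0] = d->size[1];
        d->used[0] = d->used[1];
        d->table[1] = NULL;
        d->size[1] = 0;
        d->used[1] = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Calling rh_rehash(d, 1) from each read or write spreads the cost of the migration over many operations, so no single request pays for moving the whole table.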

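Summary

The Redis dictionary is used widely.
The Redis dictionary design has its distinctive touches, such as using incremental rehashing so that client requests are not blocked.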