C++：哈希表_IT资讯网

C++：哈希表

Chris-zz 2024-09-14 16:35:01 阅读 65

✨✨✨学习的道路很枯燥，希望我们能并肩走下来!

文章目录

前言

一哈希概念

二哈希冲突

三哈希函数

四哈希冲突解决

4.1 闭散列

方法一：线性探测

方法二：二次探测

编辑4.2 开散列

4.3 开散列与闭散列比较

五模拟实现

5.1 哈希表的改造

5.1.1. 模板参数列表的改造

5.1.2. 增加迭代器操作

5.2 unordered_map

5.3 unordered_set

总

前言

本篇详细介绍了进一步介绍C++中的哈希表，让使用者对哈希表有更加深刻的认知，而不是仅仅停留在表面，更好的模拟，为了更好的使用. 文章可能出现错误，如有请在评论区指正，让我们一起交流，共同进步！

一哈希概念

顺序结构以及平衡树中，元素关键码与其存储位置之间没有对应的关系，因此在查找一个元素时，必须要经过关键码的多次比较。顺序查找时间复杂度为O(N)，平衡树中为树的高度，即 O(log_2 N)，搜索的效率取决于搜索过程中元素的比较次数。

理想的搜索方法：可以不经过任何比较，一次直接从表中得到要搜索的元素。如果构造一种存储结构，通过某种函数(hashFunc)使元素的存储位置与它的关键码之间能够建立一一映射的关系，那么在查找时通过该函数可以很快找到该元素。

当向该结构中：

◎ 插入元素

根据待插入元素的关键码，以此函数计算出该元素的存储位置并按此位置进行存放

◎ 搜索元素

对元素的关键码进行同样的计算，把求得的函数值当做元素的存储位置，在结构中按此位置取元素比较，若关键码相等，则搜索成功

该方式即为哈希(散列)方法，哈希方法中使用的转换函数称为哈希(散列)函数，构造出来的结构称为哈希表(Hash Table)(或者称散列表)

用该方法进行搜索不必进行多次关键码的比较，因此搜索的速度比较快

二哈希冲突

对于两个数据元素的关键字即：不同关键字通过相同哈希函数计算出相同的哈希地址，该种现象称为哈希冲突或哈希碰撞。

把具有不同关键码而具有相同哈希地址的数据元素称为“同义词”

发生哈希冲突该如何处理呢？

三哈希函数

引起哈希冲突的一个原因可能是：哈希函数设计不够合理。

哈希函数设计原则：

● 哈希函数的定义域必须包括需要存储的全部关键码，而如果散列表允许有m个地址时，其值域必须在0到m-1之间

● 哈希函数计算出来的地址能均匀分布在整个空间中

● 哈希函数应该比较简单

常见哈希函数：

1. 直接定址法--(常用)

取关键字的某个线性函数为散列地址：Hash（Key）= A*Key + B

优点：简单、均匀

缺点：需要事先知道关键字的分布情况，只适合范围相对集中的数据

使用场景：适合查找比较小且连续的情况

387. 字符串中的第一个唯一字符 - 力扣（LeetCode）

<code>class Solution {

public:

int firstUniqChar(string s) {

int count[26]={0};

for(auto n : s)

{

count[n-'a']++;

}

for(int i = 0;i<s.size();i++)

{

if(count[s[i]-'a']==1)

return i;

}

return -1;

}

};

2. 除留余数法--(常用)

设散列表中允许的地址数为m，取一个不大于m，但最接近或者等于m的质数p作为除数，按照哈希函数：Hash(key) = key% p(p<=m),将关键码转换成哈希地址

3. 平方取中法--(了解)

假设关键字为1234，对它平方就是1522756，抽取中间的3位227作为哈希地址；再比如关键字为4321，对它平方就是18671041，抽取中间的3位671(或710)作为哈希地址

平方取中法比较适合：不知道关键字的分布，而位数又不是很大的情况

4. 折叠法--(了解)

折叠法是将关键字从左到右分割成位数相等的几部分(最后一部分位数可以短些)，然后将这几部分叠加求和，并按散列表表长，取后几位作为散列地址。

折叠法适合事先不需要知道关键字的分布，适合关键字位数比较多的情况

5. 随机数法--(了解)

选择一个随机函数，取关键字的随机函数值为它的哈希地址，即H(key) = random(key),其中 random为随机数函数。

通常应用于关键字长度不等时采用此法

6. 数学分析法--(了解)

设有n个d位数，每一位可能有r种不同的符号，这r种不同的符号在各位上出现的频率不一定相同，可能在某些位上分布比较均匀，每种符号出现的机会均等，在某些位上分布不均匀只有某几种符号经常出现。可根据散列表的大小，选择其中各种符号分布均匀的若干位作为散列地址。例如：

注意：哈希函数设计的越精妙，产生哈希冲突的可能性就越低，但是无法避免哈希冲突

四哈希冲突解决

解决哈希冲突两种常见的方法是：闭散列和开散列

4.1 闭散列

闭散列：也叫开放定址法，当发生哈希冲突时，如果哈希表未被装满，说明在哈希表中必然还有空位置，那么可以把key存放到冲突位置中的“下一个” 空位置中去。那如何寻找下一个空位置呢？

方法一：线性探测

线性探测：从发生冲突的位置开始，依次向后探测，直到寻找到下一个空位置为止。

◎ 插入

◎ 删除

采用闭散列处理哈希冲突时，不能随便物理删除哈希表中已有的元素，若直接删除元素会影响其他元素的搜索。比如删除元素4，如果直接删除掉，44查找起来可能会受影响。因此线性探测采用标记的伪删除法来删除一个元素。

<code>// 哈希表每个空间给个标记

// EMPTY此位置空， EXIST此位置已经有元素， DELETE元素已经删除

enum State{EMPTY, EXIST, DELETE};

哈希表什么情况下进行扩容？如何扩容？

线性探测的实现

<code>#pragma once

#include<iostream>

#include<vector>

using namespace std;

namespace Close_Hash

{

template<class K>

struct HashFunc

{

size_t operator()(const K& key)

{

return (size_t)key;

}

};

template<>

struct HashFunc<string>

{

size_t operator()(const string& key)

{

size_t hash = 0;

for (auto e : key)

{

hash *= 31;

hash += e;

}

return hash;

}

};

enum State

{

EXIST,

EMPTY,

DELETE

};

template<class K, class V>

struct HashData

{

pair<K, V> _kv;

State _state = EMPTY;

};

template<class K, class V, class Hash = HashFunc<K>>

class HashTable

{

public:

HashTable()

{

_tables.resize(10);

}

bool Insert(const pair<K, V>& kv)

{

if (Find(kv.first))

return false;

if (_n * 10 / _tables.size() >= 7)

{

HashTable<K, V, Hash> newHT;

newHT._tables.resize(2 * _tables.size());

for (int i = 0; i < _tables.size(); i++)

{

if (_tables[i]._state == EXIST)

{

newHT.Insert(_tables[i]._kv);

}

_tables.swap(newHT._tables);

}

Hash hs;

size_t hashi = hs(kv.first) % _tables.size();

while (_tables[hashi]._state == EXIST)

{

hashi++;

hashi %= _tables.size();

}

_tables[hashi]._state = EXIST;

_tables[hashi]._kv = kv;

_n++;

}

HashData<K, V>* Find(const K& key)

{

Hash hs;

size_t hashi = hs(key) % _tables.size();

while (_tables[hashi]._state != EMPTY)

{

if (_tables[hashi]._state == EXIST&& _tables[hashi]._kv.first == key)

{

return &_tables[hashi];

}

++hashi;

hashi %= _tables.size();

}

return nullptr;

}

bool Erase(const K& key)

{

HashData<K, V>* ret = Find(key);

if (ret)

{

ret->_state = DELETE;

return true;

}

else

{

return false;

}

_n--;

}

private:

vector<HashData<K, V>> _tables;

size_t _n = 0; // 表中存储数据个数

};

void TestHT1()

{

HashTable<int, int> ht;

int a[] = { 11,21,4,14,24,15,9 };

for (auto e : a)

{

ht.Insert({ e,e });

}

ht.Insert({ 19,19 });

ht.Insert({ 19,190 });

ht.Insert({ 19,1900 });

ht.Insert({ 39,1900 });

cout << ht.Find(24) << endl;

ht.Erase(4);

cout << ht.Find(24) << endl;

cout << ht.Find(4) << endl;

}

线性探测优点：实现非常简单，

线性探测缺点：一旦发生哈希冲突，所有的冲突连在一起，容易产生数据“堆积”，即：不同关键码占据了可利用的空位置，使得寻找某关键码的位置需要许多次比较，导致搜索效率降低。如何缓解呢？

方法二：二次探测

线性探测的缺陷是产生冲突的数据堆积在一块，这与其找下一个空位置有关系，因为找空位置的方式就是挨着往后逐个去找，因此二次探测为了避免该问题，

找下一个空位置的方法为：二次探测相对线性探测的区别在于，在查找空位置时跳跃式的前后交替查找，每次的步长都是前一次的二次方

4.2 开散列

开散列法又叫哈希桶/拉链法，首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

在哈希桶中，其每一个位置都是一个链表，原本会产生哈希冲突的、具有相同哈希地址的元素归并成一个集合，用链表链接起来，称为一个桶，并将各个链表的头节点存储在哈希表中

从上图可以看出，开散列中每个桶中放的都是发生哈希冲突的元素。

所以用开散列法插入元素和删除元素，按照链表的方式进行操作即可。需要注意的是，因为我们采用的是单链表，不好找尾，所以插入元素时应该头插

桶的个数是一定的，随着元素的不断插入，每个桶中元素的个数不断增多，极端情况下，可能会导致一个桶中链表节点非常多，会影响的哈希表的性能，因此在一定条件下需要对哈希表进行增容，那该条件怎么确认呢？开散列最好的情况是：每个哈希桶中刚好挂一个节点，再继续插入元素时，每一次都会发生哈希冲突，因此，在元素个数刚好等于桶的个数时，可以给哈希表增容。

哈希桶的负载因子可以开到1，也就是平均每个位置都有一个元素。当达到条件时，首先创建一个新表，重新将每个节点映射到新表中，然后销毁旧表。

开散列实现

<code>namespace hash_bucket // 哈希桶:数组加单链表

{

template<class K, class V>

struct HashNode

{

HashNode(const pair<K, V>& kv)

:_kv(kv)

,_next(nullptr)

{}

pair<K, V> _kv;

HashNode* _next;

};

template<class K>

struct HashMapping

{

size_t operator()(const K& key)

{

return (size_t)key;

}

};

template<>

struct HashMapping<string>

{

size_t operator()(const string& key)

{

// BKDR字符串哈希

size_t ret = 0;

for (auto i : key)

{

ret *= 31;

ret += i;

}

return ret;

}

};

template<class K, class V, class Hash = HashMapping<K>>

class HashTable

{

typedef HashNode<K, V> Node;

public:

HashTable()

{

_tables.resize(10);

}

~HashTable() // vector可以自己析构，但哈希桶需要我们手动析构

{

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

while (cur)

{

Node* next = cur->_next;

delete cur;

cur = next;

}

_tables[i] = nullptr;

}

bool Insert(const pair<K, V> kv)

{

if (Find(kv.first)) return false;

Hash hf;

if (_n == _tables.size()) // 负载因子为1时扩容

{

// 为了避免析构和重新创建节点的消耗，直接把旧表中的节点重新映射至新表

vector<Node*> new_tables;

new_tables.resize(_tables.size() * 2);

// 遍历旧表

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

while (cur)

{

Node* next = cur->_next; // 头插会改变节点的指向，所以需要保存next

// 计算新的映射位置

size_t new_hashi = hf(cur->_kv.first) % new_tables.size();

cur->_next = new_tables[new_hashi];

new_tables[new_hashi] = cur;

cur = next;

}

_tables[i] = nullptr;

}

_tables.swap(new_tables);

}

size_t hashi = hf(kv.first) % _tables.size();

Node* newnode = new Node(kv);

// 哈希桶头插

newnode->_next = _tables[hashi];

_tables[hashi] = newnode;

_n++;

return true;

}

bool Erase(const K& key)

{

Hash hf;

size_t hashi = hf(key) % _tables.size();

Node* prev = nullptr; // 单链表无法找头，需要提前保存

Node* cur = _tables[hashi];

while (cur)

{

if (cur->_kv.first == key)

{

if (prev == nullptr) // 待删除节点在头部

_tables[hashi] = cur->_next;

else

prev->_next = cur->_next;

delete cur;

return true;

}

prev = cur;

cur = cur->_next;

}

return false;

}

Node* Find(const K& key)

{

Hash hf;

size_t hashi = hf(key) % _tables.size();

Node* cur = _tables[hashi];

while (cur)

{

if (cur->_kv.first == key)

return cur;

cur = cur->_next;

}

return nullptr;

}

void Print()

{

for (int i = 0; i < _tables.size(); i++)

{

cout << i << " : ";

Node* cur = _tables[i];

while (cur)

{

cout << "[" << cur->_kv.first << "," << cur->_kv.second << "]" << "->";

cur = cur->_next;

}

cout << endl;

}

cout << endl;

}

size_t size()

{

return _n;

}

private:

vector<Node*> _tables;

size_t _n = 0;

};

}

4.3 开散列与闭散列比较

应用哈希桶法处理溢出，需要增设链接指针，似乎增加了存储开销。事实上：由于开地址法必须保持大量的空闲空间以确保搜索效率，如二次探查法要求装载因子a <= 0.7，而表项所占空间又比指针大的多，所以使用链地址法反而比开地址法节省存储空间。

五模拟实现

5.1 哈希表的改造

5.1.1. 模板参数列表的改造

// K:关键码类型

// V: 不同容器V的类型不同，如果是unordered_map，V代表一个键值对，如果是

unordered_set,V 为 K

// KeyOfValue: 因为V的类型不同，通过value取key的方式就不同，详细见

unordered_map/set的实现

// HF: 哈希函数仿函数对象类型，哈希函数使用除留余数法，需要将Key转换为整形数字才能

取模

template<class K, class V, class KeyOfValue, class HF = DefHashF<T> >

class HashBucket;

5.1.2. 增加迭代器操作

#pragma once

#include<iostream>

#include<vector>

using namespace std;

template<class K>

struct HashFunc

{

size_t operator()(const K& key)

{

return (size_t)key;

}

};

// 特化

template<>

struct HashFunc<string>

{

size_t operator()(const string& key)

{

size_t hash = 0;

for (auto e : key)

{

hash *= 31;

hash += e;

}

return hash;

}

};

namespace hash_bucket

{

template<class T>

struct HashNode

{

T _data;

HashNode<K, V>* _next;

HashNode(const T& data)

:_data(data)

, _next(nullptr)

{}

};

// 前置声明

template<class K, class T, class KeyOfT, class Hash>

class HashTable;

template<class K, class T, class Ptr, class Ref, class KeyOfT, class Hash>

struct HTIterator

{

typedef HashNode<T> Node;

typedef HTIterator<K, T, Ptr, Ref, KeyOfT, Hash> Self;

Node* _node;

const HashTable<K, T, KeyOfT, Hash>* _pht;

HTIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht)

:_node(node)

, _pht(pht)

{}

Ref operator*()

{

return _node->_data;

}

Ptr operator->()

{

return &_node->_data;

}

bool operator!=(const Self& s)

{

return _node != s._node;

}

Self& operator++()

{

if (_node->_next)

{

// 当前桶还有节点

_node = _node->_next;

}

else

{

// 当前桶走完了，找下一个不为空的桶

KeyOfT kot;

Hash hs;

size_t hashi = hs(kot(_node->_data)) % _pht->_tables.size();

++hashi;

while (hashi < _pht->_tables.size())

{

if (_pht->_tables[hashi])

{

break;

}

++hashi;

}

if (hashi == _pht->_tables.size())

{

_node = nullptr; // end()

}

else

{

_node = _pht->_tables[hashi];

}

return *this;

}

};

template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>

class HashTable

{

// 友元声明

template<class K, class T, class Ptr, class Ref, class KeyOfT, class Hash>

friend struct HTIterator;

typedef HashNode<T> Node;

public:

typedef HTIterator<K, T, T*, T&, KeyOfT, Hash> Iterator;

typedef HTIterator<K, T, const T*, const T&, KeyOfT, Hash> ConstIterator;

Iterator Begin()

{

if (_n == 0)

return End();

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

if (cur)

{

return Iterator(cur, this);

}

return End();

}

Iterator End()

{

return Iterator(nullptr, this);

}

ConstIterator Begin() const

{

if (_n == 0)

return End();

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

if (cur)

{

return ConstIterator(cur, this);

}

return End();

}

ConstIterator End() const

{

return ConstIterator(nullptr, this);

}

HashTable()

{

_tables.resize(10, nullptr);

}

~HashTable()

{

// 依次把每个桶释放

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

while (cur)

{

Node* next = cur->_next;

delete cur;

cur = next;

}

_tables[i] = nullptr;

}

pair<Iterator, bool> Insert(const T& data)

{

KeyOfT kot;

Iterator it = Find(kot(data));

if (it != End())

return make_pair(it, false);

Hash hs;

size_t hashi = hs(kot(data)) % _tables.size();

// 负载因子==1扩容

if (_n == _tables.size())

{

vector<Node*> newtables(_tables.size() * 2, nullptr);

for (size_t i = 0; i < _tables.size(); i++)

{

Node* cur = _tables[i];

while (cur)

{

Node* next = cur->_next;

// 旧表中节点，挪动新表重新映射的位置

size_t hashi = hs(cur->_kv.first) % newtables.size();

// 头插到新表

cur->_next = newtables[hashi];

newtables[hashi] = cur;

cur = next;

}

_tables[i] = nullptr;

}

_tables.swap(newtables);

}

// 头插

Node* newnode = new Node(data);

newnode->_next = _tables[hashi];

_tables[hashi] = newnode;

++_n;

return make_pair(Iterator(newnode, this), true);

}

Iterator Find(const K& key)

{

Hash hs;

KeyOfT kot;

size_t hashi = hs(key) % _tables.size();

Node* cur = _tables[hashi];

while (cur)

{

if (kot(cur->_data) == key)

{

return Iterator(cur, this);

}

cur = cur->_next;

}

return End();

}

bool Erase(const K& key)

{

Hash hs;

KeyOfT kot;

size_t hashi = hs(key) % _tables.size();

Node* prev = nullptr;

Node* cur = _tables[hashi];

while (cur)

{

if ((cur->_data) == key)

{

if (prev == nullptr)

{

_tables[hashi] = cur->_next;

}

else

{

prev->_next = cur->_next;

}

delete cur;

--_n;

return true;

}

prev = cur;

cur = cur->_next;

}

return false;

}

private:

vector<Node*> _tables; // 指针数组

size_t _n = 0;// 表中存储数据个数

};

}

5.2 unordered_map

#pragma once

#include "HashTable2.h"

namespace ch

{

template<class K, class V, class Hash = HashFunc<K>>

class unordered_map

{

struct MapKeyOfT

{

const K& operator()(const pair<K, V>& kv)

{

return kv.first;

}

};

public:

typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::Iterator iterator;

typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::ConstIterator const_iterator;

iterator begin()

{

return _ht.Begin();

}

iterator end()

{

return _ht.End();

}

const_iterator begin() const

{

return _ht.Begin();

}

const_iterator end() const

{

return _ht.End();

}

pair<iterator, bool> insert(const pair<K, V>& kv)

{

return _ht.Insert(kv);

}

V& operator[](const K& key)

{

pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));

return ret.first->second;

}

iterator Find(const K& key)

{

return _ht.Find(key);

}

bool Erase(const K& key)

{

return _ht.Erase(key);

}

private:

hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;

};

}

5.3 unordered_set

#pragma once

#include "HashTable2.h"

namespace ch

{

template<class K, class Hash = HashFunc<K>>

class unordered_set

{

struct SetKeyOfT

{

const K& operator()(const K& key)

{

return key;

}

};

public:

typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::Iterator iterator;

typedef typename hash_bucket::HashTable<K, const K, SetKeyOfT, Hash>::ConstIterator const_iterator;

iterator begin()

{

return _ht.Begin();

}

iterator end()

{

return _ht.End();

}

const_iterator begin() const

{

return _ht.Begin();

}

const_iterator end() const

{

return _ht.End();

}

pair<iterator, bool> insert(const K& key)

{

return _ht.Insert(key);

}

iterator Find(const K& key)

{

return _ht.Find(key);

}

bool Erase(const K& key)

{

return _ht.Erase(key);

}

private:

hash_bucket::HashTable<K, const K, SetKeyOfT, Hash> _ht;

};

}

总结

✨✨✨各位读友，本篇分享到内容是否更好的让你理解哈希，如果对你有帮助给个👍赞鼓励一下吧！！

🎉🎉🎉世上没有绝望的处境，只有对处境绝望的人。

感谢每一位一起走到这的伙伴，我们可以一起交流进步！！！一起加油吧！！

上一篇：十一，Spring Boot 当中配置拦截器的“两”种方式

下一篇：模板（C++）

本文标签

C++：哈希表

声明

本文内容仅代表作者观点，或转载于其他网站，本站不以此文作为商业用途
如有涉及侵权，请联系本站进行删除
转载本站原创文章，请注明来源及作者。

C++：哈希表

文章目录

前言

一哈希概念

二哈希冲突

三哈希函数

四哈希冲突解决

4.1 闭散列

方法一：线性探测

方法二：二次探测

4.2 开散列

4.3 开散列与闭散列比较

五模拟实现

5.1 哈希表的改造

5.1.1. 模板参数列表的改造

5.1.2. 增加迭代器操作

5.2 unordered_map

5.3 unordered_set

总结

本文标签

声明

相关文章

阅读排行

热门文章

C++：哈希表

文章目录

前言

一 哈希概念

二 哈希冲突

三 哈希函数

四 哈希冲突解决

4.1 闭散列

方法一：线性探测

方法二：二次探测

4.2 开散列

4.3 开散列与闭散列比较

五 模拟实现

5.1 哈希表的改造

5.1.1. 模板参数列表的改造

5.1.2. 增加迭代器操作

5.2 unordered_map

5.3 unordered_set

总结

本文标签

声明

相关文章

阅读排行

热门文章

一哈希概念

二哈希冲突

三哈希函数

四哈希冲突解决

五模拟实现