While optimizing some code recently, I found that HashMap performed surprisingly poorly in certain scenarios. Digging into the source code revealed the root cause, so I'm writing it down here.
HashMap's data structure is an array whose buckets hold either a linked list or a red-black tree, roughly like this:
Most of the time a HashMap is created like this:
HashMap map = new HashMap();
That is, without specifying any initialization parameters. So what actually happens under the hood? The source code reads as follows:
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
The comment says this constructs an empty HashMap with the default initial capacity **(16)** and the default load factor of 0.75.
Since the load factor has come up, let's first go over a few constants that HashMap defines internally.
1. Default initial capacity: 16, and it must be a power of two (why a power of two is required is sketched right after the constant).
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
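Why must the capacity be a power of two? Because HashMap picks a bucket with the bit mask (n - 1) & hash rather than hash % n, and that mask only spreads keys evenly over every slot when n is a power of two. A minimal sketch of the indexing arithmetic (not JDK code):
// Sketch: a power-of-two capacity lets HashMap replace % with a bit mask.
public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16;                          // a power-of-two table length
        int h = "someKey".hashCode();
        int hash = h ^ (h >>> 16);           // JDK 8 also mixes the high bits into the low bits like this
        int index = (n - 1) & hash;          // equivalent to Math.floorMod(hash, n), but cheaper
        System.out.println(index);           // always falls in [0, 15]
    }
}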
2. Maximum capacity: 2^30.
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
3. Default load factor: 0.75. The load factor determines the resize threshold: threshold = capacity × load factor. For example, with the default capacity of 16, once more than 16 × 0.75 = 12 entries have been put into the map, it has to resize (see the small sketch after the constant below).
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
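To make that arithmetic concrete, here is a tiny sketch (not JDK code) of how capacity and load factor yield the resize threshold:
// Sketch: the resize threshold is simply capacity * loadFactor.
public class LoadFactorDemo {
    public static void main(String[] args) {
        int capacity = 16;          // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;   // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println("threshold = " + threshold); // 12: the 13th put triggers a resize to 32
    }
}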
4. Treeify threshold. HashMap is an array of buckets, each holding a linked list, so when the list under a single array index reaches a length of 8, it is converted into a red-black tree (subject to MIN_TREEIFY_CAPACITY below).
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
5. Untreeify threshold: when the number of nodes in a tree bin drops to 6 or fewer (during a resize split), the tree is converted back into a linked list.
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
6. Minimum table capacity for treeification. This balances the tension between resizing and treeifying: a bucket is only turned into a tree once the table capacity has reached this value; otherwise, an overly crowded bucket simply triggers a resize. To keep the two thresholds from conflicting, the value must be at least 4 × TREEIFY_THRESHOLD. The decision is sketched right after the constant.
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
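The decision boils down to roughly the following (a simplified sketch of what JDK 8 does when a chain reaches TREEIFY_THRESHOLD, not a copy of the actual treeifyBin method):
// Sketch: small tables are resized instead of treeified.
public class TreeifyDecisionDemo {
    static final int MIN_TREEIFY_CAPACITY = 64;

    // What the map does once a bucket's chain reaches TREEIFY_THRESHOLD (8).
    static String onLongChain(int tableLength) {
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return "resize";   // double the table; the long chain will likely be split up
        }
        return "treeify";      // convert the chain into a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(onLongChain(16)); // resize
        System.out.println(onLongChain(64)); // treeify
    }
}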
Besides the constants above, there are a few fields worth knowing about.
1. threshold: the size at which the table is resized next, equal to (capacity × load factor).
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
2. loadFactor: the load factor of this map, 0.75 by default.
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
3. size: the number of key-value mappings in the map. It is marked transient, so it is skipped by default serialization (HashMap writes its contents itself in writeObject/readObject).
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
4. table: the bucket array whose length is the "capacity" we keep talking about (the yellow part in the diagram at the top). Note from its javadoc that it is only allocated on first use; a quick demonstration follows right after the field.
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
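A quick way to see this lazy allocation is to peek at the private table field with reflection. This is purely a diagnostic sketch: it works on JDK 8, while newer JDKs may block the reflective access unless java.util is opened (e.g. --add-opens java.base/java.util=ALL-UNNAMED):
import java.lang.reflect.Field;
import java.util.HashMap;

public class LazyTableDemo {
    public static void main(String[] args) throws Exception {
        HashMap<String, String> map = new HashMap<>();
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);

        System.out.println(tableField.get(map));           // null: nothing allocated yet
        map.put("k", "v");                                  // the first put triggers resize()
        Object[] table = (Object[]) tableField.get(map);
        System.out.println(table.length);                   // 16, i.e. DEFAULT_INITIAL_CAPACITY
    }
}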
With the constants and the key fields covered, let's come back to the no-arg constructor.
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
Although the javadoc talks about the default capacity (16) and the default load factor, notice that the constructor body only assigns the load factor; the capacity of 16 is applied later, when the table is first allocated in resize(). The capacity only becomes an explicit argument in the two-parameter constructor, which is used when you specify it yourself:
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
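tableSizeFor rounds the requested capacity up to the next power of two. The JDK 8 version looks roughly like the following (later JDKs compute the same result with Integer.numberOfLeadingZeros); the demo class around it is just for illustration:
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the next power of two (JDK 8 style bit twiddling).
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(16));   // 16
        System.out.println(tableSizeFor(17));   // 32
        System.out.println(tableSizeFor(1000)); // 1024
    }
}
Note that the constructor stores this power-of-two value into threshold for the time being; the real threshold (capacity × load factor) is only computed when resize() allocates the table, which is exactly what the comment above the threshold field hints at.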
So when we call new HashMap(), no capacity is passed anywhere: loadFactor is set to 0.75 and nothing else happens. The net effect still matches the javadoc, though. The first put() triggers resize(), which allocates a table of DEFAULT_INITIAL_CAPACITY = 16 and sets the threshold to 16 × 0.75 = 12. In other words, new HashMap() hands us back a map that behaves as if it had an initial capacity of 16 and a load factor of 0.75.
Summary: hidden in here is a point interviewers love to ask about: if no initial capacity is specified, what is the initial size of a HashMap? The answer is 16 (with a resize threshold of 12), and the table is only allocated on the first put.
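And the practical takeaway for the performance problem mentioned at the top: if you know roughly how many entries the map will hold, size it up front so it never has to resize while being filled. A common idiom (the same arithmetic Guava's Maps.newHashMapWithExpectedSize uses) is sketched below; the helper name capacityFor is made up for the example:
import java.util.HashMap;
import java.util.Map;

public class PresizedMapDemo {
    // Capacity large enough that threshold >= expectedEntries, so no resize occurs.
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expectedEntries = 1000;
        // capacityFor(1000) = 1334, which tableSizeFor rounds up to 2048 (threshold 1536),
        // so all 1000 puts go in without a single resize. Passing 1000 directly would
        // give a table of 1024 with threshold 768 and force a resize partway through.
        Map<Integer, Integer> map = new HashMap<>(capacityFor(expectedEntries));
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // 1000
    }
}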