COMP0005-Notes(7): Balanced Search Trees

扬帆远航_df7c · 2022-02-20


2-3 Search Tree

Structure

A 2-3 search tree follows the basic construction of a search tree, with the minor difference that it has two types of nodes:

  • 2-node: a node with two children, so it is like an ordinary BST node with one key and one associated value;
  • 3-node: a node with three children; the node itself therefore has two keys as pivots and two associated values.

This is what a 2-3 search tree would typically look like. Here numbers are used as keys to show a more explicit ordering.
(figure: an example 2-3 search tree with numeric keys)

Having a new type of node does not change the fundamental property of a search tree: the child keys are partitioned by the parent keys, and in-order traversal always yields the elements in ascending order.

The difference that is really made by the new 3-node, and the associated methods, is that in a 2-3 tree, all paths from the root to a null (end) node have the same length. That is, every leaf sits at the same depth, which guarantees logarithmic performance. Next we shall see how this is achieved.

Associated Methods

Get(Search)

Search in a tree is simple, and follows the exact same process as in a BST - recursively compare the key with the key(s) in the node, and find the interval to traverse down the tree, until the node is found or a null node is reached. Nothing fancy here.

Put(Insert)

The real magic takes place at insertion, that is, growing a 2-3 search tree. Back when we had a BST, a new element was always inserted as a new leaf, so the tree always grew downwards at the bottom. In a 2-3 search tree, however, things are different.

Given an existing 2-3 search tree, the insertion of the new element always takes place at a node with null links - a leaf node. Everything looks just like an ordinary BST up to this point.

However, instead of directly attaching a new child to the leaf node, as a BST would do, in a 2-3 search tree the new element is always inserted directly into the leaf node itself. Given that a 2-3 tree has two types of nodes, there are, correspondingly, two possibilities for that leaf:

  • 2-node: the leaf is transformed directly into a 3-node by inserting the new element into it, so the node now holds both the original key and the new key. Below is an example of inserting a key 65 into the 2-3 search tree above; note that the number of nodes in the tree is not increased by this operation: (figure: inserting key 65 into the example tree)

  • 3-node: things are a little more complicated when the leaf is a 3-node, because we certainly cannot let it stay a 4-node, so we do the following (see the worked figures below):
    1- temporarily grow the 3-node into a 4-node, to find the ordering among the new key and the two existing keys;
    2- pass (insert) the middle key of the 4-node up into its parent node, split the two remaining keys into two 2-nodes, and repeat whichever case applies for the parent node; if the root itself is a 3-node and receives a key this way, split it into three 2-nodes.
    Below is the process of inserting a key 66 into the previous example tree:
    (figure: inserting key 66 - the leaf becomes a temporary 4-node and is split)
    And here is the process of inserting a key 4 into the example tree:
    (figure: inserting key 4 - the splits propagate all the way up to the root)

From the above two examples, you can see that although the tree frequently gains nodes, the height of the tree grows only rarely - only when every node on the search path is a 3-node! And even when it does grow, it grows at the root instead of at a leaf, thereby maintaining the property that all paths from the root to a leaf have the same length.

Analysis

For a get operation, the best case is when every node is a 3-node (three links), so that the height is log3(N); the worst case is when every node is a 2-node (two links), so the height is log2(N). Either way, the cost of a get operation is proportional to the height of the tree, hence O(log(N)).
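
For a sense of scale, here is an illustrative calculation of these height bounds for one million keys (not from the original notes):

import math

N = 1_000_000
best = math.log(N, 3)   # every node is a 3-node: about 12.6 levels
worst = math.log(N, 2)  # every node is a 2-node: about 19.9 levels
print(best, worst)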

For a put operation, after you’ve reached the desired leaf node in c * log(N) traversal steps, the best case is, of course, when that leaf is a 2-node, so a single insertion operation suffices. The worst case is the one described in the last example, where you have to travel all the way back to the root node, splitting a 4-node at every layer; each split has constant cost, since it is a local operation that only affects one other layer. Hence, the overall cost is O(log(N)) as well.

Red-Black Binary Search Tree

Although the 2-3 search tree sounds wonderful in theory, notice that we have not given any pseudocode for its implementation. The reason is that maintaining two separate data structures - a twoNode and a threeNode - is just too cumbersome, and actually slows things down quite a lot. So here we have the red-black BST, a practical version of the 2-3 search tree.

Structure

To avoid the mess of having two node types at the same time, in a red-black tree we only have one type: the 2-node. However, we still want to represent the difference between a 2-node and a 3-node, so we associate an additional piece of information with each node: a color. For historical reasons, red and black are the two colors chosen to represent the different node types. The color of a node denotes the color of the link that connects it to its parent: if a node is reached via a red link, we say the node is red; otherwise the node is black.

Below is the 2-3 tree from the first example, translated into its red-black representation. Here I have removed the blue colours that fill the nodes to give a clearer illustration of the red-black property:
(figure: red-black representation of the example 2-3 tree)
With only 2-nodes, this red-black tree can be treated as a left-leaning BST, in which all red links lean to the left. The 3-nodes are implicitly represented as internal red links:
(figure: 3-nodes drawn as internal left-leaning red links)
As mentioned before, red-black trees are practical implementations of 2-3 search trees, so it is worth examining the properties for 2-3 search trees in red-black trees:

  • all paths from leaf to root have the same number of black links;
  • no node has two red child nodes (no 4-nodes);
  • by convention, all red links lean left (all red nodes are left children).

Next we see how the operations described for 2-3 search trees get realised in red-black trees.

Associated Methods

Get(Search)

Since red-black trees take the same form as BSTs, accessing an element is exactly the same as in a BST, disregarding the colors entirely. The only difference is performance: because a red-black tree is a balanced BST, the number of layers is logarithmic, and the worst case simply means traversing an additional red link at every layer. Hence, the overall performance is guaranteed logarithmic.
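
For concreteness, here is a minimal sketch of such a get() (this listing is not in the original notes; it assumes each node carries the key, value, left and right fields from the BST chapter):

def get(key, node):
	# identical to BST search - the colors are ignored entirely
	if node == None:
		return None
	if key < node.key:
		return get(key, node.left)
	elif key > node.key:
		return get(key, node.right)
	else:
		return node.value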

Auxiliary methods - Left Rotation

Before directly translating insert into a red-black implementation, there are a few auxiliary methods that will be needed, and should be looked at first.

The first is left rotation. This is needed when, in a hypothetical 3-node, the right-hand node (call it x) is red and the left-hand node n is black, i.e. the red link is right-leaning. rotateLeft can then be thought of as making x the BST-parent of n.
(figure: a right-leaning red link from n to its right child x)
So we rotate the 3-node to make the red link left-leaning as follows:

  1. reassign the middle child (x's left subtree becomes n's right subtree):
    (figure: reassigning the middle child)
  2. swap the parent role (x becomes the parent of n):
    (figure: swapping the parent role)
  3. “fix” the coloring (x takes n's old color, and n turns red):
    (figure: fixing the colors)
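
The code listing for rotateLeft() appears to have been in the figures above. Below is a sketch of what it might look like, written to be consistent with the rotateRight() shown in the next subsection; the RED/BLACK constants, the Node class and the isRed() helper are assumptions added here for completeness, not part of the original notes:

RED, BLACK = True, False

class Node:
	def __init__(self, key, value):
		self.key = key
		self.value = value
		self.left = None
		self.right = None
		self.color = BLACK  # the color of the link from this node's parent

def isRed(node):
	return node != None and node.color == RED  # null links count as black

def rotateLeft(n):
	x = n.right          # x is the red right child of n
	n.right = x.left     # step 1: reassign the middle child
	x.left = n           # step 2: swap the parent role - x becomes the BST-parent of n
	x.color = n.color    # step 3: x takes over n's original color...
	n.color = RED        # ...and the link down to n becomes red (now left-leaning)
	return x             # return the new root of this subtree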

Before moving on, there is one more thing that is worth mentioning.

Take a closer look at the fourth line in rotateLeft(): it assigns the color of n to x. What does this mean? Why not directly assign BLACK to x.color?

It means that when n is BLACK, as demonstrated by the diagram, x also becomes BLACK; BUT when n is RED, x becomes RED as well!

We know when a left rotation is needed while n is black: when its left child is NOT red AND its right child is red. In this case, rotating left simply transforms the right-leaning red link into a left-leaning one.

But what does it mean for n to be red? It means that n was originally part of a legal 3-node, AND that it is the left child of its parent. But why would you need to rotate part of a 3-node to the left? Still, only when the left child of n is NOT red AND the right child of n is red! In this case, rotating left is not simply changing the direction of the red link; it is transforming an illegal 4-node (n's original parent, n itself and x, n's right child) into a temporarily legal 4-node (n's original parent, x - now n's parent - and n itself, all three linked to the left)!

Auxiliary methods - Right Rotation

Right rotation follows exactly the same steps as left rotation, and the idea is the same: making x the BST-parent of n. The only difference is that left and right are mirrored: x is now n's left child, and the middle child being reassigned is x's right subtree. The process is given as follows:

def rotateRight(n):
	x = n.left           # x is the red left child of n
	n.left = x.right     # reassign the middle child
	x.right = n          # x becomes the BST-parent of n
	x.color = n.color    # x takes over n's original color
	n.color = RED        # the link down to n becomes red
	return x             # return the new root of this subtree

Although we’ve stated that the red-black tree should be left-leaning, which makes rotateRight() seem useless, note that both rotateLeft() and rotateRight() handle only one red link each, and do not mess with the colors of other links. Moreover, rotateRight() is only used when there is a legal 4-node, resulting either from a direct insertion or from a previous rotateLeft().

Auxiliary methods - Color Flip

Although this method is called a color flip, the flipping of colors is only the mechanism: what the function actually does is disassemble a temporary 4-node into three 2-nodes, passing the middle key up to its parent as a red node.
(figure: a 4-node before and after the color flip - the middle key moves up as a red node)
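
The code for colorFlip() also appears to have been in the missing figures. Here is a sketch consistent with how it is used in put() below (again an assumption, not the original listing):

def colorFlip(node):
	node.color = RED          # pass the middle key up: its link to the parent turns red
	node.left.color = BLACK   # the two outer keys become...
	node.right.color = BLACK  # ...separate black 2-nodes
	return node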

Auxiliary Methods Recap

Note that all three auxiliary methods end with a return statement, just like the BST methods. This means that in a red-black tree we also define methods in a recursive manner - whenever we call a method, we rebuild part of the existing tree by returning a (possibly modified) subtree.

Put(Insert)

Before we get into the details for inserting in a red-black tree, recall the insert method for an ordinary BST in the last chapter:

def putBST(newNode, node):
	if node == None:
		return newNode  # reached a null link: the new node takes its place
	if node.key > newNode.key:
		node.left = putBST(newNode, node.left)
		return node
	else:
		node.right = putBST(newNode, node.right)
		return node

Now let’s first look at the Python code and then explain the implementation details. Bear in mind that this put() function is just the implemented version of the put described for 2-3 search trees, so you can refer back to that section to help your understanding:

def put(newNode, node):
	if node == None:
		newNode.color = RED # grow the node to be inserted by creating a new "internal" red link
		# if the node was originally 2-node, it is now a 3-node
		# if originally a 3-node, now a 4-node (temporarily)
		return newNode
		
	if node.key > newNode.key:
		node.left = put(newNode, node.left)
	else:
		node.right = put(newNode, node.right)

	if isRed(node.right) and not isRed(node.left): # only the right child is red: a right-leaning red link
		node = rotateLeft(node)
	if isRed(node.left) and isRed(node.left.left): # two consecutive left red links: a temporary 4-node
		node = rotateRight(node)
	if isRed(node.left) and isRed(node.right): # both children red: split the 4-node
		node = colorFlip(node)

	return node

Explanation:
Compare put() with putBST(). The most prominent difference is that the return statement, which sat inside the branching if-else block in putBST(), is now placed after that block, following another sequence of ifs. And it is this sequence of ifs that makes the difference in the structure of the trees: it performs a sequence of local manipulations on the subtree before returning it to be re-attached to the larger tree.

Now let’s look more closely at the manipulation sequence. The reason we can handle the various situations with such a compact sequence of ifs is the recursive nature of the method: we don’t need to fix everything in one go, we just need to fix the current node's children.

A minor issue to note is that when, after some operations, both children of the root node are red, the code above together with the red-black rules requires one more call of colorFlip(), which colors the root node itself red. Since a red root has no link above it to lean on, the standard fix is simply to recolor the root black after every insertion.
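
Here is a sketch of how that root fix-up might be wrapped around put() (the insert() name and the way the root is threaded through are illustrative assumptions, not from the original notes):

def insert(root, key, value):
	# root is the current root of the red-black tree (None for an empty tree)
	root = put(Node(key, value), root)
	root.color = BLACK  # recolor the root black after every insertion
	return root

# usage: build a small tree
root = None
for k in [50, 25, 75, 66, 4]:
	root = insert(root, k, str(k))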
