
AVL Trees

With the binary search tree we tried to implement a dictionary that allows us to search, insert and remove elements in \(O(\log n)\) time. We saw that we can implement these operations in \(O(h)\) time where \(h\) is the height of the tree. However, the height of the tree depends on the order in which the elements are added, and in the worst case it grows linearly with \(n\), which makes the operations \(O(n)\). So we need a way to keep the tree “balanced” on both sides of the root, giving a height, and therefore a time complexity, of \(O(\log n)\).

An initial idea is to keep the tree complete with the Completeness property, so that all levels are filled except possibly for the last level, which is filled from left to right. This would lead to a height of \(\log n\) and a time complexity of \(O(\log n)\). However, if we want to maintain this property together with the Search-tree property, we quickly notice that for each set of \(n\) elements there is only one tree that fulfills both properties. Maintaining this unique shape can therefore require restructuring large parts of the tree, which costs \(O(n)\) time and is too expensive.

Unique complete binary search tree for a set of \(n\) elements.

So we need a less strict property under which we can still call a tree balanced while keeping a time complexity of \(O(\log n)\). This is where the AVL tree comes in. The AVL tree, named after its inventors Adelson-Velsky and Landis, is a self-balancing binary search tree that satisfies the so-called AVL property:

For every node \(x\) in the tree, the difference in height between the left and right subtree of \(x\) is at most 1.

So if we use the following notations:

  • \(T_l(v)\): the left subtree of node \(v\).
  • \(T_r(v)\): the right subtree of node \(v\).
  • \(h_l(v)\): one more than the height of the left subtree of node \(v\). The extra 1 accounts for empty subtrees: if \(v\) has no left child, for example if \(v\) is a leaf node, then \(h_l(v) = 0\), and otherwise \(h_l(v) = 1 +\) the height of \(T_l(v)\).
  • \(h_r(v)\): the height of the right subtree of node \(v\), defined in the same way as \(h_l(v)\).
  • \(bal(v)\): the balance factor of node \(v\), defined as \(bal(v) = h_l(v) - h_r(v)\).
Height of the left and right subtree of a node \(v\).

Then we can say that a tree is balanced and satisfies the AVL property if:

\(|h_l(v) - h_r(v)| \leq 1\) or \(|bal(v)| \leq 1\) for all nodes \(v\) in the tree.

Comparison of balanced and unbalanced binary search trees.
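To make these definitions concrete, here is a minimal sketch in Python (the Node class and helper names are our own choice, not fixed by anything above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 0  # a leaf node has height 0

def height(v: Optional[Node]) -> int:
    # The height of an empty subtree is -1, so that 1 + height(v.left)
    # is 0 when v has no left child, matching the definition above.
    return v.height if v is not None else -1

def bal(v: Node) -> int:
    # bal(v) = h_l(v) - h_r(v); the two +1 terms cancel out.
    return (1 + height(v.left)) - (1 + height(v.right))

# A root with a single left child: bal = (1 + 0) - (1 + (-1)) = 1.
root = Node(2, left=Node(1))
root.height = 1
print(bal(root))  # 1
```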

AVL Tree height

Let’s first check that this property actually leads to a height of \(O(\log n)\). We can prove this by induction, using the Fibonacci sequence and some of its properties. Specifically, if \(T(h)\) is the minimum number of nodes in an AVL tree of height \(h\), then we can say that:

An AVL tree of height \(h\) has at least \(Fib(h+1) - 1\) nodes, so \(T(h) \geq Fib(h+1) - 1\).

where \(Fib(h)\) is the \(h\)-th Fibonacci number, defined as:

\[\begin{align*} Fib(1) &= 1 \\ Fib(2) &= 1 \\ Fib(n) &= Fib(n-1) + Fib(n-2) \text{ for } n \geq 3 \end{align*} \]
Proof

We want to show that an AVL tree of height \(h\) has at least \(Fib(h+1) - 1\) nodes, so \(T(h) \geq Fib(h+1) - 1\). Because the Fibonacci sequence uses the previous 2 numbers, we need to use strong induction.

  1. Base Case: A tree of height 0 is just a leaf node, so \(T(0) = 1 \geq Fib(0+1) - 1 = 0\). The minimum number of nodes in an AVL tree of height 1 is a root node and 1 leaf node, so \(T(1) = 2 \geq Fib(1+1) - 1 = 0\). So the base cases hold.
  2. Inductive Hypothesis: We assume that the statement holds for all \(h \leq k\).
  3. Inductive Step: We want to show that the statement holds for \(h = k+1\). Because of the AVL property we know that the tree’s left and right subtrees have a height difference of at most 1. A minimal tree of height \(k+1\) therefore consists of a root, a minimal subtree of height \(k\) and a minimal subtree of height \(k-1\), so the minimum number of nodes can be defined as:
\[\begin{align*} T(k+1) &= T(k) + T(k-1) + 1 \\ &\overset{I.H}{\geq} Fib(k+1) - 1 + Fib(k) - 1 + 1 \\ &= Fib(k+2) - 1 \end{align*} \]

So the statement holds for \(h = k+1\) and therefore for all \(h\).

So we can see that the minimum number of nodes in an AVL tree of height \(h\) is at least \(Fib(h+1) - 1\). Because the Fibonacci sequence grows exponentially, \(n \geq Fib(h+1) - 1\) implies that the height of the tree is \(O(\log n)\), where \(n\) is the number of nodes in the tree.
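As a quick sanity check of the proof, the following illustrative snippet computes \(T(h)\) from the recurrence used in the inductive step and compares it against \(Fib(h+1) - 1\):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)
def min_nodes(h: int) -> int:
    # T(h): a minimal AVL tree of height h has a root plus minimal
    # subtrees of heights h-1 and h-2 (the most unbalanced case allowed).
    if h == 0:
        return 1
    if h == 1:
        return 2
    return min_nodes(h - 1) + min_nodes(h - 2) + 1

for h in range(10):
    assert min_nodes(h) >= fib(h + 1) - 1
    print(h, min_nodes(h), fib(h + 1) - 1)
```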

Maintaining a Balanced Tree

We have seen that the AVL property leads to a tree height of \(O(\log n)\), which means that all operations should run in \(O(\log n)\) time. However, we still need to show that we can maintain this property when we insert or remove nodes from the tree, and that we can do this in \(O(\log n)\) time.

We will insert and delete nodes in the same way as with a binary search tree. However, after inserting or deleting a node we have to check if the AVL property is still satisfied. Let’s first think about which heights and therefore also balance factors can change when we insert or delete a node. We quickly see that only the ancestors of the inserted or deleted node can have their heights changed. So we only need to update the heights and balance factors of the ancestors.
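As a sketch of this bookkeeping, here is plain binary search tree insertion where, as the recursion unwinds, exactly the ancestors of the new node get their heights recomputed (no rebalancing yet; that comes below):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 0

def height(v):
    return v.height if v is not None else -1

def insert(v, key):
    # Ordinary BST insertion. The recursion only visits the ancestors of
    # the new node, so these are exactly the nodes whose heights (and thus
    # balance factors) are recomputed on the way back up.
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    v.height = 1 + max(height(v.left), height(v.right))
    return v
```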

To keep a tree balanced it suffices to look at the different scenarios when a node is inserted into the left subtree of a node. In the end we only care about the balance factor, and whether we are inserting or deleting does not matter, since an insertion on one side changes the balance factor in the same way as a deletion on the other. The only difference between the left and the right subtree is the sign of the balance factor and the direction of the rotation we have to perform.

There are two possible scenarios when a node is inserted into the left subtree of a node \(u\):

  1. The height of the left subtree \(T_l(u)\) does not change. In this case \(h_l(u)\) does not change, and therefore neither does the balance factor \(bal(u)\), nor do the heights and balance factors of the ancestors.
  2. The height of the left subtree \(T_l(u)\) increases by 1 (remember, we are only considering insertions into the left subtree). We now have to distinguish three further cases depending on \(h_r(u)\), the height of the right subtree \(T_r(u)\):

The first subcase is when \(h_r(u)\) was one greater than \(h_l(u)\) before the insertion, so after increasing \(h_l(u)\) by 1 the two subtrees have the same height, \(h_l(u) = h_r(u)\). In this case the balance factor changes from -1 to 0. This does not break the AVL property, so we do not need to do anything. And because the height of \(u\) does not change, we do not need to update the heights and balance factors of the ancestors either.

The second subcase is when \(h_r(u)\) and \(h_l(u)\) were the same before the insertion, so after increasing \(h_l(u)\) by 1 the left subtree is one higher than the right subtree, \(h_l(u) = h_r(u) + 1\). In this case the balance factor changes from 0 to 1. This again does not break the AVL property, so no rotation is needed. However, the height of \(u\) changes, so we have to update the heights and balance factors of the ancestors. If one of the ancestors becomes unbalanced, we have one of the scenarios below, which can be solved with a rotation.

The third subcase is when \(h_r(u)\) was one less than \(h_l(u)\) before the insertion, so after increasing \(h_l(u)\) by 1 the left subtree is two higher than the right subtree, \(h_l(u) = h_r(u) + 2\). In this case the balance factor changes from 1 to 2. This breaks the AVL property, so we have to perform a rotation to rebalance the tree.

Which rotation to perform depends on where the new node was inserted. If the inserted node is in the left subtree of the left child of \(u\), then a single rotation suffices. To be more precise, we perform a right rotation on \(u\): the left child of \(u\) takes \(u\)’s place and \(u\) becomes its right child. In the image below \(u\) is the node with the key 3 and the inserted node is the node with the key 1.

A right rotation on node \(u\) to rebalance the tree.

If we were looking at the mirrored situation, where the inserted node is in the right subtree of the right child of \(u\) rather than the left subtree as we have restricted ourselves to so far, then just the direction of the rotation would change: we would perform a left rotation on \(u\) instead of a right rotation.

A left rotation on node \(u\) to rebalance the tree.
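The two single rotations can be sketched as follows, with nodes as in the earlier sketches; note how the middle subtree is handed over so that the search-tree property is preserved:

```python
def update(v):
    # Recompute v's height from its children (empty subtree has height -1).
    hl = v.left.height if v.left else -1
    hr = v.right.height if v.right else -1
    v.height = 1 + max(hl, hr)

def rotate_right(u):
    # Assumes u has a left child v. v takes u's place and u becomes v's
    # right child. v's old right subtree is reattached as u's new left
    # subtree; its keys lie between v.key and u.key, so the search-tree
    # property is preserved.
    v = u.left
    u.left = v.right
    v.right = u
    update(u)
    update(v)
    return v  # new root of this subtree

def rotate_left(u):
    # Mirror image of rotate_right; assumes u has a right child.
    v = u.right
    u.right = v.left
    v.left = u
    update(u)
    update(v)
    return v
```

Both functions return the new root of the rotated subtree, so the caller has to reattach it, e.g. `u = rotate_right(u)`.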

Importantly, the subtrees of the rotated nodes are also reattached in a way that maintains the binary search tree property.

If the inserted node is in the right subtree of the left child of \(u\), then it gets a bit more complicated. A single rotation, no matter whether it is a left or right rotation, does not work. We have to perform a double rotation. Focusing again on the situation where the node was inserted into the left subtree of \(u\), we first perform a left rotation on the left child of \(u\) and then a right rotation on \(u\). In the image below \(u\) is the node with the key 3 and the inserted node is the node with the key 2. The first left rotation recreates the situation above, which we can then solve with a right rotation.

A left-right rotation on node \(u\) to rebalance the tree.

Again, if we were looking at the mirrored situation, where the inserted node is in the left subtree of the right child of \(u\), then just the directions of the rotations would change, meaning that we would first perform a right rotation on the right child of \(u\) and then a left rotation on \(u\).

A right-left rotation on node \(u\) to rebalance the tree.
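The double rotations are then just compositions of the two single rotations sketched above:

```python
def rotate_left_right(u):
    # Left-right case: a left rotation on u's left child recreates the
    # left-left situation, which a right rotation on u then fixes.
    u.left = rotate_left(u.left)
    return rotate_right(u)

def rotate_right_left(u):
    # Mirror image for the right-left case.
    u.right = rotate_right(u.right)
    return rotate_left(u)
```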

After performing a rotation or double rotation for an insertion we are finished: the rotation restores the subtree to the height it had before the insertion, so the heights and balance factors of the ancestors no longer need updating. However, when removing a node we might not be finished after performing a rotation, because a rotation can reduce the height of the subtree. We then still have to update the heights and balance factors of the ancestors and check that the AVL property is satisfied, possibly all the way up to the root of the tree.

We can summarize these rotations based on the balance factor of the node \(v\) and the balance factor of its left or right child as follows, where \(bal(v) = h_l(v) - h_r(v)\) is the balance factor of node \(v\) as defined above, and \(bal(v.left)\) and \(bal(v.right)\) are the balance factors of the left and right child of \(v\):

| Rotation | Condition | Signs of the balance factors |
| --- | --- | --- |
| Right | \(bal(v) > 1\) and \(bal(v.left) \geq 0\) | same, both positive |
| Left-Right | \(bal(v) > 1\) and \(bal(v.left) < 0\) | different |
| Left | \(bal(v) < -1\) and \(bal(v.right) \leq 0\) | same, both negative |
| Right-Left | \(bal(v) < -1\) and \(bal(v.right) > 0\) | different |
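Translated into code, a rebalancing step could look like this sketch, reusing bal and the rotation helpers from above:

```python
def rebalance(v):
    # Case analysis exactly as in the table above. If bal(v) > 1 the left
    # child exists, and if bal(v) < -1 the right child exists, so the
    # child balance factors are always well defined here.
    if bal(v) > 1:
        if bal(v.left) >= 0:
            return rotate_right(v)       # signs the same: single right rotation
        return rotate_left_right(v)      # signs differ: double rotation
    if bal(v) < -1:
        if bal(v.right) <= 0:
            return rotate_left(v)        # signs the same: single left rotation
        return rotate_right_left(v)      # signs differ: double rotation
    return v                             # |bal(v)| <= 1: nothing to do
```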

So we see that we can maintain the AVL property when inserting or deleting a node in \(O(\log n)\) time. The worst case is a removal that requires rotations and height and balance factor updates along the entire path from the deleted node up to the root. But because the height of the tree is \(O(\log n)\) and each step does constant work, the time complexity is \(O(\log n)\).
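Tying the pieces together, a complete insertion could look like the following sketch, again reusing the helpers from the previous sketches:

```python
def avl_insert(v, key):
    # BST insertion followed by a height update and a rebalancing step on
    # the way back up; each of the O(log n) ancestors does O(1) work.
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = avl_insert(v.left, key)
    else:
        v.right = avl_insert(v.right, key)
    update(v)
    return rebalance(v)
```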

Other Advantages of AVL Trees

An AVL tree can be used to implement a dictionary as we have already seen. But it can also be used to implement a priority queue. The minimum or maximum element can be extracted in \(O(\log n)\) time by repeatedly following the left or right child pointers from the root. The AVL tree is less efficient than a heap, as it has a higher memory and time overhead. However, it is a lot more flexible: it behaves like a min- and max-heap at the same time and additionally offers efficient search operations.
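For example, extracting the minimum or maximum is just a walk along the leftmost or rightmost path, taking \(O(h) = O(\log n)\) steps:

```python
def find_min(v):
    # Follow left children; the leftmost node holds the smallest key.
    while v.left is not None:
        v = v.left
    return v

def find_max(v):
    # Symmetrically, the rightmost node holds the largest key.
    while v.right is not None:
        v = v.right
    return v
```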

By always keeping track of how many nodes are in the left and right subtree of a node we can also find the \(k\)-th smallest or largest element in the tree in \(O(\log n)\) time. For simplicity we will only consider the \(k\)-th smallest element, as the \(k\)-th largest element is just the \((n-k+1)\)-th smallest element.
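A possible sketch of this rank query, assuming each node additionally stores the number of nodes in its subtree in a size field (which insertions, deletions and rotations would have to keep up to date):

```python
def size(v):
    return v.size if v is not None else 0

def kth_smallest(v, k):
    # 1-indexed rank query, assuming 1 <= k <= size(v):
    # k = 1 returns the node with the smallest key in the subtree at v.
    left = size(v.left)
    if k <= left:
        return kth_smallest(v.left, k)            # answer lies in the left subtree
    if k == left + 1:
        return v                                  # v itself is the k-th smallest
    return kth_smallest(v.right, k - left - 1)    # skip the left subtree and v
```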

Todo

Implement the \(k\)-th smallest element search.
