The code in those 2 files was not written with readability in mind...
... but I did clean it up a little!
In general, I try to constantly strike a balance between:
verbose and succinct,
formal and informal,
high level and low level,
general and particular.
3. List Trees
A List Tree is a binary tree with extra structure.
Nodes can be at any level as long as:
the left child is at a lower level than its parent,
the right child is at an equal or lower level than its parent.
More formally, for each node N:
N.left = Null ∨ N.left.level < N.level
N.right = Null ∨ N.right.level ≤ N.level
Right children at the same level as their parent are called high children.
A leaf is any node at level 0.
Paths can have multiple leaves because of high children.
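The two conditions above can be checked with a short recursive sketch (the `Node` class below is a hypothetical stand-in, not the one from the accompanying source code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    level: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def is_list_tree(n: Optional[Node]) -> bool:
    """True iff the subtree rooted at n satisfies the List Tree invariant:
    left children are at strictly lower levels, right children at equal
    or lower levels (equal level = high child)."""
    if n is None:
        return True
    if n.left is not None and n.left.level >= n.level:
        return False
    if n.right is not None and n.right.level > n.level:
        return False
    return is_list_tree(n.left) and is_list_tree(n.right)
```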
A maximal list is a sequence N1,N2,…,Nn such that:
All nodes in it are at the same level.
Ni.right = Ni+1 for i = 1, 2, …, n−1.
(left‑maximality) There's no node N in the tree such that N.right = N1 ∧ N.level = N1.level.
(right‑maximality) There's no node N in the tree such that Nn.right = N ∧ N.level = Nn.level.
From now on, list means maximal list unless otherwise specified.
Take a look at the following randomly-generated List Tree:
3.1. About the pictures
For practical reasons, the X axis has been replaced with the sorted-position axis.
More precisely:
Let N1,N2,…,Nn be the nodes in the tree sorted in ascending order, according to their keys.
Then, Ni has coordinate x equal to i, for i=1,2,…,n.
This changes nothing at the conceptual level.
4. Lexi Trees
A Lexi Tree is a List Tree with a lexicographic order.
Consider the cartesian KL plane where:
the X axis is for the keys,
the Y axis is for the levels.
Nodes are points on the KL plane and can be represented as (k,l) pairs.
Let's impose a lexicographic order on the KL plane: (k1,l1) < (k2,l2) ⟺ l1 > l2 ∨ (l1 = l2 ∧ k1 < k2)
In words:
The node at a higher level comes first.
If two nodes are at the same level, the one with the smaller key comes first.
Basically, the lexicographic order is:
first top to bottom,
then left to right.
To specify that we're referring to that lexicographic order, we can add the prefix lexi, so, for instance, we can say that a node N1 is lexi‑smaller than a node N2.
Analogously, when we want to refer to just the keys, we can say that N1 is key‑smaller than N2.
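As a minimal sketch (the function name is mine), the lexi order is a plain comparison on (key, level) pairs:

```python
def lexi_less(k1, l1, k2, l2) -> bool:
    """True iff node (k1, l1) is lexi-smaller than node (k2, l2):
    the node at the higher level comes first, and ties on the level
    are broken by the smaller key."""
    return l1 > l2 or (l1 == l2 and k1 < k2)
```

Equivalently, sorting (key, level) pairs with the key function `lambda p: (-p[1], p[0])` yields the lexi order: first top to bottom, then left to right.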
When we work on the KL plane, the arrows become redundant!
4.1. Rules for the arrows
Imagine that from each node starts a vertical line which goes straight to the bottom:
No arrow can cross that line.
This is true for all binary search trees in general.
More precisely:
The root owns the whole key‑space.
Each node N, of key k, splits its key‑space S into 2 subspaces:
left‑space: the part of S with keys ≤ k.
right‑space: the part of S with keys > k.
The left (right) arrow of each node N is either
absent if N has no nodes in its left (right) space, or
pointing to the lexi‑smallest node in the left (right) space.
Intuitively, this means that if a node N has a row of nodes, all at the same level, right below N in the same subspace, then N always points to the leftmost one.
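A brute-force sketch of these rules on plain (key, level) pairs (the real code walks the tree instead; here the partition of the key-space into candidate points is taken as given):

```python
def lexi_smallest(points):
    """Lexi-smallest (key, level) point: highest level first, then smallest key."""
    return min(points, key=lambda p: (-p[1], p[0]))

def arrow_targets(node, space):
    """Where the left and right arrows of `node` should point.
    `space` holds the (key, level) points of node's key-space, node excluded."""
    k, _ = node
    left = [p for p in space if p[0] <= k]
    right = [p for p in space if p[0] > k]
    return (lexi_smallest(left) if left else None,
            lexi_smallest(right) if right else None)
```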
Here are a few examples:
Here's a bigger one, but without labels for lack of space:
4.2. Operations
Lexi Trees support two primitive operations:
lifting: to lift a node, increment its level by 1 and fix the arrows.
lowering: to lower a node, decrement its level by 1 and fix the arrows.
In general, we can k‑lift or k‑lower a node, i.e. lift or lower it by k levels.
An insertion at level L can be seen, conceptually, as an (L+1)-lifting from level −1.
A deletion from level L can be seen, conceptually, as an (L+1)-lowering to level −1.
Note:
The left-to-right order of the keys is fixed and beyond our control.
We can play with the levels however we want, though.
Rebalancing a tree is nothing but readjusting the levels without touching the keys.
5. P‑Lexi Trees
P‑Lexi Tree is short for Probabilistic Lexi Tree.
All root→leaf paths in P‑Lexi Trees have expected length O(log n).
The expectation is over all the possible P‑Lexi Trees that can be randomly generated given the same sequence of operations.
Nodes:
are inserted at random heights,
can't be moved around once in the tree,
can be deleted.
P‑Lexi Trees work as expected (pun intended) if and only if Skip Lists work, because...
5.1. P‑Lexi Trees are isomorphic to Skip Lists
5.1.1. Proof (sketch)
Let's consider the following Skip List:
Let's find, say, key 56:
Note that we only use the highest arrow to a node during searches, so we can remove the others:
Now each node has just one parent, but can still have several children.
We can reduce the number of children by doing the following:
for each node N:
    let Ah be the highest arrow starting from N.
    for each arrow A that starts from N, with A ≠ Ah:
        let A2 be the arrow right above A.
        let N2 be the node pointed to by A2.
        make A start from N2 rather than from N.
For instance:
Header H has 5 forward arrows.
The highest arrow points to node 30
The 2nd-highest arrow points to node 21:
we change its starting point so that it starts from node 30.
The 3rd-highest arrow points to node 13:
we change its starting point so that it starts from node 21.
and so on...
Here's the result:
First cosmetic change: let's make the arrows start from the top of the nodes:
Second and last cosmetic change: let's shorten the nodes:
The transformations are invertible so we can also transform a Lexi Tree into a Skip List.
5.2. Why do Skip Lists work?
Here's my intuition:
Let SL be the perfectly balanced Skip List in the following figure:
The height distribution
Note that SL, ignoring the header (the gray node), has:
8 nodes of height 1
4 nodes of height 2
2 nodes of height 3
1 node of height 4
For k = 1, 2, 3, 4, the nodes of height > k are 1 fewer than the nodes of height k.
As the number of nodes in SL goes to infinity, though, that difference of 1 node becomes insignificant, so...
We can generate the heights from the geometric distribution, with parameter p=0.5, simply by repeatedly tossing a fair coin:
height = 1
while coin toss == heads:
    height += 1
Note that, by convention, height=level+1.
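The coin-tossing loop above, as runnable Python (a sketch; `rng.random() < 0.5` plays the role of heads):

```python
import random

def random_height(rng: random.Random) -> int:
    """Samples a height from the geometric distribution with p = 0.5:
    start at height 1 and keep going while the fair coin lands heads."""
    height = 1
    while rng.random() < 0.5:  # heads
        height += 1
    return height
```

About half of the sampled heights should be 1, a quarter should be 2, and so on, matching the level counts of SL.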
Since we toss a fair coin to decide whether we stop at the current height or we keep going, the following probabilities are equal:
the probability of a node being at level L,
the probability of a node being above level L.
This means that we expect the following values to be equal:
the number of nodes at level L,
the number of nodes above level L.
That should make the connection between SL and the geometric distribution clear.
Now think of the nodes as (key,level) pairs.
Shuffle the levels (i.e. swap them between nodes), but keep the keys fixed.
If the shuffle is uniform enough, we expect the root→leaf paths (or their equivalent in a Skip List) to be of length O(log n) on average.
By starting with an empty Skip List and inserting nodes with heights chosen from the same geometric distribution as SL, we're, basically, directly generating the shuffled version of SL.
5.3. Operations
5.3.1. Nodes
Since nodes are never moved around once inserted, we can optimize a few things:
leaves only need the right pointer
The level in lower nodes can be represented as few bits.
Since there are, on average, exponentially many more lower nodes than higher nodes, this saves a considerable amount of space.
5.3.2. Search Paths
Key searches don't need to know the level of the nodes, so:
key searches are the same as in binary trees.
Here are a few examples of search paths in a P‑Lexi Tree with 1M nodes:
The red line is the key line, i.e. the position of the key we searched for.
Notice that we found the key in two cases.
When the key is not found, the key line indicates its insertion position.
During key searches:
We're almost always moving to nodes closer to the key line:
all other nodes are ignored.
I said "almost always" because there's a single exception:
When we find the key node, we usually either:
stop the search, or
go left one single step and then:
go all the way to the right to get to the rightmost leaf key‑before the key node.
Note:
if we step left from the key node, then:
all the nodes we visit from that point onward will have keys smaller than (or equal to, if the keys are not unique) the searched-for key.
Let's add some of the ignored nodes:
5.3.3. Insertions
To insert a node of key K we first search for K.
If K is already present, we stop:
for a set, we could just return;
for a map, we could just update the value of the key node and return.
We must also choose a random insertion level L, the same way as for Skip Lists.
While we're searching for K, we also remember some important points along the search path.
These important points:
are at level ≤ L,
are at most 4 per level, 2 per side,
come in (first, last) pairs, where we may have first = last.
Let's look at some examples before saying anything more:
Before insertion at level 0:
After insertion at level 0:
The blue node is the pair (first,last), with first=last, on the left side.
We need to remember that pair.
Note that all the nodes above level 0 are grayed out because we don't need to remember them.
Now look for a pattern as we change the insertion level L.
L=3:
L=7:
L=12:
L=19:
Basically, the red node is like a pair of scissors that cut along the red key line, starting from the bottom and going upwards up to level L.
That's why the important nodes we need to remember are at or below the insertion level L.
Here's the L=19 case again, before the insertion:
I'll call "crossing the red key line" simply "the crossing".
From the picture above we can see that:
G2 is the last node before the crossing
B1 is the first node right after the crossing
B2 is the last node before the crossing
G3 is the first node right after the crossing
and so on...
By convention, we say that G1 is also in the "first node" category.
The (first,last) pairs are then:
(G1,G2)
(B1,B2)
(G3,G4)
(B3,B4)
(G5,G5)
(B5,B5)
and so on...
We don't need to remember the black nodes.
Now let's look again at the situation after the insertion:
The inserted red node points to B1 and G1.
B1 starts the left chain and G1 the right chain.
The last node of each pair connects to the first node of the pair right after it, on the same side, i.e. blue with blue and green with green.
The black nodes are left untouched and that's why we don't need to remember them.
5.3.4. Deletions
For the deletion, just consider the two pictures above in the opposite order!
Basically:
an insertion is a multi-level splitting operation, while
a deletion is a multi-level merging operation.
To delete a node:
Find the node N to delete.
Let's say N is the red point in the last picture above.
The left chain starts with N.left.
The right chain starts with N.right.
We follow the left chain by always going right.
We follow the right chain by always going left.
Note that both chains are lexicographically ordered.
This suggests a merge sort step of the two chains!
We keep two pointers, one per chain, and...
... take one step along the chain with the lexi‑smaller node, ...
... and repeat until we're done merging the chains.
Of course, when we cross, we link.
For example, if nodes N1 <lexi N2 are adjacent in the merged sequence and they're on different sides, then we make N1 point to N2 using N1's left or right pointer as appropriate.
For once, the deletion is easier than the insertion!
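A sketch of just the merge step, on (key, level) pairs instead of actual nodes (the pointer relinking at each crossing is only noted in a comment):

```python
def merge_chains(left_chain, right_chain):
    """Merges the two lexi-ordered chains into one lexi-ordered sequence.
    In the actual deletion, whenever two adjacent nodes of the merged
    sequence come from different chains (a crossing), we link them."""
    lexi = lambda p: (-p[1], p[0])  # higher level first, then smaller key
    merged = []
    i = j = 0
    while i < len(left_chain) and j < len(right_chain):
        if lexi(left_chain[i]) < lexi(right_chain[j]):
            merged.append(left_chain[i]); i += 1
        else:
            merged.append(right_chain[j]); j += 1
    merged += left_chain[i:] + right_chain[j:]
    return merged
```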
5.4. Parallelization
P‑Lexi Trees are easy to parallelize because all operations go down, from root to leaf.
The higher-in-the-tree operation just needs to wait for the lower-in-the-tree operation to finish.
No operation needs to give up to avoid a deadlock.
6. k-Lexi Trees
A k-Lexi Tree is a Lexi Tree where:
each list has length k at most
all root→leaf paths are complete
A path P is complete ⟺
for each level l from 0 to the level of the root,
there's a node in P at level l.
Intuitively, a path is complete ⟺ it has no holes.
A k-Lexi Tree is k-balanced, i.e. each root→leaf path has length at most k·log2(n+1).
Every node at level 0 is considered a leaf.
Leaves can only have high right children.
Of course, we just need a single bit to indicate a high right node.
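The k-balance bound can be checked numerically (a complete path visits at most k nodes per level, and there are at most log2(n+1) levels):

```python
import math

def max_path_length(k: int, n: int) -> float:
    """Upper bound on the root-to-leaf path length in a k-Lexi Tree
    with n nodes: k * log2(n + 1)."""
    return k * math.log2(n + 1)
```

For example, a 2-Lexi Tree with a million nodes guarantees paths of length below 40.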
Practical consideration: In the source code, a k-Lexi Tree is called a DkLTree, where "D" stands for deterministic.
7. 2‑Lexi Trees
Recall:
for each list L:
length(L)≤2
no holes along any root→leaf path
Example
Here's a 2‑Lexi Tree with 100 nodes:
This was created by inserting 100 random uniformly distributed nodes one after the other.
7.1. Insertion
We insert the new node N into the correct position at level 0, as a leaf.
N is now in a list L of length at most 3.
If L has 3 nodes, we lift the middle one, go up a level along the search path, and keep lifting middle nodes as needed as we go up.
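As a toy model of this cascade (levels only, pointers ignored entirely; `lengths[l]` stands for the length of the list met at level l along the search path, after the leaf insertion):

```python
def propagate_lifts(lengths):
    """Lifting the middle node of a 3-list splits it into two 1-lists
    and lengthens the list one level up by one node; repeat upwards
    until no list along the path has 3 nodes."""
    l = 0
    while l < len(lengths) and lengths[l] == 3:
        lengths[l] = 1           # the 3-list splits; the path keeps one half
        if l + 1 < len(lengths):
            lengths[l + 1] += 1  # the middle node joins the list above
        else:
            lengths.append(1)    # the middle node becomes the new root
        l += 1
    return lengths
```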
Example:
Here are the pictures before and after the insertion without rebalancing:
Here's the picture after the rebalancing:
Here's the full animation:
7.2. Deletion
We delete a leaf N by, well, removing it.
If that removal leaves a hole:
we go up one level to the (ex-)parent P, along the search path,
we lower P to close the hole below,
and repeat the process as many times as needed...
To delete an internal node N, we replace it with the rightmost leaf on its left, i.e. the leaf immediately key‑before it. This is a classic trick.
If the leaf leaves a hole at level 0, we proceed as before.
Example
Here are the pictures before and after the deletion of the node without rebalancing:
Here's the picture after the rebalancing:
Here's the full animation:
Let's focus on the lower levels and add another level of gray nodes:
(before)
(after, without rebalancing)
(after, with rebalancing)
(animation)
The two animations look weird:
Why are the gray nodes lifted rather than lowered as we'd expect?
This happens because there's no way to represent an internal hole.
To do that, we'd need a way to indicate that a node is 2 levels lower than its parent.
This is what should happen when we lower B2 to close the hole:
Note that we don't care if nodes we left behind are lifted. All we care about is that no node moving in the after→before direction can pass through us without us realizing it.
We're dealing with pointers, so we must use some kind of blazingly fast constant time Garbage Collector:
We keep lists of nodes to delete (one list per execution unit ⟹ no synchronization required)
When we are sure that no operations currently accessing the tree can have a reference to certain nodes, we deallocate those nodes.
We should only use timestamps and completely ignore the values of the pointers.
The timestamps might be owned by the lists (of nodes to delete) rather than by the nodes themselves:
this would reduce the number of timestamps and comparisons.
8.4.2.2. Locking
We could lock nodes by setting 2 bits in the version field.
I'd use two types of locks:
reserve: Nobody can modify this node, but we're not modifying it either at the moment, so searches can go through without problems.
mutation: We're actually modifying the node.
Nodes should be locked in lexicographic order to avoid deadlocks:
The lexicographic order is handy because we don't need to do any extra sorting.
8.4.2.3. Deletions
Deletions are way more complicated than insertions.
What follows is a little involved because of the need to lock nodes in lexicographic order.
I would proceed as follows:
During the search we remember all the potential LOs (we already remember the path to be able to go back, anyway).
When we reach KN, the node to delete, we put a reserve lock on it.
From this point onward, we also keep a reserve lock on the latest LO.
We do this by moving the lock from LO to LO by first locking the new LO and then unlocking the previous one.
If we found no LO after KN:
We unlock KN (because of the lexicographic order requirement),
we go back to the latest LO and keep going back to the previous saved LOs until we find a valid one (remember the versioning?).
If we don't find a valid LO:
we restart from the root.
We go down again, tracking the latest LO with a reserve lock.
When we reach KN, we put a reserve lock on it.
We keep tracking the latest LO, even jumping over KN.
When we reach the leaf:
we turn the reserve locks into mutation locks,
we carry out the operation.
Why do all that?
It's a heuristic:
The happy path should be good enough.
There's no risk of starvation (i.e. of trying forever without ever completing the operation).
Other heuristics with different guarantees are certainly possible.
9. Implementations
I implemented one single-threaded version of:
P‑Lexi Trees
2‑Lexi Trees
3‑Lexi Trees
I'm sure many other (substantially different) implementations are possible.
I won't go over the specific cases and similar details because I find them terribly boring and uninteresting.
Since that's the way I roll, I decided to write readable source code instead:
I chose the Python language because it reads almost as pseudo-code.
You can install all the external dependencies with pip install -r requirements.txt
I tested the code with CPython (Python 3.9) and Pypy (Python 3.8).
It's a pity static typing is a little clunky (compared to TypeScript).
I prioritized readability and simplicity both at the language and the algorithmic level.
All the cases are documented in the code:
I like to include remarks, small proofs, and ASCII pictures in my code.
# Case II: prev.left = cur
# P2 ------------. P2 ==> P2 ---. .-------- P2
# \ / ==> \ /
# P2 ------------> P ==> P2 ---> r ---> P
# _________/ ==> / /
# / ==> / /
# c r r2 ==> c r2
# / / ==> / \
# NOTE: Prev2 can be in any of the 3 positions shown above.
P2 is Prev2, of course.
As the NOTE says, P2 can be in 3 different positions:
above P, on its right
above P, on its left
at the same level as P, on its left:
P is the high right child of P2.
The only difference is that we must modify the correct pointer of P2.
Note that P2.high_right (a bit, in practice) is left untouched:
if prev2.right is prev:
    prev2.right = right  # keeps same high_right
else:
    prev2.left = right
# Case Right3
# P --. .------------ P ==> P ----------. .----- P
# \ / ==> \ /
# P --> c1 - - - - - > r ==> P ----------> o2 - -> r
# / \ _ _ _ _/ ==> ________/ \ /
# / \ / ==> / \ /
# c2 o1 o2 o3 ==> c2 c1 o1 o3
# \ / / ==> / / \
#
# Case Right2
# P ---. .--------- P ==> P -------. .----- P
# \ / ==> \ /
# P ---> c1 - - - -> r ==> P -------> o1 - -> r
# / \ _ _ / ==> ____/ \ /
# / \ / ==> / \ /
# c2 o1 o2 ==> c2 c1 o2
# \ / ==> / \
Since these two cases are very similar, I consider them a single one.
The only novelty is that r may or may not be present (hence the dashed arrows to and from r):
As an example, Case Right3 can be expanded as:
# Case Right3-no-r
# P --. .------------ P ==> P ----------. .----- P
# \ / ==> \ /
# P --> c1 ==> P ----------> o2
# / \ ==> ________/ \
# / \ ==> / \
# c2 o1 o2 o3 ==> c2 c1 o1 o3
# \ / / ==> / / \
#
# Case Right3-r
# P --. .------------ P ==> P ----------. .----- P
# \ / ==> \ /
# P --> c1 ----------> r ==> P ----------> o2 ---> r
# / \ _______/ ==> ________/ \ /
# / \ / ==> / \ /
# c2 o1 o2 o3 ==> c2 c1 o1 o3
# \ / / ==> / / \
Note that now the arrows from and to r are continuous.
9.2.4. Remarks
The lower function lifts nodes as well, when needed.
The cases can be simplified by decoupling the lowerings and the liftings.
I designed lower that way because:
I didn't want operations to break invariants.
I didn't want to set arrows one way just to change them an instant later.
I don't know if it was worth it, though:
As always, one should try several variations and see what are the pros and cons of each.
10. Tests & Statistics
10.1. Tests
I use what I call a seasonal test:
Its seasonal nature is controlled by two parameters:
N, the number of operations
C, the number of cycles
The test performs a total of N operations.
The operations are conceptually chunked into C contiguous sequences of N/C operations.
(In the actual implementation, the C cycles are mapped directly to the N operations.)
In each chunk:
Each operation can either be an insertion or a deletion.
The time goes from t=0 to t=2π.
At time t, the probability p of the current operation being an insertion is (sin t + 1)/2.
In fact: t ∈ [0,2π] → (sin) → [−1,1] → (+1) → [0,2] → (/2) → [0,1] ∋ p
The tree tends to grow when p>0.5 and shrink when p<0.5:
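The seasonal schedule, as runnable Python (a sketch; the function names are mine, not from the source code):

```python
import math

def insertion_probability(t: float) -> float:
    """p = (sin t + 1) / 2, sweeping [0, 1] as t goes from 0 to 2*pi."""
    return (math.sin(t) + 1) / 2

def op_time(i: int, n_ops: int, cycles: int) -> float:
    """Maps operation i (0-based) to its time t in [0, 2*pi] within
    its chunk of n_ops // cycles operations."""
    chunk = n_ops // cycles
    return 2 * math.pi * (i % chunk) / chunk
```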
Of course, the test is full of asserts and it also checks the invariants of the tree at regular intervals of time.
The test also takes a key picker which generates the keys to insert and delete.
10.2. Key Pickers
A key picker is implemented as a class KeyPicker.
Here's the interface:
class KeyPicker(Protocol[K_Cov]):
    def get_ins_key(self) -> K_Cov:
        """NOTE: The returned key is remembered and can be returned by
        get_del_key."""
        ...

    def get_del_key(self) -> K_Cov | None:
        """NOTE: The returned key is forgotten."""
        ...
I implemented the simple key pickers illustrated in the following image:
The image was created by requesting first N insertions and then N deletions.
Remember, though, that the real test chooses the type of each operation at random, according to the cycle.
Uniform:
The keys to insert are (almost) uniformly distributed over a given interval.
I said "almost" because:
one can ask the Uniform key picker to only return integer keys, and ...
... I use round instead of floor to map the floats to ints.
The keys to delete are returned in random order.
Sometimes, the key picker returns (on purpose!) keys to delete that were never inserted.
The FIFO versions return the keys to delete in the same order as the keys to insert.
The LIFO versions invert the order, unsurprisingly.
Center FIFO and Center LIFO are the only ones ("almost" above aside) that generate keys that are NOT uniformly distributed.
The test collects, at regular intervals of time, the lengths of the root→leaf paths:
Note that the lengths of the paths from the root to internal nodes are ignored.
The path lengths are divided by log2(N+1), i.e. the optimal path length.
This means that we end up with ratios, where 1.0 means optimal.
k-Lexi Trees guarantee that the ratios are never above k.
Just knowing the worst case is not enough.
I wanted to know more about the ratio distribution.
I first tried sampling:
I found the [min_key, max_key] interval for the keys.
I generated K uniformly distributed keys in that interval.
I searched for each one of those K keys, measuring the length of each search path.
The time complexity is O(K·log N) (in expectation, for P-Lexi Trees).
CONS:
It only works when the keys are actually uniformly distributed.
In particular, it won't work with the Center FIFO and Center LIFO key pickers.
I then decided to compute the exact min, max, mean, and std:
The time complexity is O(N).
CONS:
The ratio distribution is NOT Gaussian and not even symmetric.
The standard deviation is not enough.
The final implementation computes the full set of path lengths:
The empirical distribution is summarized by selecting 19 evenly spaced quantiles, including the min and the max.
The time complexity is O(N).
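A sketch of the quantile selection, nearest-rank style, on the collected path lengths:

```python
def evenly_spaced_quantiles(values, q: int = 19):
    """Summarizes a sample with q evenly spaced quantiles; the first one
    is the min and the last one is the max."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(i * last / (q - 1))] for i in range(q)]
```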
10.3.2. Results
First of all:
We expect the P‑Lexi Trees graphs to be independent of the key distribution.
Trivia:
The graphs were good,
but then I added the two Center key pickers and
they gave me very odd results.
I first suspected that random() was the culprit and tried a better PRNG.
Then I remembered that I was sampling the path lengths...
... assuming that the keys were uniformly distributed, and...
... the Center key pickers definitely break that assumption.
I decided to conduct two types of tests:
insertions only, 10M operations
both operations, 20M operations, 2 cycles.
Uniform Picker, insertions only (10M ops):
Let's first see the separate graphs:
Here's a direct comparison between P‑Lexi and 2‑Lexi Trees:
As we can see, 2‑Lexi Trees are vastly superior to P‑Lexi Trees, at least in this case.
Here's the comparison between 2‑Lexi and 3‑Lexi Trees:
2‑Lexi Trees appear to be only slightly better than 3‑Lexi Trees.
This makes the 3‑Lexi Trees surprisingly good.
In all cases, the quantiles are all clustered around the medians, which means that...
... there are few very short and very long paths.
Uniform Picker, both operations (20M ops, 2 cycles):
Again, let's start with the separate graphs:
And here are the comparisons:
The situation didn't change much, so the deletions don't seem to cause any problems.
We can see a spike in the middle (50 on the X axis), but that's probably due to random noise:
... the trees are very small at that time, because of the 2 cycles.
It's like when you toss a coin: a few tosses won't give you a great estimation of the true mean and...
... if you repeat the experiment you'll get big variations.
Note that we shouldn't expect the P‑Lexi Trees to do better or worse in different situations.
If that happens, we have a bug somewhere! Remember the Trivia above.
Here are all the other cases for the P‑Lexi Trees:
Here are the ones for the 2‑Lexi Trees:
And here are the ones for the 3‑Lexi Trees:
Here are all the comparisons between P‑Lexi and 2‑Lexi Trees:
And here are all the comparisons between 3‑Lexi and 2‑Lexi Trees:
As we noted before, P‑Lexi Trees shouldn't be, and in fact aren't, influenced by the key pickers.
The 2- and 3‑Lexi Trees exhibit strong asymmetry:
That's to be expected since only right children can be high.
The exceptionally good cases are:
LHS RHS (of the tree)
Dec FIFO, inserts only: I--------
Dec LIFO, inserts only: I--------
Dec LIFO, both ops: DI--------
where: I = inserts; D = deletes
As indicated in the scheme above, these are all cases where we operate on the LHS of the tree.
This suggests that we might want to invert the order of the keys if we suspect that the order will be mostly ascending.
The 2- and 3‑Lexi Trees are somewhat influenced by the key pickers, but:
Most of the paths are very close to the optimum.
The other paths are still way below 2 times the optimum.
This is very good, considering that 3‑Lexi Trees give only a 3·log2(N+1) guarantee.
To do a complete analysis of 3‑Lexi Trees, we should distinguish 2 kinds of paths:
hot paths: the high traffic paths
cold paths: the low traffic paths
This is especially true if searches are also able to lift nodes when they find triples. That would guarantee that:
hot paths are of length 2·log2(N+1) at most
cold paths are of length 3·log2(N+1) at most
Since cold paths are, by definition, low traffic, this would make 2- and 3‑Lexi Trees virtually indistinguishable regarding the path lengths.
3‑Lexi Trees are heavier than 2‑Lexi Trees algorithmically, though.
2‑Lexi Trees should always be preferred for the single-thread case.
3‑Lexi Trees should be more performant than 2‑Lexi Trees in multi-threaded settings, but this is still a conjecture.
Thanks
Thank you for reading this! I hope you had a good time 😃