Lowest Common Ancestor
Update 14Jun2007 12:33:13 CEST
Please read these notes first
Fast Intel Implementation
What is Lowest Common Ancestor? I believe one image is worth more than thousand words.LCA of node 9 and 10 is 8.
LCA of node 9 and 7 is 5.
LCA of node 8 and 4 is 1.
Now let's formalize that.
 Definition  Ancestor
 In a rooted tree node u is an ancestor of v if node u occures on a path from v to root (so root is ancestor of any other node)
 Definition  Lowest Common Ancestor
 of two nodes a and b is the deepest node, that is ancestor of both nodes a, b.
Finding Lowest Common Ancestor is a part of Ukkonen sufix tree matching algorithm.
Lowest Common Ancestor  Classic Algorithm
 First we will number nodes inorder in a binary tree, using DFS:
 Now if we XOR the values of the nodes, the first
bit from the left set, indicates the difference in path, so we rewrite 'matching bits',
add bit of height (if you have not noticed, rightmost bit set in node's number is height),
and fill the rest with 0's exampli gratia:
1101 XOR 1111 = 0010 that gives us: 1110
1101 XOR 1011 = 0110 that gives us: 1100
matched bits are marked with green colour.
 Everything would be cool and simple, but if we take two nodes on the same path to root:
0100 XOR 0111 = 0011 that gives us: 0110 WRONG!

Wrong way
Let's add preordered numbering using DFS:
 Definition  Ancestor
 Let G(v) be the dfs preorder number of node v, and
let s(v) be the number of nodes in subtree with root in v.
Node v is ancestor of node u iff:
G(v) <= G(u) AND G(v) + s(v) > G(u)
(nodes 4 and 8) 2 < 8 AND 2 + 7 > 8
(nodes 4 and 3) 2 < 5 AND 2 + 7 > 5
So to get LCA of two nodes, we should first make checks like above, and if it is not true, use the XOR as presented above.
Lowest Common Ancestor  GiM's modification

GiM's simple lcatrick
First of all, it is obvious, that LCA(v, v) = v, for every node v. Definition  Height of a node
 Height of a node v is the longest path from v to leaf.
Let v be the node with lower inorder number than u.
Our check looks like this:
if (v is even AND u < v + 2^(height(v)))
v is ancestor of u!
Small explanation. We could add second condition:
if (v is odd AND v > u  2^(height(u)))
u is ancestor of v
but you must notice that only 'right edges' are problematic ones (the ones that xor is not working with them).

Semiproof of corectness
Since we have assumed, that v is the node with lower inorder number than u, and that we are dealing only with 'right edges', check weather v is even is equivalent to:
G(v) <= G(u)
part. Now, it is obvious that number of subnodes in full binary tree is equal to:
s(v) = 2^(height(v)+1)  1 exampli gratia: (node 14, height 1) s(14) = 2^(2)  1 = 3
Now 2^height(v) is simply border, that means minimum and maximum number of subnodes of node v are between: (v  2^(height(v), v + 2^(height(v)) exampli gratia: (node 14) (14  2^1, 14 + 2^2) = (12, 16)
Therefore u < v + 2^(height(v)) part is equivalent to G(v) + s(v) > G(u)
Formal proof is left as an excercise. 
Implementaion
I have written two versions of bsf, and bsr functions, that are returning position of rightmost and leftmost bit set respectively. One version makes use of x86 assembly instructions bsf and bsr, while the other is fast, loop unrolled version for RISC processors family (they can be used on CISC processors too of course).
Implementation is 32bit oriented, that means maximum tree can have depth of 32. It would probably reqire few changes to run it on 64bit capable processor.
You can take a look at htmlized verion: bsfr.c, bsfr.h (download at the bottom).
The lca_bin function as defined above is showed in test_lca.c
I've also creates small program (lca_gfx.c) using libbpng library, and generated some interesting images presented below. 
Efectiveness
If we assume x86 bsf and bsr instructions works in O(1) time, the lca implementation (lca_bin) presented works also in O(1) time.
The 'RISC alternative' functions works also in theory in O(1) time (but in reality it works in O(logn) time, where n is number of bits, so it would require to add one more step in bsf, and bsr procedures to port them to 64bit processor). 
Images!
Classic XOR
Simple Ones
row [4*i + 0] = lca_bin (i, j);
(red channel)row [4*i + 1] = 255  lca_bin(i, j);
(green channel)row [4*i + 2] = lca_bin (j, i);
(blue channel)Reversed
row [4*i + 0] = lca_bin ((1 << BITS)  i, j);
reversed xrow [4*i + 0] = lca_bin (i, (1 << BITS)  j);
reversed yMixed channels
row [4*i + 0] = lca_bin ((1 << BITS)  i, j);
row [4*i + 1] = lca_bin ((1 << BITS)  i, j);
row [4*i + 2] = lca_bin (i, j);
3 channels, red and green reversed x, blue normalrow [4*i + 0] = lca_bin ((1 << BITS)  i, j);
row [4*i + 1] = lca_bin (i, (1 << BITS)  j);
row [4*i + 2] = lca_bin (i, j);
3 channels, red reversed x, green reversed y, blue normal'Zoomed'
row [4*i + 0] = lca_bin (i, j) >> 2;
row [4*i + 1] = lca_bin (i, j) >> 2;
row [4*i + 2] = lca_bin (i, j) >> 2;
'zoomed in'row [4*i + 0] = lca_bin (i, j) << 2;
row [4*i + 1] = lca_bin (i, j) << 2;
row [4*i + 2] = lca_bin (i, j) << 2;
'zoomed out'Most interesting
Modified version of lca_bin used to show how does interesting edges really look likerow [4*i + 0] = lca_bin_cool(i, j, 0xff);
row [4*i + 1] = lca_bin_cool(i, j, 0xff) & 0x80;
make diagonal orange, and show interesting edges (if there would be 'second condition' I've mentioned earlier, this one would result in nice square)row [4*i + 0] = lca_bin_cool(i, j, 0);
interesting edges in black, rest normal