Wednesday, July 21, 2010

64-bit Waymarking

Finally I came around working on the 64bit variant of my (array) waymarking algorithm, the result of which can be observer here (starting at line 77). But I'd like to provide some background first.
Back when I implemented the original waymarking algorithm with a 2-bit alphabet used as the marks, I hoped that extending the alphabet to have 3 bits could be a major win. 64-bit pointers are by ABI rules aligned on an eight byte boundary, so pointers to these have their 3 lowest bits cleared, and I can use this space for jotting down the marks. With eight letters I can fully encode four digits and the two stops. This almost halves the number of accesses. The big question is how to assign a lucrative rôle to the remaining two characters! After some thinking I settled with the idea to make them valued stops. Clearly, increasing the frequency of stops reduces the chance of long linear searches. In the current incarnation 'x' carries a value of (binary) 01 at the start and at the end of a digit sequence, while 'y' carries 02 in the start rôle only. 's' is not carrying any value and is used when neither of the others fit. Since "s0" is a silly fragment to create, so I assign a special meaning of "s10" to it.

I designed the decoder (d3code) first. It is essential that it obtains the correct length for the string tail wherever it starts out. And it should minimize the number of accesses to characters in the string. The decoding rules are taylored to only need maximally 4 accesses to compute the correct length for the last 12 characters. Actually the last couple of marks were a result of a co-evolution with the decoder's rules. In the front of this part the digit sequences become longer and there is less wiggle room, this part can be automatically generated.

The code as it is now needs some serious cleanup, but it does QuickCheck and I am pretty confident that it is correct for all lengths. Of course you are encouraged to find a better scheme, but so far I am happy with it.

PS: As I wrote this I discovered that I can eliminate another 2 characters of my hand-written seed string.
PPS: I do not employ the 'x' terminates with a value rule in the generator yet, only in the hand-written seed. But it should be easy enough to change every "3sx" to "x" in the generator.

Wednesday, July 14, 2010

The Proof is in the Pudding

Looks like I've finally done it! r108240 stuck and no broken external projects have been reported. I've prepared some little statistic and quietly celebrating the 1%+ speed gain of LLVM in the last week.
Now on, plucking some more low-hanging fruit before adventuring into some more hard-core changes!

Update: The ClangBSD project will surely put it through some serious regression testing...