Sunday, April 13, 2008

LLVM Data Structures, and Putting Use on Diet

There is a good deal of refactoring going on in the LLVM universe, to wit Dominic's renaming of LLVM*Builder to IRBuilder with assorted simplifications...
The train is moving fast, getting on board is harder, but rewarding. LLVM does not commit to API (or less even, binary) compatibility, so meaningful changes get in swiftly and without much bureaucracy. A simple mail, warning people of the change in advance, and when accomplished, instructions how to do the conversion, are enough to make folks happy.

There are reforms going more than skin deep, too. Clang is is getting a datastructure rewrite with nice gains, and myself is about to reduce the size of the Use struct by 4 bytes, from 16 to 12 (on a 32bit system). The 25% savings is nothing to sneeze at, considering that this struct is the most frequently allocated one in LLVM. And now that I have the functionality basically implemented on a branch, I can say, the idea is working! Consequently, I have the courage to blog about it :-)

So what is this Use-diet all about? The Use struct is the home of the pointers to all Values that an Instruction references. But each Value has to track all of its Users (i.e. the Instructions), and Use provides forward and backward pointers to chain those up. These are the essential 3 pointers. But why is Use 16 bytes? Because in some situations it is important to get back to the User of the referred Value. So it is 4 pointers in total.

Seemingly this is how it works™ and there is nothing that can be shaved off. But wait! Don't all Uses belonging to an Instruction come lined up in a contiguous array? What if we could mark the last Use specially and put a pointer to the Instruction behind it? Or even allocate the Uses immediately in front of Instruction (memory-layout wise)?

These were the first ideas how my brain-storming with Sabre began. After several exchanged emails, we settled on a concept that is IMHO really beautiful. We use 2 bits (the least significant ones, which are always zero, normally) of one of the pointers in Use to implement a serial line-like protocol of waymarks that guides us to the end of the array in some reasonably few steps. I will not detail the algorithm, since it is documented elsewhere, I will only say that there are four kinds of waymarks: ‹fullstop›, ‹stop›, ‹0› and ‹1›.
  • Fullstop means we are at the end already,
  • Stop means begin gathering digits, or if already done so, convert them to an offset, that brings us to the end,
  • 0 and 1 are the binary digits to be picked up.
It is clear to see that for small arrays there are only a small number of operations needed to determine the end of the array. The complexity of the algorithm is O(log N). So we do not really have to get concerned with, say, 10000 predecessors to a PHI node :-). The description even contains a Haskell snippet to encode and decode such offset information.
For reference I shall present it here:
> import Test.QuickCheck
> digits :: Int -> [Char] -> [Char]
> digits 0 acc = '0' : acc
> digits 1 acc = '1' : acc
> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
> dist :: Int -> [Char] -> [Char]
> dist 0 [] = ['S']
> dist 0 acc = acc
> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
> dist n acc = dist (n - 1) $ dist 1 acc
> takeLast n ss = reverse $ take n $ reverse ss
> test = takeLast 40 $ dist 20 []

Printing gives: "1s100000s11010s10100s1111s1010s110s11s1S"

The reverse algorithm computes the
length of the string just by examining
a certain prefix:

> pref :: [Char] -> Int
> pref "S" = 1
> pref ('s':'1':rest) = decode 2 1 rest
> pref (_:rest) = 1 + pref rest
> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
> decode walk acc _ = walk + acc

Now, as expected, printing gives 40.

We can quickCheck this with following property:

> testcase = dist 2000 []
> testcaseLength = length testcase
> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
> where arr = takeLast n testcase

As expected gives:

*Main> quickCheck identityProp
OK, passed 100 tests.
Btw., QuickCheck is awesome!

So where are the uses of this algorithm outside of LLVM?
There is not much of thinking needed to generalize the array to other data structures, which may permit mutating of the node contents, but disallow insertions. Linked lists are an example since one can only cons up stuff to the head. Doubly-ended arrays with fast size() operation (given a pointer to one element) can be implemented if there is another pointer in each node that we can use for storing waymarks to the start of the array. Deques also could work like this, but they too, need to be fully built up before putting in the waymarks.

This all reminds me of cons-hashing but is really a more powerful concept. Let's call it waymarking! And then let the garbage collector put in the waymarks for us...
Down with O(n) complexity on linked-list's length operation!

1 comment:

Edward Kmett said...

You can of course, also derive a faster length by using skew-binary-encoded random-access lists.

That then gives you O(log n) drop and indexing as well.