Thursday, October 9, 2008


Today was quite a special day: the notary appointment to sign the contract. I can only say that my heart sank into my boots...
...never before have I staked such an amount of money on a single card. Next comes the payment, and then begins the long, sweat-inducing road of paying off the loan. More on that later.

Wednesday, September 10, 2008

Hoare Triples and Thrists

While mousing through the L4.verified presentation that took place at Galois, I could not help but build an association bridge between Hoare triples and thrists. Especially when you go to p. 52 and thereabouts.
Here is my take on the connection:

We have some commands that belong to a data type C, and C has two type parameters:
data C :: * -> * -> * where
  C1 :: ... -> C a b
The first type parameter can be interpreted as the precondition of the command, and the second as its postcondition ({P} and {Q} in Hoare's notation). This convention employs the types-as-properties interpretation.
If commands are sequenced, the postcondition of each command must imply the precondition of the next. This is referred to as the composition rule and corresponds to appending two thrists.

Now, in Ωmega the situation is even prettier, because C need not be a two-type-parameter entity but can be parameterized over arbitrary kinds. The built-in evaluation mechanism (type functions) allows the most powerful constructs. (Haskell will get something similar at the type level, called type families, with GHC 6.10.)
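To make the correspondence concrete, here is a small sketch (the command constructors and the state names Unlocked/Locked are made up for illustration; only the Thrist definition follows the paper):

```haskell
{-# LANGUAGE GADTs, KindSignatures, EmptyDataDecls #-}

-- hypothetical assertions standing in for {P} and {Q}
data Unlocked
data Locked

-- commands indexed by their precondition and postcondition
data C :: * -> * -> * where
  Acquire :: C Unlocked Locked
  Release :: C Locked Unlocked

data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- Hoare's composition rule as thrist append: the middle index b must
-- match, i.e. the postcondition of the first part is the
-- precondition of the second.
appendThrist :: Thrist p a b -> Thrist p b c -> Thrist p a c
appendThrist Nil r         = r
appendThrist (Cons x xs) r = Cons x (appendThrist xs r)

ok :: Thrist C Unlocked Unlocked
ok = appendThrist (Cons Acquire Nil) (Cons Release Nil)
-- swapping the two pieces would be rejected by the type checker
```

Composing the commands in the wrong order simply fails to type-check, which is the point of the types-as-properties reading.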

Wednesday, September 3, 2008


It is very good to be alone for a few days, thinking and working in peace. But as the weeks pass, the distance seems to grow and the absence begins to hurt.

My loves, I miss you very much; I want you to come back soon.

Monday, September 1, 2008

Meme evolution

While I am missing family-induced distraction, I am using the opportunity to read old papers that I always wanted to revisit.

Yesterday, while reading "A History of Haskell" I was amused by the sentence:
"Wadler misunderstood what Fasel had in mind, and type classes were born!"
This is a concrete case of "meme evolution" (in Dawkins' sense): a mutation of a meme (by transcription error) finds a new habitat where it thrives and blossoms.

Some preconditions are needed for this to happen:
  • communication
  • loosening of amorphous ideas
  • an open mind, not shying away from listening to the unknown
Many great ideas come to me while reading other people's conference papers where I have no idea what they are talking about :-) While I try to make vague sense of what is in front of me, fireworks of ideas commence, all building in some way on the foundations of things (pet projects) I have done in the past.

Now, how can I ensure that I periodically get to enjoy this state of minimal disturbance?

Saturday, August 30, 2008

Generalized Monads --> Gonads :-)

Here is the usual (basic) definition of monads:
class Monad m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
This is very much restricted to ordinary function types!
Let's see how this restriction can be lifted...
class Category (~>) => Geonad (g (~>)) where
  return :: a ~> g (~>) a
  (>>=) :: g (~>) a ~> (a ~> g (~>) b) ~> g (~>) b
Since the meaning of "gonads" is already eminently taken, I settled for the name Geonad.

No idea whether this can produce something relevant for our lives :-)
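The class head above is only a sketch and will not compile as written, since the (>>=) type mentions the category's own arrows as objects. One way to make the idea type-check (the names gmap/greturn/gjoin and the Wrap instance are mine) is the equivalent functor/unit/join presentation, which stays first-order:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
import Control.Category
import Prelude hiding (id, (.))

-- a generalized monad over an arbitrary category k (a sketch)
class Category k => Geonad k g where
  gmap    :: k a b -> k (g a) (g b)   -- g acts as a functor on k
  greturn :: k a (g a)                -- unit
  gjoin   :: k (g (g a)) (g a)        -- multiplication

-- sanity check: every ordinary Monad is a Geonad over (->)
newtype Wrap m a = Wrap { unWrap :: m a }

instance Monad m => Geonad (->) (Wrap m) where
  gmap f  = Wrap . fmap f . unWrap
  greturn = Wrap . return
  gjoin   = Wrap . (>>= unWrap) . unWrap
```

A bind analogous to (>>=) is then derivable in any such category as gjoin . gmap f, for f :: k a (g b).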


Thursday, August 28, 2008


With lots of rejoicing, today I discovered that the base-3.0 library, to be distributed with the upcoming GHC 6.10, will contain the Control.Category type class. This is exactly what is needed to put Thrist (~>) into a general context!
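For instance (a sketch of mine, not necessarily what the package will ship): thrists over any parameter p form a Category, with Nil as identity and append as composition:

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
import Control.Category
import Prelude hiding (id, (.))

data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- Nil is the identity; (.) appends, mirroring function
-- composition's right-to-left order.
instance Category (Thrist p) where
  id = Nil
  ys . Nil       = ys
  ys . Cons x xs = Cons x (ys . xs)
```

With this instance, (>>>) from Control.Category chains thrists in the same left-to-right order in which Cons builds them.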

Can't wait till September...

PS: It is also a nice surprise that Ashley Yakeley, the maintainer of this module, is an old-time Dylan person :-)
Someday I should blog about the many developers who share a common Haskell/Dylan heritage.

Tuesday, August 26, 2008

I'd like to see, but do you want to Show?

So, yesterday I tried to implement
instance Show (Thrist (a ~> b)) where ...
... and ran into several walls. The short story is: forget it!

The long story comes next, if you are interested.

First of all, what do we want? In my paper I define a nice thrist
[('A', 33), (65, 'a')]l :: Thrist (,) Char Char
and the Ωmega interpreter readily prints it. I wanted to do the same in Haskell too, as a pretty extension of my thrist package. I started like this:
instance Show (a ~> b) => Show (Thrist (~>) a b) where
  show Nil = "[]l"
I even managed to show singleton thrists (which hold just one element).
But the first obstacle was immediate: when the thrist has more than one element, the hidden existential type appears, and the compiler does not know that this (a ~> x) has a Show instance!

So I hoped to extend the context to require Show (forall a b . a ~> b), that is, I wanted all saturations of the type constructor (~>) to be showable. But GHC does not like this syntax...

At this point I started some fundamental thinking...

Since the type of the thrist only reveals the beginning and the end, there may be arbitrary non-showable types hidden inside, even if the resulting type conveys the illusion of Showability. To wit:
[(5, id), (id, 42)]l :: Thrist (,) Int Int
We can enter such a thing, but there is no hope to show it :-(

So what are our remaining options? Ganesh has brought Show2 to my attention, I might look at it soon, but it won't solve the above problem.

We can say goodbye to (,) and define something like:
data Pair a b = (Show a, Show b) => Pair (a, b)
There is some hope that Thrist Pair a b can be declared as a Show2 instance.
Who knows?
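A hedged sketch of how that could play out (showThrist and the GADT-style Pair are my naming; the real package may differ):

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
import Data.List (intercalate)

data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- Pair packs the Show dictionaries next to the payload, so the
-- existential types hidden inside the thrist stay showable.
data Pair a b where
  Pair :: (Show a, Show b) => (a, b) -> Pair a b

showThrist :: Thrist Pair a b -> String
showThrist t = "[" ++ intercalate ", " (go t) ++ "]l"
  where
    go :: Thrist Pair c d -> [String]
    go Nil                  = []
    go (Cons (Pair p) rest) = show p : go rest
```

For example, showThrist (Cons (Pair ('A', 33 :: Int)) (Cons (Pair (65 :: Int, 'a')) Nil)) produces the "[('A',33), (65,'a')]l" form used in the paper.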

PS: Ωmega cheats, of course. It frobs the Show functionality of the underlying Haskell implementation. It also prints functions as <fn>.

Saturday, August 23, 2008


It's done!

My first contribution to the Haskell community has just landed in the belly of the beast...

Overall it was a very pleasurable experience:
  • the website is well organized and makes it easy to find the relevant information,
  • the account is obtained unbureaucratically by e-mail, and the answer is prompt,
  • the upload checker provides helpful guidance on how to improve the package.
The package itself is at a very early stage, so please do not expect a wealth of functionality :-). Also, the (constructor) function names are expected to change (Cons --> :~) to better fit the language's overall feel.

I have already included a crucial foldThrist function, which was Ganesh's idea (thanks for the feedback, btw.!). What comes next is to figure out how mapThrist could be defined. It is somewhat tricky, as there are three type parameters to deal with (let alone all the existential types inside).

One thing comes to my mind as I think about it. mapThrist could change the first parameter, i.e.:
Thrist (->) a b -> Thrist (,) a b
an elegant way to record the intermediate values of function composition.
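For the plain case that keeps the endpoints fixed, one plausible shape is a rank-2 natural transformation between the parameters (this signature is my guess, not the package's final API):

```haskell
{-# LANGUAGE GADTs, KindSignatures, RankNTypes #-}

data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- the mapped function must work at every hidden type, i.e. be a
-- natural transformation from p to q
mapThrist :: (forall i j. p i j -> q i j) -> Thrist p a b -> Thrist q a b
mapThrist _ Nil        = Nil
mapThrist f (Cons e r) = Cons (f e) (mapThrist f r)

-- a trivial illustration: uniformly wrap every element
newtype Tagged p a b = Tagged (p a b)
```

With this, mapThrist Tagged re-wraps a whole thrist without touching the existential middle types; the (->)-to-(,) conversion above is a different beast, since it has to run the functions.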

Fun stuff.

Friday, August 22, 2008


Today I finally got around to reading my way through the Creating my First Haskell Package tutorial, and have put together my first cabalized library. I must confess that I found it really enjoyable. Certainly no more complicated than writing a makefile or adding a macport.

Btw., said library is thrist-0.0 (what a surprise!) and it is just the embryonic first step. I already have some more stuff in the pipeline...

Could not upload it to Hackage yet because I am waiting for my account on the server.

Monday, August 4, 2008

Trotting Forward

I am on vacation in Brazil, mostly in an apartment building facing the ocean, but I have had little chance to enjoy the water so far. There are several reasons for this, the most important being that Helena was sick this past week and is still receiving antibiotics. It is also raining a lot, and it is somewhat cold (24 °C) for my taste.
But I have the nights for creative work, so I picked up my recent idea and started fleshing out the design.
I succeeded in writing a QuickCheck generator that randomly creates Insert/Remove sequences steering the lifetime of a Value's def/use chain. I am able to update a given chain according to a history and check that its waymarks, which lead back to the Value, are still consistently set up.

I am pretty happy with the results so far.

Now it is time to enter the second phase and proactively optimize worn-off waymarks. As discussed in the other blog post, the stretchy/shrinky nature of def/use chains leads to deterioration of the waymarks, because Stop marks may be punched into the valid (fixed-length) clusters of Zeros/Ones. This happens at Insert events. On the other hand, Remove events may shorten clusters, thus invalidating them. So it is important to refresh the waymarks when longer islands of invalid marks are encountered. The twist is to retain O(1) complexity, which is not too challenging, but given the volatility of the chain some cleverness is needed.

I have a smallish plan for how to do it: since I expect that the volatility mostly occurs at the head of the chain (short-lived Use objects getting inserted and soon removed), I intend to push the waymark adjustments as far toward the end as possible. So I have to keep a pointer to the start of the invalid cluster immediately preceding a found valid cluster (or FullStop), and refresh the waymarks if there are enough elements present to form a valid cluster. Alternatively I could use the Prev members to walk backwards (in the C++ implementation; these do not exist in Haskell).

But I expect that only a nice histogram of usage statistics will help find the optimal strategy anyway.

Friday, July 11, 2008

Draft Thrist Paper Online

I am happy to announce that the thrist paper I have been working on for more than a year is now complete and downloadable from my ad-hoc web directory. Completeness does not mean absence of errors or lack of room for improvement. If you like the paper and would like to improve the presentation, please contact me; you can find my e-mail address in my profile. I am grateful for every comment, though I cannot guarantee that I will pick it up. Please consider that this is still a draft, so it should not be passed on; also, please do not cite or link to it, as all links will break when the final version comes out.

It has been quite a fight to get this far. I started out in April 2007, and the initial e-mails turned into a paper of several pages. But then life took over. My wife, in the final days of her pregnancy, needed my support, and then my daughter was born. Everybody who has been through such events knows that it is a different mode of living. Sleep deprivation and all that...

Also, while I have started several papers before, I had never gotten around to finishing one. This was quite traumatic, and this time I had sufficient resolve to see it through. Hopefully more will follow.

A final word: the formatting of the source code sometimes line-breaks in ugly ways. The reason is that the final format may change to a one-column page, and then all will be right; this depends on the channel through which the paper gets published. Updates may happen next week, but July 18 is my absolute deadline.


Monday, July 7, 2008

Clang Server?

Sometimes things happen that sound like a miracle. There is this guy, Peter, who signs up for Google SoC, with mentor and all. The project plan is to come up with a distcc-like parallel/distributed compilation scheme, only with clang, the aspiring LLVM C frontend.
He gets the deal, but the weeks run by, with no sign of any progress. Peter is missing in action.
Then, suddenly this weekend: a status report.
I did not look at the attached patch, but he promises a working compile server, no less!

Saturday, July 5, 2008

Things I hate

Each year the same tragicomedy: filling out my tax declaration. After procrastinating for the first half of the year, and seeing that everybody else has already done it. Being dragged kicking and screaming by my wife (I love her nevertheless). Sometimes I even volunteer to do the stuff.
But every time I read the forms, I do not understand how to fill in the right values. The words make so little sense to me. Even if I figured it out last year, that is of almost no value; I have already forgotten it. So I can only stare at last year's form and try to find the corresponding box, which is not so easy since the boxes tend to wander around. I usually give up and opt for another messed-up evening.

And now there are my wife's tax papers to be done as well. Yes, her company is new, and the bureaucracy requests 12 reports a year plus a final declaration. As if that were not enough, all of this has to be made official, and another two exemptions handed in by the end of May.

Welcome to Germany!

I definitely prefer programming to this.

Friday, June 20, 2008

Birthday season

Last Sunday we celebrated my baby daughter Helena's first birthday. She enjoyed being the focal point of the party, and the guests had their fun too. There was another boy there, around 16 months old, and they played together, hugging each other now and then. Just cute.

My son turns 12 in mid-July and he'll celebrate twice, at home and some days later in Brazil. This event will close the string of birthday dates in my family, commencing with mine on the 29th of April and with about 10 in between :-)
A kind of frenzy, especially when you look back. And people keep telling me that a different kind of frenzy used to happen in the respective years ten months before :-P

Thursday, June 12, 2008

OpenCL / CLang / CL

The most exciting thing that happens under sleep deprivation is a kind of stream of consciousness, where the brain goes into a purely associative mode. I am sleepy now, and so I can embark on a journey which tomorrow morning I will surely dismiss as utter nonsense.

So, WWDC is in progress at the moment, and some under-the-hood technologies are being introduced into the OS X system running on the Macs out there. Two of these sound interesting: "Grand Central Dispatcher" and OpenCL. There has been some speculation that utilizing GPU power (which is 99.9% idle, unless you are a framerate-obsessed gamer) could be one of the directions of the future (besides multi-cores). Nvidia and ATI both offer technologies to do this at different levels, but they are incompatible. It is hard to imagine that Apple will provide developers with several different, incompatible compilers and tell them to compile their compute-intensive kernels with each, bundle the results up in some big blob of an application, and ship it untested (how many developers will buy a bunch of video cards for testing?).

Apple likes to provide seamless solutions. They have a proven concept in their OpenGL implementation, namely a machine-independent binary format that can be JITted to a wide variety of processors: LLVM. This would make Joe Developer's life pretty easy: compile your computing kernels to LLVM bitcode, and submit a quadruple to a runtime component (bitcode, characteristics, input data, continuation/consumer of results). The characteristics could state that the code is integer- or FP-heavy, etc. After deciding which target will run the code, the LLVM JIT could distill optimal machine code for the target and invoke it on the input data. When done, the continuation would be called and would consume the results. Add some speculative execution (start the same job on several computing resources), a good deal of caching (no need to recompile bitcode again), heuristics, and profile-guided recompilation. This could be some yummy sauce.

RoughlyDrafted's blog so far provides the best intro to the new technology, and helped to start my association chain. What comes next is the fun part.

When hearing OpenCL for the first time, I thought: what a sucky name. It reminds me of OpenGL, but it is not a graphics language. It reminds me of Common Lisp, but that is not an option. Open Computing Language. Hmmm, all programming languages compute stuff...

There is a little gem, still unpolished, called clang: the light-footed successor of gcc in corporate environments, compiling "curly brace" languages to LLVM! Wasn't OpenCL C-based? Those letters, C and L, appear everywhere. What are the initials of the principal LLVM engineer, again?

Friday, May 16, 2008


It did not make it into LLVM release 2.3 (let's be conservative), but it looks like my use-diet branch will make it into LLVM 2.4! I merged the relevant changes last Saturday (2008-05-10), and so far they have not been reverted :-)

Owen did some memory statistics and found that the SPEC benchmark 447.dealII had a 13% smaller memory footprint with my patches. This is about what I expected, and a pretty awesome result. Despite the algorithmic overhead that was (knowingly) added, compilation did not really get slower; indeed, some nightly testers show slightly improved CPU times. Again, this is in line with the slight speedup I observed in the kimwitu++ benchmark's compilation time.

What's next?

My plans are to do some more API cleanup and then go after even more memory savings which I am already drafting in my fantasies... Whether they are feasible compile-time-wise is still an open question, but I am determined to find out.

Saturday, April 19, 2008

Absolutely, Positively Crazy

My previous two posts revolved around space optimizations in LLVM. The presented technique is a bit unconventional, and paired with the confusing nature of def-use chains, it can leave one quite puzzled...

Today in the LLVM IRC channel, chandlerc wondered whether I am eliminating the Value* from the Use struct. I corrected him: no, it is all about eliminating the User*. But his question drove a nail into my head and made me think. After all, he was thinking of the def-use chain, and I was always dealing with user-use arrays. But deep inside it is the same problem, a.k.a. the dual one:
  • for a user-use array the User* for all array members is constant, while the defs (Value*-s) vary,
  • for a def-use chain on the other hand the defs (Value*-s) are invariant, while the User*-s vary.
I wondered whether it is feasible to use the tagging technique to get rid of Use::Val too. After all, it is just a linked list, and I have already demonstrated something similar in my previous posts.

It turns out that it is perfectly feasible, and the rest of this post demonstrates how...
...but please do not misunderstand, I do not propose this for LLVM (yet?).

Let's recapitulate what a def->use chain is:
There is an instruction that generates a value, thus defining it:

Val = | Instr |

Then there are other instructions which consume this value, but the above instruction only sees the Use objects chained up:

| Instr |
    |
    v
+-----+     +-----+     +-----+
| Use | --> | Use | --> | Use | --X
+-----+     +-----+     +-----+
The X above means that Use::Next is filled with NULL. Otherwise Use::Next obviously contains the pointer to the next Use object.
So to apply waymarking we have to overcome two obstacles:
  1. There is no way to store something around the Use object (layout-wise), as we did with user-use arrays. We cannot go outside to place the Value*.
  2. Def-use chains can be expanded and shrunk dynamically.
It turns out that 1) is pretty easy: we do not store NULL in the last Use's Next field, but the Value*, tagged with the ‹fullstop› waymark. So when we see ‹fullstop›, the Value* can be extracted from the upper bits.
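A toy model of that trick (all names mine, and with the digit waymarks left out for brevity): the last Next slot carries the Value itself instead of NULL, so following the chain always ends at the definition.

```haskell
-- Next is either a pointer to the following Use or, at the end, the
-- tagged Value* (modelled here as a plain sum type)
data Value = Value String deriving (Eq, Show)
data Use   = Use { next :: Next }
data Next  = ToUse Use | FullStop Value

-- starting from any Use, recover the defining Value
getValue :: Use -> Value
getValue (Use (ToUse u))    = getValue u
getValue (Use (FullStop v)) = v
```

In the full scheme described below, the ‹0›/‹1› digits scattered along the chain would let the Value* be reassembled at a ‹stop› without walking all the way to the end.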
So, let's assume the ‹stop› and ‹0›, ‹1› waymarks are also tucked onto the Use::Next fields. As usual, given any Use* we can pick up the digits, and at the ‹stop› we convert the digits to a Value* and we are done. This is all fine and dandy, until we remember 2), which appears a bit more tricky. Obviously, insertions and deletions will remove waymarks or add new ones, and our algorithm may begin producing bogus Value*-s. So is there any solution to this problem?
I believe there is. I have no proof, but I conjecture that each destructive operation should simply place a ‹stop›, and if the algorithm is changed to accept only 32 digits between two ‹stop›s, then we are on the safe side. The corruption gets detected, an uncorrupted segment can still be found downstream, and the corruption can be repaired later (e.g. when traversing the def-use chain).

Well, this is not a very efficient way of doing things, so it probably only pays off when space is extremely limited and compilation/optimization speed is not paramount. But in such applications an 8-byte Use struct (down from 16/12 bytes) is definitely tempting.

Monday, April 14, 2008

Use-diet Update

As of r49698 the use-diet branch finishes the testsuite (about 2000 tests) without regressions. Actually, the User* member of the struct is not eliminated yet; instead it is deliberately set to NULL in various places, and when present, I assert that it equals the computed value.

It was quite a fight to get thus far, but it is done! Time to open a good bottle of wine...

The next days will be devoted to cleaning up the sometimes messy details, doing some statistics, and cautiously trying to merge my work back to trunk.

Wish me luck.

Sunday, April 13, 2008

LLVM Data Structures, and Putting Use on Diet

There is a good deal of refactoring going on in the LLVM universe, to wit Dominic's renaming of LLVM*Builder to IRBuilder with assorted simplifications...
The train is moving fast and getting on board is hard, but rewarding. LLVM does not commit to API (let alone binary) compatibility, so meaningful changes get in swiftly and without much bureaucracy. A simple mail warning people of the change in advance and, when it is accomplished, instructions on how to do the conversion are enough to make folks happy.

There are reforms going more than skin deep, too. Clang is getting a datastructure rewrite with nice gains, and I myself am about to reduce the size of the Use struct by 4 bytes, from 16 to 12 (on a 32-bit system). The 25% savings is nothing to sneeze at, considering that this struct is the most frequently allocated one in LLVM. And now that I have the functionality basically implemented on a branch, I can say: the idea is working! Consequently, I have the courage to blog about it :-)

So what is this Use-diet all about? The Use struct is the home of the pointers to all Values that an Instruction references. But each Value has to track all of its Users (i.e. the Instructions), and Use provides forward and backward pointers to chain those up. These are the essential 3 pointers. But why is Use 16 bytes? Because in some situations it is important to get back to the User of the referred Value. So it is 4 pointers in total.

Seemingly this is how it works™ and there is nothing that can be shaved off. But wait! Don't all the Uses belonging to an Instruction come lined up in a contiguous array? What if we could mark the last Use specially and put a pointer to the Instruction behind it? Or even allocate the Uses immediately in front of the Instruction (memory-layout-wise)?

These were the first ideas with which my brainstorming with Sabre began. After several exchanged emails, we settled on a concept that is IMHO really beautiful. We use 2 bits (the least significant ones, which are normally always zero) of one of the pointers in Use to implement a serial-line-like protocol of waymarks that guides us to the end of the array in reasonably few steps. I will not detail the algorithm, since it is documented elsewhere; I will only say that there are four kinds of waymarks: ‹fullstop›, ‹stop›, ‹0› and ‹1›.
  • Fullstop means we are at the end already,
  • Stop means begin gathering digits, or if already done so, convert them to an offset, that brings us to the end,
  • 0 and 1 are the binary digits to be picked up.
It is easy to see that for small arrays only a small number of operations is needed to determine the end of the array. The complexity of the algorithm is O(log N), so we do not really have to get concerned with, say, 10000 predecessors to a PHI node :-). The description even contains a Haskell snippet to encode and decode such offset information.
For reference I shall present it here:
> import Test.QuickCheck
> digits :: Int -> [Char] -> [Char]
> digits 0 acc = '0' : acc
> digits 1 acc = '1' : acc
> digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
> dist :: Int -> [Char] -> [Char]
> dist 0 [] = ['S']
> dist 0 acc = acc
> dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
> dist n acc = dist (n - 1) $ dist 1 acc
> takeLast n ss = reverse $ take n $ reverse ss
> test = takeLast 40 $ dist 20 []

Printing gives: "1s100000s11010s10100s1111s1010s110s11s1S"

The reverse algorithm computes the length of the string just by examining a certain prefix:

> pref :: [Char] -> Int
> pref "S" = 1
> pref ('s':'1':rest) = decode 2 1 rest
> pref (_:rest) = 1 + pref rest
> decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
> decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
> decode walk acc _ = walk + acc

Now, as expected, printing gives 40.

We can quickCheck this with the following property:

> testcase = dist 2000 []
> testcaseLength = length testcase
> identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
>     where arr = takeLast n testcase

As expected, this gives:

*Main> quickCheck identityProp
OK, passed 100 tests.
Btw., QuickCheck is awesome!

So where are the uses of this algorithm outside of LLVM?
Not much thinking is needed to generalize from the array to other data structures which permit mutating the node contents but disallow insertions. Linked lists are an example, since one can only cons stuff onto the head. Doubly-ended arrays with a fast size() operation (given a pointer to one element) can be implemented if there is another pointer in each node that we can use for storing waymarks to the start of the array. Deques could also work like this, but they too need to be fully built up before putting in the waymarks.

This all reminds me of cons-hashing, but it is really a more powerful concept. Let's call it waymarking! And then let the garbage collector put in the waymarks for us...
Down with O(n) complexity on linked-list's length operation!

Monday, March 31, 2008

ETAPS Budapest

I have been in Budapest since Saturday, and I love it. The weather is excellent, I am surrounded by some very nice people, and I can do what I like most: brainstorming cool concepts.
At a satellite event yesterday (DCC - Designing Correct Circuits) I met Tim, and after the workshop we actually got to chat plenty about Omega. I very much felt like a satellite myself, because I did not participate in the conference itself, just hung around the lobby and the workshop rooms, using the "free" WLAN and thinking about what to do next.
Today, then, was our first Omega hacking day. Building on the trunk version of the googlecode repository, Tim checked in everything he had on his laptop, and now we can do concurrent development. Both the documentation and the release-management toolchain have been overhauled and checked in, along with a testsuite.

This is already much more than what I dreamed of...

Tomorrow we'll go through the issue list, squashing as many bugs as possible, and I might get back to the LLVM generation branch. Another interested student will join us, a Hungarian guy called Tamás.

This is finally the place to thank Dr. Horváth Zoltán, who has very kindly offered us rooms in the university to have our hacking sessions in complete peace.

Tuesday, February 5, 2008

Embeddings, Part Two: Monad --> Thrist

Here comes the second part of my promised embeddings; this time I'll embed any monadic chain (corresponding to Haskell's do-syntax) into a thrist and provide a semantics for it. Later I'll try to discuss some convenience features and try my luck with mfix (and the mdo-notation).

As in my post dealing with arrows, we have to define an adapter GADT:
data Monad' :: (* -> *) -> * -> * -> * where
  Feed :: Monad m => m b -> Monad' m a b
  Digest :: Monad m => (a -> m b) -> Monad' m a b
The semantics is a bit trickier than in the arrow case, but let's find the type signature first...
Feed sounds like the introduction of a new monadic value; we would like to interpret it as the (>>) connective. Digest, on the other hand, transforms, and thus corresponds to (>>=). Both expect an already formed monadic value, so we can state:
recoverM :: Monad m => m a -> Thrist (Monad' m) a b -> m b
The effect of an empty thrist is trivial:
recoverM mon Nil = mon
And the other two connectives turn out to be the only sensible way to do it:
recoverM mon (Cons (Feed m) rest) = recoverM (mon >> m) rest
recoverM mon (Cons (Digest f) rest) = recoverM (mon >>= f) rest
Now, we can build up chains:
*Embeddings> :t Cons (Feed getChar) $ Cons (Digest (return . ord)) Nil
Cons (Feed getChar) $ Cons (Digest $ (return . ord)) Nil :: forall a. Thrist (Monad' IO) a Int
and execute them too:
*Embeddings> recoverM getChar (Cons (Feed getChar) $ Cons (Digest (return . ord)) Nil)
Loading package haskell98 ... linking ... done.
The first 'G' is consumed by the getChar invocation that is given to recoverM, but its result is ignored. The second one has an effect too, and its result is used to obtain the ASCII value. The overall monadic action is then run by ghci, printing the 71.

We have converted >> getChar >>= (return . ord) into a thrist and got it back by means of recoverM! And this works for any monad.
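To see it work without IO, the same definitions run over any pure monad; here is a self-contained check over Maybe (the example values are mine):

```haskell
{-# LANGUAGE GADTs, KindSignatures #-}
import Data.Char (ord)

data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

data Monad' :: (* -> *) -> * -> * -> * where
  Feed   :: Monad m => m b -> Monad' m a b
  Digest :: Monad m => (a -> m b) -> Monad' m a b

recoverM :: Monad m => m a -> Thrist (Monad' m) a b -> m b
recoverM mon Nil                    = mon
recoverM mon (Cons (Feed m) rest)   = recoverM (mon >> m) rest
recoverM mon (Cons (Digest f) rest) = recoverM (mon >>= f) rest

-- Just 'G' >> Just 'H' >>= (return . ord), in thrist clothing
example :: Maybe Int
example = recoverM (Just 'G')
                   (Cons (Feed (Just 'H')) (Cons (Digest (return . ord)) Nil))
-- example == Just 72, the ASCII value of 'H'
```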

Two shortcuts come to my mind, they can be modelled by adding two simpler variants to the GADT:
  Feed' :: Monad m => b -> Monad' m a b
  Digest' :: Monad m => (a -> b) -> Monad' m a b
The semantics should be clear, I leave the extension of recoverM to the reader.

Now what about the value fixpoint mfix? Can we dress it in thristy clothes? Frankly, I have no idea, but I would be happy to hear from you. It is late, I need sleep now. Bye.

Thursday, January 31, 2008

Embeddings, Part One: Arrow --> Thrist

Dan Weston asked me what the embeddings of arrows and monads into thrists would look like.
It took me some effort, but I think I have found something that appears to be satisfactory.

This time I shall prove my claim that in Haskell any Control.Arrow instance can be rewritten as a Thrist data structure. I shall also provide a semantics for the resulting data that shall recover the original arrow. Let me recapitulate the Thrist definition before diving in:
data Thrist :: (* -> * -> *) -> * -> * -> * where
  Nil :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c
Let's begin with the embedding part. Since the members of thrists are constructors of a two-parameter GADT, we have to construct the Arrow' GADT first. Here is the adapter:
data Arrow' :: (* -> * -> *) -> * -> * -> * where
  Arr :: Arrow a => a b c -> Arrow' a b c
  First :: Arrow a => Arrow' a b c -> Arrow' a (b, d) (c, d)
Arr takes any arrow and wraps it. This constructor is responsible for the actual embedding, and it is easy to see that it is completely general.
First is responsible for defining a transformation on a pair's first component. Clearly this corresponds to the first member of the Arrow type class. I refrain from doing the same for the other functions in the Arrow type class. You may scratch your head now, asking whether I have forgotten to handle the >>> member... The answer is no; I'll come back to this later.

Let's see how this works out in practice...
t1 :: Thrist (->) Char Char
t1 = Cons ord (Cons chr Nil)
This is just the plain chaining of functions as a thrist. Here is the Arrow' embedding:
t2 :: Arrow' (->) Char Int
t2 = Arr ord
We make a thrist of length 1:
t3 :: Thrist (Arrow' (->)) Char Int
t3 = Cons t2 Nil
This is a bit more involved, adding a constant second pair component:
t4 :: Arrow' (->) a (a, Int)
t4 = Arr (\a -> (a, 42))
Now it is time to form a longer thrist:
t5 :: Thrist (Arrow' (->)) Int (Int, Int)
t5 = Cons (Arr chr) (Cons t4 (Cons (First t2) Nil))
So, what does this correspond to? Intuitively, it could mean
chr >>> (\a -> (a, 42)) >>> first ord.
The Cons data constructor assumes the rôle of >>>.

To obtain a meaning at all, we have to define a semantics for it! Here it comes, allowing us to recover the embedded arrow:
recover :: Arrow a => Thrist (Arrow' a) b c -> a b c
recover Nil = arr id
recover (Cons (Arr f) r) = f >>> recover r
recover (Cons (First a) r) = first (recover $ Cons a Nil) >>> recover r
Some people call this an (operational) interpreter.

Finally, a little test:
*Embeddings> (recover t5) 55
Loading package haskell98 ... linking ... done.
(55,42)
Cute, isn't it?

PS: To embed monads, you can try embedding them via Kleisli to obtain an arrow and then via Arrow' to construct a thrist. Alternatively wait for a future post here :-)

PPS: For running the above you will need these preliminaries:
module Embeddings where
import Prelude
import Control.Arrow
import Char