Saturday, July 28, 2007

Taming PSNormalizer

For distilling a PDF on the Mac I had to unlearn almost everything. Or at least I had to use some different tools, while ignoring others. What I shall write down now is the product of one night of intensive research: lots of googling combined with trial and error.

So, producing the .ps with lout was easy and just like on Solaris. But there is no Distiller in my setup; instead I normally use Preview.app. Double-clicking the .ps file in the Finder is usually sufficient. Not this time, though, because the Latin Modern fonts were missing, and Preview.app substitutes Courier for them. Clearly a suboptimal solution.
Dropping the .pfb and .afm files into the Fonts folder (the personal or system Library/Fonts) only had the effect of locking up Preview.app with every job. This is what other people report in support forums too. I removed the font files again.
The solution must be to include the font in the .ps file. There is a small utility called includeres in the MacPorts psutils package. Unfortunately the script did not run out of the box; I had to edit it to invoke perl by full pathname (#!/opt/local/bin/perl). Then I had to symlink LMRoman9-Regular to lmr9.pfb for this approach to work. Opening the resultant pimped-up .ps file with Preview.app resulted in the well-known hangup. Hmmm, .pfb is apparently not palatable to Preview.app.

So I began digging deeper. There was a PSNormalizer entry in the /tmp/ directory. PSNormalizer seems to be the bastard child of Distiller, licensed to Apple from Adobe, and I figured it also has a command-line interface, pstopdf. It appears to be a private system framework with a certain directory structure. After finding it on my HD, I could examine its contents. There is a Resource/ directory with fonts/ in it. I dropped my .pfb there. Hangup. At least I knew now that I was on the right path. PSNormalizer looks for fonts in Resource/fonts/, and after that in ~/Library/Fonts and the global /Library/Fonts.
Resource/fonts/ has two files in it, one of them being Courier. Looking at the dump, it seemed very close to the .pfb files, but all ASCII. It must be a .pfa file! From here it was easier to proceed:
  1. grabbing a pfb2pfa converter from CTAN (part of ps2mf)
  2. converting all .pfbs to .pfas with a small script
  3. dropping these into Resource/fonts/
  4. redistilling
This time it worked! But it is tiresome to invoke Preview.app with the mouse all the time, so I am using pstopdf now, which is a perfect substitute for Distiller. Also, I keep my .pfas in /Library/Fonts/, so everybody on that Mac has access to them.

Having reached this point, I could relax and do some experimentation. Here are some things I stumbled upon:
  • PSNormalizer/Startup/ can contain arbitrary .ps files that are run before each distillation job starts. This can be useful for customization.
  • Resource/startupNORM.ps contains some interesting code and can probably also be modified.
  • Using includeres with .pfa fonts also works.
  • The PSRESOURCEPATH environment variable does not seem to affect pstopdf.
  • pstopdf has two brothers, pstopnm and pstops. I still have to find out what they are good for.
  • And there is more: ps2epsi and many other commands starting with "ps" come from the Ghostscript MacPorts package.
All in all, I feel much better about the PDF workflow on the Mac now :-)

Friday, July 27, 2007

Taming Distiller

I happen to use a Mac. This makes me pretty comfortable with viewing and creating PDF files. My favourite typesetting program, lout, can also output PDF, but it is rather limited, especially when it comes to graphics. This is why I normally go the longer (but recommended) route via PostScript files.

Today lout v3.36 arrived, and I gave it a try:
Building on Solaris was a breeze, as always. Thirsting for a challenge, I downloaded some Latin Modern fonts (refined versions of Knuth's original Computer Modern fonts) and tried to get them running under lout.
It was not easy.
Not being a font expert, I was at first overwhelmed by the .afm and .pfb files from the CTAN site. I did not know where to put them, so I simply dropped them beside my .lout file. It did not work, of course. I had to pass "-F ." to lout so it would find them. (Lout is only interested in the .afm font metrics files.)
I also had to prepare a @FontDef database, but there were enough examples on the web.
After a lot of googling I figured out the PSRESOURCEPATH environment variable. It must have this format: "path1:path2::" and is used for searching diverse PostScript resources. Using it, I could stash my fonts away in a different location.

There is a handy utility I stumbled upon, makepsres, which built up the font inventory inside my newly created /home/ggreif/psres folder. After three vain attempts I desisted from filling in the mandatory PSres.upr file inside it by hand.

So, lout was happy with lm.ld and "-F /home/ggreif/psres" and produced a pretty .ps for me.
Distiller, on the other hand, followed PSRESOURCEPATH and PSres.upr, converting the .ps into a .pdf. So far, so good. (There was still the pesky "unsetenv XUSERFILESEARCHPATH" to do first, but I shall spare you the details.)

After understanding the Solaris workflow, I had some hope to successfully repeat this experiment on my Mac too. But that is a story for another day...

Monday, July 23, 2007

Endo at end?

The ICFP contest ended three hours ago. In his blog, Johan is optimistic that Endo can be saved. Looking at the scoreboard's 16+ rankings, this seems unlikely :-/

DylanHackers (the team I contributed to in some previous contests) did not do so well this time. I dropped out in the first night, hitting a stack overflow while parsing the supplied DNA's first pattern:
13351PCCPFFP
13361PIIPICP
13371PIIPIIP
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.
13381PFFPCCP
13391PIIPICP
13401PIPIIPF
13411PIIPIIP
13421PIIPICP
Well, at least I managed to parse 1% of the DNA! (The number in front of each RNA fragment signifies the byte position in the DNA where it comes from.) Ωmega, being an interpreter on top of Haskell, has no efficient means of controlling stack usage, which is consumed quickly even though my parser was written in tail-recursive style. Hmmm. I did not even bother to submit the low-hanging prefix that was published in the task itself. I am really curious whether any team reached a 50%+ survival probability.
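The stack behavior is counterintuitive if you come from strict languages: in Haskell, tail recursion alone does not bound stack usage, because a lazy accumulator piles up thunks that blow the stack only when the result is finally forced. A minimal illustration of the general effect (this is not the contest parser, just the textbook pattern):

```haskell
import Data.List (foldl')

-- Tail-recursive, but the accumulator stays a lazy thunk chain:
-- forcing the final result can overflow the stack on large inputs
-- (with the 8 MB default stack of older GHC runtimes).
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- foldl' forces the accumulator at every step and runs in constant stack.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Both compute the same result; only the strict version does so without accumulating a linear chain of thunks.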

In any case the task was really hard this year. On the positive side I saved myself from some frustration and had a wonderful weekend with my family. We celebrated my son's 11th birthday in a game factory and my brother visited us with wife and his two children. Sometimes you win by losing...

Tuesday, July 17, 2007

Ωmega Tutorial Posted

Finally there is a 67-page Ωmega introductory text. Well, actually it is a delta to Haskell, so the audience is Haskell- (and perhaps OCaml-) savvy folk, but at least one no longer needs to collect all the information from the scattered papers.
From the first glance I had at it, it describes all the new features neatly, although sometimes the old "#n" notation sneaks in (the new notation is 1v and 1t, for values and types respectively).

I guess this is a living document, and Tim together with Nathan will keep it current, cleaning up the few anachronisms as they go.

I have been using Ωmega for more than a year now, and can only say that it is a remarkably powerful system. (I shall come back with a more detailed and subjective analysis some other day.)
The thing that fascinates me most, however, is the astonishing stability of this product. Though it is essentially a one-man show, Tim has corrected dozens of errors that I mailed him or filed in the new issue tracker at googlecode. It is really hard to find any more serious bugs in the core system. This is also a consequence of Ωmega being written in Haskell, so it inherits all of Haskell's blessings. I wish I were a good Haskell hacker; then I would fix the bugs myself :-/

The only drawback at the moment is that it is an interpreter, and this fact makes it a bit slow. So a performant version is needed. Then one can think of bootstrapping the interpreter/compiler. Ωmega written in itself would definitely be the dream of formal methods coming true!

Monday, July 16, 2007

A Compiler with a Good Clang

As you probably already know, Apple has open-sourced its new front-end for C-like languages, which has been in development internally for months. Its name is "clang", probably a contraction of C LANGuage. The name might also be related to the email handle of one of its creators, Chris Lattner, who is clattner at apple. Its other father is Steve Naroff, who introduced it at the LLVM developers' meeting back in May.

Of course, nobody writes a C compiler in six months, so this product is not mature yet; its main omission is structure handling, and there are many loose ends to wrap up. Since it is open source now, with a developer-friendly BSD license, I offered to fill in a bite-sized hole in the implementation and chose the _Complex multiplication and comparison operators. The parsing part was already done, AST handling too; only code generation to LLVM IR remained. I used the analogy principle (addition was already there) and came up with a patch in 30 minutes. After some IRC conversation with Chris and another round of tweaks, I got the green light and checked in my changes, accompanied by a test. That's it. This is open source at work: trust, ease of implementation (I barely wrote 30 lines, because the support for assembling LLVM fragments is pretty good), plus rigorous code review and, finally, testcases. There is not much room for bugs and malicious code to hide.
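For the curious, the arithmetic being generated is just the componentwise textbook expansion; here is that expansion sketched in Haskell rather than LLVM IR (the `Cplx` type and function names are mine, for illustration only, and this naive form ignores the C99 Annex G corner cases around infinities and NaNs):

```haskell
-- (a + bi) * (c + di) = (ac - bd) + (ad + bc)i
data Cplx = Cplx { re :: Double, im :: Double }
  deriving (Eq, Show)

mulC :: Cplx -> Cplx -> Cplx
mulC (Cplx a b) (Cplx c d) = Cplx (a*c - b*d) (a*d + b*c)

-- C99 only defines equality comparisons on _Complex:
-- compare real and imaginary components separately.
eqC :: Cplx -> Cplx -> Bool
eqC (Cplx a b) (Cplx c d) = a == c && b == d
```

For example, mulC (Cplx 1 2) (Cplx 3 4) yields Cplx (-5) 10.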

It has been a pleasurable experience. I know C++ code that is a nightmare in comparison. If the fathers of clang continue as they started and let the many external contributors fill in the missing pieces, this compiler will definitely be one with a good "Klang" (sound, in German).

Sunday, July 15, 2007

Writing Papers is Hard

Some months ago I started on my paper
"Thrists: Dominoes of Data",
which aims at generalizing function composition and which, in my opinion, is a nice vehicle for other programming tasks too.
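For readers who have not seen the draft: a thrist is a type-aligned sequence, sketched here in Haskell GADT notation rather than Ωmega (the constructor and function names are mine, not necessarily the paper's):

```haskell
{-# LANGUAGE GADTs #-}

-- A thrist chains "dominoes" of type (p a b) whose end types must
-- match up, generalizing function composition to any two-parameter
-- type constructor p.
data Thrist p a c where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- Instantiating p with plain functions recovers ordinary composition:
newtype Fun a b = Fun (a -> b)

compose :: Thrist Fun a b -> (a -> b)
compose Nil               = id
compose (Cons (Fun f) fs) = compose fs . f
```

For instance, compose (Cons (Fun (+1)) (Cons (Fun show) Nil)) applied to 41 yields "42": the types Int -> Int and Int -> String line up domino-style into Int -> String.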
It has only nine pages, and the really troublesome areas are still ahead:
  • writing the references
  • fine-tuning the text flow
  • stylistic enhancements
I would like to publish it at some serious conference or in a journal, of course, ASAP. This means I have to respect their style guide, which can be non-trivial, as they only provide LaTeX stationery and I use lout for my typesetting. Looks like I have to transcribe it :-(

Anyway, today finally I arrived at a point that I can offer an early draft to the interested public.
The feedback so far appears positive, which makes me spin. Thanks to all who took the trouble of reading through a very rough draft.

Overall, one can put an infinite amount of work and time into such a short paper, which makes me marvel at people who seem to produce high-quality papers at a monthly rate. They must have some secret trick I do not know -- yet!

Friday, July 13, 2007

Heating up for ICFP Contest

Today I started to analyze the ICFP Contest organizers' blog in detail. The pictures remind me of MiB, and the commentaries suggest that some fictional language must be decoded.
Fortunately I speak a few languages and learned even more when young, so maybe this is an advantage; we will see.
Anyway, I shall register as a team captain again, maybe I can organize a team here locally. Last year's experience has shown that the contest does not need a geographically concentrated team, but it surely helps.
After last year's pretty good result (place #101) we have a record to break!

Wednesday, July 11, 2007

The HaL2 Harvest

Yesterday's Haskell in Leipzig was an interesting and entertaining mix of demonstrations of Haskell programming in real life. It was definitely worth my long travel.
The organization of the meeting was excellent; the venue was somewhat esoteric for a conference, but could not have been better chosen.
Of course I preferred some talks over the others, all in all I returned with some new clues.

The best thing is that the talks and subsequent discussions have been videotaped and should show up on Youtube soon. This way everybody can participate.

Personally, I managed to seed some attendees with an early draft of my paper on thrists; I am really curious and anxious about the feedback I will get. Already in the car I established a connection between categories and thrists, with objects being Ωmega's type-level entities and morphisms being the values that populate the first parameter of the thrist. It was very helpful to have a student in my car whom I could abuse as an idea sink :-). On the return trip (me being really sleepy) he paid it back.
At the meeting this idea got consolidated in my discussion with Mr. Zirnstein, who suggested free categories. I am already in the process of integrating this feedback into the paper.
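The free-category reading can be made concrete: the empty thrist plays the identity morphism and thrist append plays composition, with the category laws holding by construction. A Haskell sketch under my own naming (not the paper's notation):

```haskell
{-# LANGUAGE GADTs #-}

-- A thrist over arrows p, read as the free category generated by p:
-- Nil is the identity morphism, appendT is morphism composition.
data Thrist p a c where
  Nil  :: Thrist p a a
  Cons :: p a b -> Thrist p b c -> Thrist p a c

-- Associativity and the identity laws for appendT follow by
-- structural induction, with no extra proof obligations.
appendT :: Thrist p a b -> Thrist p b c -> Thrist p a c
appendT Nil         ys = ys
appendT (Cons x xs) ys = Cons x (appendT xs ys)

-- Interpreting the arrows as plain Haskell functions maps this
-- free category into the category of Haskell types:
newtype Fun a b = Fun (a -> b)

runT :: Thrist Fun a b -> (a -> b)
runT Nil               = id
runT (Cons (Fun f) fs) = runT fs . f
```

Under this interpretation, runT (appendT xs ys) coincides with runT ys . runT xs, which is exactly the functor law one would hope for.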

Monday, July 9, 2007

More on Tail Merging

In an earlier post I speculated that factoring out Cat combinators other than Dup would just be a matter of copy-and-paste (I used the euphemism boilerplate back then). It came to me as a shocking revelation that the Ωmega interpreter did not agree. It asserted that
type _d is equated to type _e
which is unacceptable. The error message was not exactly like this, but very similar. It took me some time to understand it. First I wrote an even shorter function than the one-liner that had caused the error. The error only happened when I recombined the two shortened If legs (that is, after removing a Print from each).
This piqued my curiosity and I made an excursion into Ωmega's interactive typecheck mode, just to discover that the two If legs ended with different stack configurations!

This put me over the top, and I could come up with an example immediately:
If [Push 42, Print] [Push True, Print]
is a case where the Print cannot be removed without violating the invariant that the exit stack configurations at both If legs must be the same. In this case the yes-leg would have an Int at TOS and the no-leg a Bool. Since I have to cater for the general case, the removal of a Print is unsound. Ωmega discovered this.
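To see the invariant at work, here is a hypothetical Haskell rendering of stack-typed instructions (names and encoding are mine; the original lives in Ωmega, and Print is modeled as a pure Pop so the example runs without IO):

```haskell
{-# LANGUAGE GADTs, DataKinds, TypeOperators, KindSignatures #-}

-- Instructions indexed by entry and exit stack shapes.
data Instr :: [*] -> [*] -> * where
  Push :: a -> Instr s (a ': s)
  Dup  :: Instr (a ': s) (a ': a ': s)
  Pop  :: Instr (a ': s) s
  If   :: Seq s t -> Seq s t -> Instr (Bool ': s) t  -- legs share exit shape t

-- Instruction sequences, composed domino-style.
data Seq :: [*] -> [*] -> * where
  Done :: Seq s s
  (:>) :: Instr s u -> Seq u t -> Seq s t
infixr 5 :>

-- Type-checks: both legs restore the entry shape s after the Pop.
-- Removing the Pops is rejected: the legs would exit with Int ': s
-- versus Bool ': s, exactly the mismatch Ωmega reported.
ok :: Instr (Bool ': s) s
ok = If (Push (42 :: Int) :> Pop :> Done)
        (Push True        :> Pop :> Done)

-- A heterogeneous stack and an evaluator, to actually run programs.
data HL :: [*] -> * where
  HNil :: HL '[]
  (:-) :: a -> HL s -> HL (a ': s)
infixr 5 :-

step :: Instr s t -> HL s -> HL t
step (Push a)    st        = a :- st
step Dup         (a :- st) = a :- a :- st
step Pop         (_ :- st) = st
step (If yes no) (b :- st) = if b then run yes st else run no st

run :: Seq s t -> HL s -> HL t
run Done      st = st
run (i :> is) st = run is (step i st)
```

Running ok on a stack starting with a boolean leaves the rest of the stack untouched, as its type promises.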

There are two lessons to be learned from this incident, namely:
  1. If you see a mysterious error (message), simply provoke the same with a smaller program, and
  2. Ωmega's error messages need to be improved so that the piece of code that gives rise to the erroneous types is somehow indicated.
In the meantime I have an idea for removing that Print anyway. The key is abstract interpretation of the two If legs, to construct a proof that the stacks have the same shape immediately before the Print. Given this proof, the typechecker can be persuaded to allow the recombination of the shortened legs. But implementing this idea will take many more hours of hacking. It will surely be worth blogging about.

Sunday, July 8, 2007

The Grief with "Greif"

Are you American? Then do you see a difference between the two words of the title beginning with G?
If not, you are in good company. Let me explain...

My family name (last name) is Greif, it is pronounced with a vowel like in Friday. It is German, and means griffin.

I first noticed that the spelling of my name causes difficulties for Americans after my colleagues returned from a WWDC in the late '90s. When they mentioned their affiliation while talking to a Metrowerks compiler engineer, he responded with something like
"we know a guy from the same shop, his name is Gabor Grief."
They surely remembered my name because back then I bombarded them with many obscure C++ template compilation bug reports, actually causing them a considerable amount of grief. (They were really good at fixing them.)

Since then this has been happening regularly. And I always explain patiently. Now, thanks to this blog, it will suffice to provide a URL :-)

As an aside, after spending my vacation on Fuerteventura in 2001, back at the airport I bought a bottle of Lanzarote wine named El Grifo. I swore to myself to open it only when I got a son. It took 6 years, and Helena came. The wine still tasted good; we enjoyed it with the family.

Saturday, July 7, 2007

Tail Merging in Ωmega

Today I finally got around to improving my Cat-like language optimizer with a new twist:
I can now factor out common instructions from the end of the two If legs and prepend them to the If's continuation. This wouldn't be an interesting achievement if it weren't written in Ωmega's type-pedantic fragment. To wit:
  • tailMerge False [If [Dup]l [Dup]l, Print]l (revThrist' [Dup]l) (revThrist' [Dup]l) [Print]l
  • becomes: [(PopN 1v),Dup,Print]l : forall a (b:Prod *0).Thrist Cat [Bool,a; b]sh [a; b]sh
If you think the above lines are in Chinese, you can stop now. If you discover that the If has been completely eliminated and replaced by popping a boolean and duplicating the TOS, then you are pretty good at reading obscure code.
The funny thing is that all the types of values that are on the stack are painstakingly tracked, and the transformation is automatically type correct in Cat because it is type checked in the metalanguage.

I am sure that similar transformations have been proven type correct (and thus pretty correct) before, however I would like to know whether this particular optimization has been already formalized using expressive types.

The big surprise for me was the fact that writing down the transformation only needed a dozen lines of code, support functions included. Of course, my proof of concept only matches on Dup instructions, but the rest is basically boilerplate.

The second surprise is that so far I did not need a single theorem declaration, an indication that either I am still in very shallow waters regarding the power of Ωmega's typechecker, or my Thrist construction is so clever that it provides enough type hints to the narrowing engine to deduce all types.

I am in the process of writing a paper on all this, and - as always - need some encouragement. Tail merging will definitely belong to the beef part of it.

Friday, July 6, 2007

Haskell in Leipzig

Next Tuesday I shall visit HaL2 in the beautiful city of Leipzig, again. Back in December I already attended, and it has proven to be an excellent platform for encountering members of the functional programming community. Living in a university city myself, I sent a mail to a local Haskell tutor and offered to give a lift to interested people.

To my amazement a student did answer my message, so I won't drive the three hours alone (plus another three back). It will be a long day, and the last time, when I attended with a colleague, it proved very good to have somebody to talk to.

And I expect there will be plenty to talk about. Johan Jeuring will give a short talk, and though I do not expect him to chat about the ICFP contest's secrets, he well may give hints :-)

The functional programming community is sadly very weak in Germany. All the good people have fled the country and are doing research in anglophone countries (the UK, the US, even Singapore); only a few universities, such as Aachen and Bonn, actively support FP. In any case, some serious networking is needed to spread the acceptance of Haskell in this part of the world. Many professionals would love to do a Haskell job, but there is simply no opportunity. Small companies are the most likely ones to venture into the Haskell experiment; the big ones are extremely cautious and full of fear. Security-relevant applications could be covered first, and once the productivity and maintainability gains become obvious, FP could expand into the classical areas of machine control and automotive software. Germany has a lot of these.

I often use the metaphor of "building pyramids with toothpicks" when some gigantic project is started in Java or (horror!) C++. Whoever has never written a line of Haskell or Dylan code will never understand this metaphor. A pity.

So what can we do? I am trying to help others who bring people together. We badly need self-confidence and some pioneering folks who stand up and say: "let's do this task in FP for half the cost and 1/10th of the bugs of the conventional approach, even if it takes the same time in the beginning". Who shoots first?

Thursday, July 5, 2007

Committed to LLVM

Yesterday I got accepted to the inner circle of LLVM developers by becoming a committer. For now I have very humble plans, such as periodically plowing through the sources and sanitizing a bit here and there. This may change to a more active role, but probably not in the form of contributing to llvm-core.

At least version 2.0 already contains some stuff from me (improved memory efficiency).

My first contact with LLVM happened when I started a new backend for the d2c compiler. I found out that the Dylan language and LLVM would be a good match, and that d2c was modular enough to nicely marry these two worlds. I did produce some LLVM output, but got distracted soon and the project never became functional. Ah well...

Wednesday, July 4, 2007

How to start?

There has been quite a lot of movement in my life in the last 2 years, which culminated almost 3 weeks ago with the birth of my daughter Helena. (Needless to say, I am the proudest father in the world :-)
So where am I standing now? I have a pretty happy family life, I am feeling the rewards of my professional career, and my hobbies give me the self-esteem that the researcher is still alive deep inside.

And what is missing? I miss the time to do more of all this. I also missed a way of articulating my thoughts, sketching my dreams.

This blog could fill in the latter gap (at the cost of losing even more time ?!?).
Sometimes I need external impulse to drive me in a certain direction. What I put on these pages may motivate others to demand that I implement a cool idea or think harder about some problem.

Even if it just remains a reminder to self, it may be worth it.