Sunday, October 26, 2014

gdiff – Polymorphically

In the few last months I've been busy coming up with techniques mastering the gdiff Haskell library, even for use cases it was not originally designed for.
This is mostly a brain dump of some approaches I have learnt while tinkering and trying to solve certain problems.

1) Polymorphic diff: Eelco Lempsink writes in the conclusion of his 2010 thesis «Furthermore, we can not encode polymorphic datatypes such as lists, but need to write specific encodings for each type we want to include». In the library documentation he says «It might require another master thesis project to solve this». However in the last days I developed a trick, which does allow me to describe polymorphic data types, such as [a] or Maybe a. I just needed to add these lines:

in the family GADT:
  ListNil :: List Fam as → Fam a as → Fam [a] Nil
  ListCons :: List Fam as → Fam a as → Fam [a] (a `Cons` [a] `Cons` Nil)

then define a Type instance:
instance Type Fam a ⇒ Type Fam [a] where
  constructors = head [Concr (ListNil cc) | Concr cc Concr cc ← constructors] : [head [Concr (ListCons cc) | Concr cc ← constructors]]


What is this doing? It picks two random constructors from the underlying data type and wraps them with ListNil and ListCons. We usually ignore what is wrapped, with one exception: in decEq one needs to ensure that the constructors' arguments are compared too, otherwise one cannot finish the type equality proof.

2) Non-moving pairs: In the usual textual diff algorithm the lines may move around a bit to make room for insertions, etc. This is normally the case for gdiff too. I have seen gdiff reusing values (with Cpy) when comparing (True, False) with (False, True). But sometimes this is not desired at all. I figured out that "inlining" the leaves into the pair's description (i.e. instead of (a `Cons` b `Cons` Nil) writing appendList on the field decompositions of a and b does the trick.

3) Location types: I found a way to equip Abstr constructor descriptors with type class dictionaries. It is similar to the lifting approach shown under 1), but much more involved.
The idea is to wrap a location-aware descriptor with a descriptor that is location-blind, i.e. that hides the location parameter: Loc' :: KnownNat n ⇒ Fam (Loc n Bool) …
Then we need the wrapper:
Hidden' :: KnownNat n ⇒ Fam (Loc n Bool) ts → Fam (Hidden Loc) ts
Giving the Type instance for the latter is tricky, as Abstr alone has no provision for dodging the class dictionary into Hidden', so I had to write a locAware smart constructor to syphon it through the Hidden' wrapper.
This also stressed the compiler to its limits in the advanced PolyKind-ed approach I needed, so I filed a GHC bug #9725. There is also some code showing how the Hidden Loc is unwrapped and the dictionary-passing functions is installed into the Abstr.

4) Monadic actions coming out of patch: Here another wrapping approach is needed, but this time we need a sum type for diff and patch so that we can enter with a pure value at the Left and obtain a Right action back. This is the most involved approach. I can probably blog another time about it.