Numbers: SK, Gödel, and Catalan

We've all heard about the lambda calculus and how we can encode the factorial function in it, and this all makes sense to us assuming we know Scheme (or Lisp/ML/Haskell, depending on your age and geographic history). However, can we drill down to the essence of computation even further? In particular, given a few specific lambda-like functions, can we still encode everything? In comes the SKI combinator calculus, which shows that with just three combinators, yes, we can. One of the first things we learn when using them is that the third combinator, "I" for identity, can actually be encoded as "SKK" or "SKS". If we do our homework a bit further, we'll even see that we only really need a single combinator, iota, for a sufficient calculus!

In our intro to semantics course, Dana Scott started with all the traditional Church/Kleene encodings (booleans, numbers, recursion) - except using SK combinators. Two novel thoughts came up for me out of it:

First, naming: S for Slide and K for Kill. This makes sense when you look at the rewrite rules:

S x y z => x z [y z]

K x y => x

Further in that spirit are Chris Rathman's combinator birds.
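The two rules are mechanical enough to check by hand, but they also make a tiny rewrite engine. Here is a minimal sketch in Python; the tuple representation and helper names are my own, not from the course:

```python
# S and K as a tiny rewrite engine. A term is the string "S" or "K",
# or a tuple ("app", f, x) meaning f applied to x.

S, K = "S", "K"

def app(f, x):
    return ("app", f, x)

def step(t):
    """Perform one leftmost reduction if possible; return (term, reduced?)."""
    if isinstance(t, tuple):
        _, f, y = t
        # Kill: (K x) y => x
        if isinstance(f, tuple) and f[1] == K:
            return f[2], True
        # Slide: ((S x) y) z => (x z) (y z)
        if isinstance(f, tuple) and isinstance(f[1], tuple) and f[1][1] == S:
            x, g, z = f[1][2], f[2], y
            return app(app(x, z), app(g, z)), True
        f2, moved = step(f)
        if moved:
            return app(f2, y), True
        y2, moved = step(y)
        if moved:
            return app(f, y2), True
    return t, False

def normalize(t, limit=1000):
    for _ in range(limit):
        t, moved = step(t)
        if not moved:
            break
    return t

# I need not be primitive: S K K behaves as the identity.
I = app(app(S, K), K)
```

With this, both `S K K` and `S K S` applied to any argument reduce back to that argument, matching the identity encodings above.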

Second, complexity:

In the spirit of Iota, we can reduce our representation further. We might want to use some sort of Gödel numbering system to more concisely represent terms in our language - every term should correspond to a unique number. Something like ternary would be the obvious encoding in our system (or binary, if we encode I as SKK). Ideally, we would be able to do it from within our own system, shown constructively by writing a combinator that takes another and returns that one's unique number. We can break down the problem: as we are using SK combinators, whose terms correspond to binary trees, we want a function G: Z+ * Z+ -> Z+ to determine the label of a composition whose constituents have already been converted. To start, we assume 0 corresponds to S, 1 to K, and 2 to I. Using Gödel's original strategy of prime factorization, we would have:

G(a,b) = 2 + (2^a) * (2 * b + 1)

where a, b (lowercase) are the unique numbers for A, B (uppercase), and thus G(a,b) uniquely represents A[B].
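As a sanity check on the formula, here is a sketch of the map in Python, with terms written as leaf strings or (A, B) pairs (the representation is mine, not the course's):

```python
# The sparse numbering: leaves get 0, 1, 2, and a composition A[B]
# gets 2 + 2^a * (2*b + 1).

LEAF = {"S": 0, "K": 1, "I": 2}

def godel(term):
    if isinstance(term, str):
        return LEAF[term]
    a, b = godel(term[0]), godel(term[1])
    # 2^a * (2*b + 1) hits each positive integer exactly once, so the
    # +2 offset keeps compositions disjoint from the three leaves.
    return 2 + 2**a * (2 * b + 1)
```

Already the spacing shows: I itself gets 2, but the identity written as (S K) K gets 98.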

We can encode this formula in SK combinators easily - once we have iteration, addition is iterated incrementing, multiplication is iterated addition, and exponentiation is iterated multiplication - but the spacing between numbers is horrible, so our representation of terms blows up exponentially.
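That iteration chain can be sketched with Church numerals, here as Python lambdas standing in for combinator terms (names are mine):

```python
# A Church numeral n iterates its argument n times, and each operation
# below is iteration of the previous one.

zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # iterate +1
mul  = lambda m: lambda n: lambda f: m(n(f))                  # iterated add
pow_ = lambda m: lambda n: n(m)                               # iterated mul

to_int = lambda n: n(lambda k: k + 1)(0)  # read a numeral back out
```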

So, we need a numbering scheme without gaps. We could just say that we project the sparse scheme in order, but that's hard to write using combinators. Instead, we can use a grid: the X and Y axes represent Gödel numbers, the coordinate (G(A), G(B)) represents the combinator A[B], and the trick is how to write a combinator F(G(A), G(B)) = i without missing any numbers. Bad memories of Cantor diagonalization emerge, so squares and triangles are instant choices. I prefer squares due to ordering (I shouldn't go into details, as this was an optional homework problem).

I wrote a closed-form representation of F using easily attainable addition, exponentiation, max, and comparison operators. But we want to make sure we're good: for a combinator string of a given length, what is the bound on the corresponding Gödel number? The general intuition for the square algorithm is that all shorter terms are encoded first (all the inner squares), so for a length-*n* string, each character can be one of S, K, or I, and the string can be split up into any binary tree. Thus, we have:
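I won't spoil the homework, but here is a guess at the flavor of such a closed form (not necessarily the F I wrote), using only max, comparison, multiplication, and addition:

```python
# Walk the border of the max(x, y)-th square, so every inner square is
# already fully numbered before the next shell starts.

def F(x, y):
    m = max(x, y)
    if y == m:                      # top edge of shell m: (0,m)..(m,m)
        return m * m + x
    return m * m + m + (m - y)      # right edge: (m,m-1)..(m,0)
```

Shell m occupies exactly the numbers m^2 through (m+1)^2 - 1, so no number is missed and none is reused.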

G(E) <= 2 + 3^n * C_(n-1), where n = len(E)

C_n is a Catalan number: the notion of splitting a string into all possible binary trees is surprisingly common and powerful.
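The counting half of that claim is easy to sanity-check by brute force: a length-n string has 3^n leaf labelings and C_(n-1) binary tree shapes. A quick check, with my own helper names:

```python
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def terms(n):
    """All binary trees with n leaves, each leaf one of S, K, I."""
    if n == 1:
        return ["S", "K", "I"]
    out = []
    for i in range(1, n):           # split point for the root application
        for left in terms(i):
            for right in terms(n - i):
                out.append((left, right))
    return out

for n in range(1, 6):
    assert len(terms(n)) == 3**n * catalan(n - 1)
```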

Edit: The formula should really be the sum of Catalan numbers up to that point, which probably can be reduced further.

I'm still puzzling about introspecting on argument structure: how far can one go without hitting the halting problem?

The Future of Mobile

We were over at Mozilla last week for their all-hands meeting (some awesome t-shirts and hardcore developers there), and I gave my bit on achieving implicitly parallel scripting through domain-specific abstractions and component-based design, but a looming question was, "why parallel?" One of our traditional arguments has been that manycore is the way to go on mobile devices for the next 5-15 years because of power: you don't want to drain the battery or burn the user. An alternative strategy making the rounds shows up in a flurry of proxy-based mobile browsers. The device need only handle basic interactions for the browser, while a proxy server does anywhere from fully emulating the browsing environment down to merely re-encoding or reformatting the transmitted content. A big question, in terms of security, power, latency, and connectivity, is whether this is realistic. We've been trying to put together some numbers on that, as the answer is not actually obvious, and it gets hazier as you go from 2, to 5, to 10 years out. We'll probably have a post about it soon as the sophomore entry on the parallel browser blog, so stay tuned...