Jun 24, 2010

If Cantor Were A Programmer...

When you take math in university, one thing you'll probably come across is the concept of countability. This means that given a set, you can find some way of assigning a natural number to each element of that set - you can count the elements in it.

This guy Cantor discovered in the late 19th century that there are different types of infinities. He did this by analyzing various sets of numbers and discovered that certain infinite sets are in fact "bigger" than other infinite sets. Which is kinda weird, but it works.

I thought it might be fun (my definition of fun is apparently kinda weird) to see what would happen if instead of a mathematician Cantor were a programmer, and what his proof might look like.

Let's take a hypothetical language called C∞, which is kinda like C except that it runs on our magical computer that has an infinite amount of RAM, and thus the integer types in C∞ have infinite precision.

In C∞, we say a type T is countable if we can construct a function like this:
T foo(unsigned int n)
Where foo() satisfies:
1) For every possible unsigned int value n foo() will return a T - this means no exceptions, no crashes, etc.
2) For every possible T value t, there is exactly one unsigned int n1 such that:
foo(n1) == t
We call functions like this one-to-one.

What are some countable types? Well, signed int is countable. Here's a function that will work:
int foo(unsigned int n){
  if (n % 2 == 0){
    return n / 2;
    return -(n + 1) / 2;
If you start from 0 and continue, this function will produce 0, -1, 1, -2, 2, -3, ...
This turns out to be a one-to-one function, so therefore the signed ints are countable.

It also turns out that signed int pairs are also countable. Suppose we have a type that looks like this:
struct pair {
  int a, b;
This type is also countable. The proof is fairly simple when you use a picture, see here for a nice one that is already done. It uses rational numbers, which are a subset of pairs of integers.

What about the real numbers though? This one is a bit trickier. Let's represent a real number like this:
struct real {
  int integerPart;
  bit decimal[∞];
What we're doing here for the decimal part is representing it as an infinite series of 1s and 0s. Since we're all computer geeks here, we know that a binary string is just as capable of representing a number as a decimal string, so we still are able to represent any real number using the above type.

Now we show that the reals are not countable using a contradiction. First, we assume that the reals are in fact countable. Let's simplify it a little and only work with the real numbers between 0 and 1, so we'll have a real0 type that is the same as real but with the integerPart field set to 0.
So since real0 is countable (by assumption), it means that we can create a one-to-one function like this:
real0 foo_real(unsigned int n)
Since this function is one-to-one, for every possible real0 value x there is an unsigned int n that will produce x when you call foo_real(n). So let's create an array that has all those real numbers in it:
real0 all_the_reals[∞];
for (unsigned int i = 0; i < ∞; i++){
  all_the_reals[i] = foo_real(i);
Now let's cause a problem. Consider the following code:
real0 y;
for (unsigned int i = 0; i < ∞; i++){
  y.decimal[i] = !all_the_reals[i].decimal[i];
Why is this a problem? Well, we assumed that all possible values for real0 are in all_the_reals. But for any i, we know that y != all_the_reals[i] because the bit at the ith decimal place is inverted. Therefore we know that y is not in all_the_reals, which means that all_the_reals does not contain all possible real0 values. Therefore there is a real0 value that cannot be produced by foo_real() — a contradiction since we assumed foo_real() is one-to-one. So real0 is not countable, and since real0 is a sub-type of real, real is also not countable.

What does this tell us? Well, we know that with our infinite precision computer there are an infinite number of possible unsigned ints we could have. At the same time we have shown that there is not a one-to-one mapping between the reals and the unsigned ints, so therefore the infinity of the reals is effectively "bigger" than the infinity of the unsigned ints.

Why would a programmer care? Well, this has an implication for computer science. But first, let's ask another question - if there are real0 numbers missing from all_the_reals, how many are there? Is there just one? Or even a finite number? Well, no, and here's why. Suppose there are a finite number of missing real0s. We could then create an array of them:
real0 missing_reals[N];
However we could then still create a one-to-one function:
real0 foo_real_with_missing(unsigned int n){
  if (n < N){
    return missing_reals[n];
    return foo_real(n - N);
So there must be an infinite number of missing real0s. Are the missing ones here countable? Assume that they are, and create another one-to-one function foo_real2() which gets all these missing real0s. But then we could still create a one-to-one that will grab all the reals returned by foo_real and foo_real2:
real0 foo_real_combined(unsigned int n){
  if (n % 2 == 0){
    return foo_real(n / 2);
    return foo_real2((n - 1) / 2);
So even the real0s that are missing from foo_real() are uncountable. Crap! In fact based on this we know that even the reals missing from the combination of foo_real() and foo_real2() is uncountable. If we create another one foo_real3() which grabs the real0s outside of foo_real_combined(), it is still uncountable. No matter how many functions we create, they will still not capture all the real0s. What does this have to do with programming? Suppose that our magical computer not only has infinite RAM, but infinite processors. The ith processor will compute foo_real(i) in the first clock cycle, foo_real2(i) in the second, etc. But since the real0s are uncountable, even with infinite processors we wouldn't be able to enumerate them all in finite time. That's something a programmer might care about.

No comments: