Mar 29, 2010

Testing is not a Replacement for Static Typing

I feel like I haven't pissed anybody off in a while, so it is time to write another "controversial" post. This time we'll revisit the old dynamic vs. static typing debate, mainly because I was walking around and it popped into my head (this is how many blog posts get written).

This time though, I won't pick on any specific languages. Instead let's take language S which is statically typed and language D which is dynamically typed (ignore the fact that there are actual languages with these names, just pretend those don't exist for a moment). Let's also say that these two languages are identical in all respects except for this single fact - the argument of whether or not this is possible can be saved for another day, so again I ask that you pretend that it is possible. Also assume that they are both strongly typed.

In this situation I would argue that S is a better language than D. Why? Well, because static typing gives you tests for free. The compiler checks that all the types match up the way they are supposed to, so you don't end up adding an integer to a network socket or something ridiculous like that. In D the check is also done, but it happens when the code is executed instead of at compile time. Catching these errors only at runtime is normally a bad thing, so we write automated test suites to catch this kind of bug.
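To make that concrete, here is a minimal sketch in Python standing in for D (the function and the scenario are made up): the nonsense addition only blows up if its branch actually executes, which is exactly what a test suite would have to catch and what S's compiler would reject outright.

import socket

def log_bytes_sent(sock, count):
    if count > 1000:
        # Nonsense: adding an integer to a socket. A compiler for S would
        # reject this; here it only fails if this branch is ever executed.
        return count + sock
    return count

# Runs fine, because the broken branch is never exercised.
log_bytes_sent(socket.socket(), 10)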

All this is obvious so far (or at least it is to me). Why am I talking about it? It's because sometimes when I talk to dynamic-typing enthusiasts they tell me that the compiler check is overrated and that if you have a good testing suite then you're just as well off. Sometimes they even say that the testing approach is better, since you should be writing tests anyway.
The problem with this argument is that it ignores the human factor. Writing tests is optional; making the code pass a static type check is not. Writing tests therefore requires programmers to have discipline, and that is a problem, because many times programmers don't have the discipline, or they get lazy, or they have a deadline to meet and the thing that gets scrapped is the test suite. Typing errors can then sneak in which would otherwise have been caught by the compiler.

So what is the problem with my argument? I'm still assuming that S and D are identical in all respects except for the nature of their typing. This is a bit of an idealized scenario (I blame it on the fact that I'm back in school, where "real world issues" sorta fade into the background and you get a nice clean thought-experiment environment where unexpected things don't happen). In real life, there aren't many languages that are equal in all respects, so we can't really compare two languages on their type systems alone. Instead you would have to do some sort of empirical analysis to measure productivity based on the typing systems of the languages, while accounting for other technical factors like expressiveness, presence of a REPL, available tools, etc. and human factors like ability, team size, etc. (I'll be posting another stats article on omitted variable bias/confounding soon which will address a major issue in statistical analysis).

Anyway, I deviated a bit from the main point that I want to get across. Testing is not a replacement for a static type check because it requires programmer discipline/effort/time to implement, and it is optional. Therefore there will be times when it doesn't get done, whereas a static type check will always be done (in a statically typed language anyway). So if you want to argue that dynamic language D' is better than static language S', you'll have to argue based on features of D' other than the type system (from my anecdotal experience, I believe it is the expressiveness of the dynamic languages I've used that makes them more productive than the statically typed languages I've used, not the type system - using Scala seems to confirm this, since a lot of my Scala code ends up about as long as my Ruby/Python code).

Mar 24, 2010

Statistics For Programmers V: Performance Analysis

This entry is part of a larger series of entries about statistics and an explanation on how to do basic statistics. You can see the first entry here. Again I will say that I am not an expert statistician, so feel free to pick apart this article and help bring it closer to correctness.

In the last post, I covered a little bit of theory on how ordinary least squares works. This is a method of fitting a curve to a set of points so that you can get estimates of how independent variables affect a dependent variable.

In this post, we'll use that technique in a programming-related example. Let's say we have the task of optimizing some code. There's a certain section that is too slow, and we need to figure out why it is slow and fix it up. Ninety-nine percent of the time the slow part is in our own code, or in code that we can at least access and change (gotta love the prevalence of open source these days!). In our hypothetical scenario, however, you're working with a library that is not open source, meaning there's not really any way for you to look at the code and analyze it properly. So what do you do? Well, you could roll up your sleeves, fire up ndisasm (or whatever disassembler you have handy) and take a peek at the assembly code. While this is possible, I probably wouldn't recommend it, because you'll spend so much time digging through everything that you'll be at it all day just trying to figure out what is going on. Then on top of that you'll still have to do your performance analysis, which a lot of the time requires some probability anyway.

So, why not use statistics for it? If you are using a library, then chances are the functions/methods you call take some sort of inputs (the independent variables), and you have some kind of measurable dependent variable that depends on those inputs - like the amount of time the function takes to run, how much memory it uses, etc. What you can do is run the code with various sets of possible input values and record the dependent variable(s). Afterward you form a hypothesis, which you can test using ordinary least squares.

What do I mean by form a hypothesis? Well, let's say that you are fairly certain that the algorithm is O(n²). What you could then do is set up your regression equation like this:
T = β₀ + β₁n + β₂n² + ε
After that you could just run your code over and over with different values of n and collect the corresponding execution times. You can then run the regression on the set of data to get the estimates for the coefficients.
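As a concrete sketch of that workflow (the mystery_lib library and its process function are hypothetical stand-ins for the closed-source code, and statsmodels is just one convenient way to run OLS in Python):

import time
import numpy as np
import statsmodels.api as sm

from mystery_lib import process  # hypothetical closed-source function under test

ns, times = [], []
for n in range(100, 5001, 100):
    data = list(range(n))
    start = time.perf_counter()
    process(data)                 # the call whose running time we want to model
    times.append(time.perf_counter() - start)
    ns.append(n)

ns = np.array(ns, dtype=float)
X = sm.add_constant(np.column_stack([ns, ns ** 2]))  # columns: intercept, n, n^2
results = sm.OLS(np.array(times), X).fit()
print(results.summary())          # coefficients, standard errors, t-statistics, p-values

The n² column is in there because the hypothesis is O(n²); if you suspected O(n log(n)) instead, you would put a column of n·log(n) values in its place.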

Once you run the regression, most statistical packages will spit out a standard error for each coefficient. This standard error is the estimated standard deviation of the distribution of each coefficient. What does that mean in English? Well, there is an actual value for each coefficient, called β, which unfortunately you don't know. If you take a sample of a certain size, then assuming all the assumptions are satisfied, the estimate you compute is effectively a random draw centred around β, distributed according to a t-distribution. This distribution has a standard deviation, which again we unfortunately don't know. The standard error is effectively our guess at that standard deviation, based on the sample we have.
What does the standard error do? It determines how precise our estimate of β is. If our estimate of β is 100 and our standard error is 0.1, then there's nearly a 100% chance that the actual value of β is between 99 and 101. However, if the standard error is 10, then the range the actual β could lie in is much larger. You can get smaller standard errors by increasing your sample size, or by getting a wider range of values for your independent variables.
Note that when some of our assumptions are violated, it can potentially have a huge effect on the standard errors.
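To put numbers on that precision, most packages will also hand you confidence intervals directly; continuing with the hypothetical statsmodels fit from the sketch above:

# Standard errors and 95% confidence intervals for each estimated coefficient.
print(results.bse)                   # estimated standard errors
print(results.conf_int(alpha=0.05))  # [lower, upper] bounds for each coefficient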

We can use the estimates and the standard errors to form hypotheses about the actual values of the coefficients. Suppose our estimate of β is 5, and we want to test to see if the actual value of β is 6. We would then form a hypothesis that β = 6, this is called the null hypothesis. You compute a t-statistic for this null hypothesis using the formula:
t = (est(β) - β) / st.err
If this t-statistic is small in absolute value (the threshold depends on your sample size and how confident you want to be in your results) then you cannot reject the null hypothesis - this does not mean that the null hypothesis is true, it just means that this sample is not enough evidence to say that it is false. It could be true that β = 5.5, and we could still get a sample like this (depending on the standard error of course). So if the t-statistic is small, we can't say a whole lot about our null hypothesis.
On the other hand if the t-statistic is large, it provides a good amount of evidence that the null hypothesis is wrong. So typically when you test a hypothesis, you let the null hypothesis be something that you want to "prove" wrong (I say "prove" since you can't really have 100% proof in statistics like you can in math).
Most stats packages like R or even Excel will spit out a calculated t-statistic for each coefficient. What this value represents is the t-statistic for the null hypothesis of the coefficient being equal to zero. If that t-statistic is small, we say that the coefficient is not statistically significant - that is, we can't really say that it is any different from zero.
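Using the same hypothetical results object from the earlier sketch, testing a null hypothesis other than zero is just the formula above applied by hand (scipy supplies the p-value):

from scipy import stats

# Null hypothesis: the coefficient on n (the second column of X) equals 6.
beta_null = 6.0
t_stat = (results.params[1] - beta_null) / results.bse[1]

# Two-sided p-value from a t-distribution with the model's residual degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=results.df_resid)
print(t_stat, p_value)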

What does all this mean for testing the code? Well it lets us get an idea of how fast the code runs. Suppose we get an estimated equation of:
T = 100 + 30n + 0.5n²
Provided that our standard errors for the coefficients are fairly small, this equation shows us several things. First, it shows that there is a fixed cost to the code: no matter the size of the input, we pay that 100 every time it runs. It also shows a fairly small estimate on the n² term. This is useful for comparing this algorithm to another. Suppose you come up with a fancy new algorithm that you figure runs in O(n) time. This is theoretically better than the O(n²) algorithm, but if its estimated equation looks like this:
T = 2000 + 400n
then the O(n²) algorithm is much better if you're only ever working with small values of n.
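If you want the actual crossover point between those two made-up fits, you can set the equations equal and solve for n; a quick numerical check:

import numpy as np

# 100 + 30n + 0.5n^2 = 2000 + 400n  =>  0.5n^2 - 370n - 1900 = 0
roots = np.roots([0.5, -370.0, -1900.0])
print(roots)  # the positive root is roughly n = 745, so the O(n^2) fit is cheaper below that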

So this is a basic introduction to using statistics for performance analysis. Unfortunately, there are still plenty of things that can go wrong. I listed a few assumptions in the last post; if those don't hold then you can have problems such as bias and inefficiency (I'll talk about these more in later posts). Also, if your model is misspecified (if perhaps the algorithm is actually O(n log(n)) instead of O(n²)) then you probably won't get correct results. Fortunately there are tests for this kind of thing, which I might cover in a later post.

« Statistics For Programmers IV: Ordinary Least Squares | Statistics For Programmers VI: Omitted Variable Bias »

Mar 15, 2010

Statistics For Programmers IV: Ordinary Least Squares

This entry is part of a larger series of entries about statistics and an explanation on how to do basic statistics. You can see the first entry here. Again I will say that I am not an expert statistician, so feel free to pick apart this article and help bring it closer to correctness.

In my last post about statistics, I spoke about how to compare two different averages to see if they are statistically different or not. It was a good introduction to using statistics for hypothesis testing since it was fairly simple, yet it went over some of the fundamental ideas of statistics - that disturbances can lead us to wrong conclusions, and that we must do our best to detect whether differences in results come from those disturbances or from the effect we believe is there.

This time we'll address something a bit more complex. In many cases we want to analyze a causal relationship between two variables - that is, variation in one variable is causing variation in another variable.

Note that this post is a bit more theory-oriented, the next one will apply the theory to something programming-related.

In econometrics (something that I've been studying a fair bit over the last year or so), a tool used for this type of analysis is the ordinary least squares (OLS) method. This method assumes that the relationship between the variables can be explained by an equation that looks like this:
y = β₀ + β₁x + ε
Here, y is the dependent variable, x is the independent variable, β₁ is the effect of a one unit change of x on y, β₀ is the intercept term (basically what y is predicted to be when x is zero), and ε is a random error term. This example only uses a single independent variable, but you can have many more. You can also have functions of independent variables in there, so you could put x² if you like. For OLS, the restriction is that the equation is linear in the β terms (called the parameters) - that is, OLS is a linear estimator.

Once we have formulated our model, we can use data to estimate what the β coefficients are. There are plenty of statistical packages out there that will do OLS on a dataset; R is one that is open source and available for Ubuntu. It's not as user-friendly as some others that I've used, but it's easier to get your hands on.

This paragraph talks about how the estimates are computed, feel free to skip it if you don't care. Otherwise if you want to be hardcore and calculate out the coefficients yourself, the math formula is:
est(β) = (X'X)⁻¹X'y
Here est(β) is the estimator of β, which is a column vector of the parameters in your equation (in the example above we had 2 parameters, so it would be a 2×1 vector). The X is a matrix of your independent variables, for the intercept term you just set one of the columns of your matrix to all 1s. Each independent variable occupies one column of your matrix, so the matrix is n×k for n observations and k independent variables (including an intercept term if you have one). X' is the transpose of X.
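If you'd rather see that formula in action than take a package's word for it, it translates almost directly into a few lines of numpy (a sketch with made-up data):

import numpy as np

# Fake data: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 1, size=100)

# Design matrix: a column of 1s for the intercept, then x.
X = np.column_stack([np.ones_like(x), x])

# est(beta) = (X'X)^-1 X'y, computed via a linear solve for numerical stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # roughly [2, 3]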

So now you know how to take a bunch of data and infer a causal relationship from it, right? Unfortunately it is not entirely that simple. OLS comes with a whole pile of assumptions that need to hold in order for you to properly analyze the results that come out:
  • The expected value of ε conditional on X is zero. That means that if you hold your independent variables fixed and just keep taking observations, the average error will be zero. It is common for this assumption to be violated, and when it is you usually end up with biased results. It is important to understand how it can be violated and what to do when it is. I'll talk about this more in a later post.
  • Random sampling - the different observations of a specific variable are not correlated with one another. This can be violated for dynamic systems where the observations are taken over time.
  • No perfect multicollinearity - this basically means that no independent variable is a perfect linear combination of other independent variables (for example if x₅ = 2x₂ - 3.14x₇ you would have a perfect linear combination). You're pretty hosed if this is the case, and you'll probably need to end up removing a variable.
  • Homoskedasticity - given any value of X, the spread of your error term is constant. When this is violated you can still use OLS, it's just there will be better estimators.
  • No autocorrelation - none of the error terms are correlated with the other error terms. With autocorrelation you can still use OLS, but like heteroskedasticity there will be better estimators.
  • Normal errors - your errors must be independently and normally distributed with a constant variance. Note that if this is the case then you will have homoskedasticity and no autocorrelation. If this assumption is violated, then you cannot use the hypothesis tests in the same way.

Assuming all the assumptions are satisfied, we can do hypothesis testing on our estimates of the parameters. Fortunately the hypothesis testing here is even easier, since statistical packages usually spit out the p-value for each parameter. What that p-value measures is the probability that, if the actual value of the coefficient were zero, we would get an estimate at least as extreme as the one we got from our sample. So if the p-value is very small, you can guess that the actual value of the parameter is not zero.

Anyway like I said before, this post was a bit dry and theoretical. I'll get to something more practical next time.

« Statistics For Programmers III: Differences of Means | Statistics For Programmers V: Performance Analysis »

Mar 10, 2010

Why Common Lisp Bothers Me

I've been thinking about a newer project recently (yes, it's that time again) and I was thinking it might be cool to try to do the project using Common Lisp, since I don't know the language all that well and I'd like to try and get to know it better. I've used it a bit in the interpreter, and I used Scheme a bit when I was in university for the principles of programming languages and AI classes, and I find that while the syntax is weird at first, the languages have that neat homoiconicity property that seems to boggle the mind the more I think about it.

What seems to be bothering me about these is not the languages themselves, but how they interact with the real world. The problem started when I wanted to create an executable. You know, those things that you can give to other people so they can actually run your program. It took me a while to find anything, and eventually I had to run over to #lisp to find out. Their IRC bot pointed me to a page (apparently lots of newbies to Lisp ask this same question) which told me that Lisp doesn't really use executables. Instead it uses an image (like Smalltalk) which you can dump to an "executable". Except that it isn't really a compiled executable, but a saved state of your REPL session - kinda like if you were working in IRB and decided to hibernate it, and running your "program" just fired you back into the IRB session, with the option of executing a function on its way in.

This brings up another small problem that I had a while back when I was trying to do my code mutation stuff; the problem back then was figuring out how to do things using the standard libraries. I guess I've been spoiled by newer languages like Ruby and Python, where the standard libraries come with a large number of basic functions for string/array (well, list - it was Lisp after all) manipulation and it is really easy to figure out how to do that kind of stuff. When I tried to do some of the same things using CL, I think I spent more time hunting down docs than I did doing actual coding (in hindsight I probably should have just written the functions myself, which probably would have saved time).

So while I'm sure Lisp is a great language to use for academic stuff or maybe even for a server-side app, I don't think I want to use it for a project that I might want to give to people someday. It seems like the language suffers horribly from neglect, and the people who learn it love it so much that they seem to ignore certain practical matters like this.

Anyway that's the end of my rant. Flame away.

UPDATE: A few Lisp people commented on the post on Hacker News, saying my complaints sum up to things like: "Because I'm used to something else." I'm sorry, but those people have completely missed the point of my argument. It has nothing to do with what I'm used to; it has to do with what anybody who is not a geek is used to. I want to distribute my application, so it must either be an executable or run on a runtime platform people already have. The only runtime platforms that are feasible for large-scale deployment are either Java-based or browser-based. Otherwise your app is going nowhere, as people won't want to install a whole bunch of crap just to get your app working. That's the point of the argument; it has nothing to do with what I'm used to.
Unfortunately I've probably just wasted the last 5 minutes of my time, because programmers tend to completely blank on reason and logic when it comes to their favourite programming language.

Mar 8, 2010

If University were like Real Life

If things had been the same in my university computer science program as software development is in real life, I think the following would be true:
  • Assignment Outlines - that paper/PDF you get for your assignment would be horribly vague. The only way to get a clear idea of what the professor wants is to go and ask. Unfortunately when you do that, the answers are also fairly vague, so in the end you'll have to try and guess what they want. (In retrospect, this one actually does happen in university sometimes. I'm leaving it here since in real life this is the norm, not the exception.)
  • Assignment Requirements - at some point before the assignment is due, the professor will change what the assignment is about completely. You can either adapt what you have, or start it again from the beginning (although most students procrastinate so it's likely that they won't have anything yet).
  • Assignment Review - after showing the professor your assignment, they'll change their mind on what they want. Unlike the previous point, this will only occur after you've shown them the assignment, which heavily penalizes procrastinators.
  • Grading - there won't really be any grades in the usual sense of the word. Some assignments will have some crazy weighting system, so if you get 90% of the answers right you'll still fail, but your friend who got 20% of them right gets a good grade since s/he knew exactly which ones to do. Unfortunately you probably won't figure out this weighting scheme.
    Another possibility is that the grading system is either 0 or 100. The professor doesn't care if you've got the features 90% built, they want to see everything that they asked for.
    Other assignments won't have a grade at all, instead the professor will just accept what you give. 6 months down the road if the assignment was poorly done, you'll be kicked out of university; if it was well done nothing will happen. (An explanation of this one: many times in the real world you're asked to make a judgment call, and the client will just accept your conclusion. After all, you're supposed to be the expert. However after a certain amount of time, evidence might show whether that was a poor decision or not.)
  • Competition - your friend does a good job on the assignment and hands it in a week ahead of you. You fail the assignment, regardless of how good it is.
  • Other people's work - instead of you starting the assignment from scratch, you'll get the version from last year's class and be told to change it. (it's possible that some professors already do this, which is a good thing!)
  • New technologies - you'll start new courses that need you to use a specific programming language or technology that you may or may not have done before. If you don't learn this technology within a few weeks, you fail the course. (there are a few courses like this, especially if you go to a school that teaches Java in first year and then drops C on you at a higher level - this is a good thing!)
There are probably many more things that I could add to this list; if you think of something, feel free to add it in the comments!

A little side note: I am not suggesting that computer science programs change in order to create better programmers. A computer science program is not about teaching people how to be better programmers, it is about teaching people computer science. While I do think computer science is a good discipline to study if you want to be a programmer, the actual becoming a good programmer part is your responsibility.

Mar 5, 2010

Hello world in 63 bytes

I recently discovered this guide here to making teeny programs in Linux. I thought it was pretty neat, so I've decided to post their Hello world application here:
BITS 32
org 0x05430000

; ELF header stuff - this stuff is information about the executable that
; Linux uses to see how to execute it. It's stuff like what platform it's
; for, what type of executable it is (object code, executable, shared object),
; how long the program is, where the entry point is, etc.
ehdr:
; db means put a byte, dw means put a word (2 bytes)
; and dd means put a double-word (4 bytes)
db 0x7F, "ELF"
dd 1
dd 0
dd $$ ; I'm assuming that $$ means the start of the program
dw 2
dw 3
dd _start
dw _start - $$

; the program
_start:
; send the string to stdout
; system call for write is 4, file descriptor for stdout is 1
; so set eax = 4, ebx = 1, and ecx = string pointer, edx = string length
inc ebx
add eax, dword 4
mov ecx, msg
mov dl, 14
int 0x80
; quit
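; (eax still holds write's return value, 14; 14 AND 0x10020 = 0, so the and zeroes eax,
;  then the xchg leaves eax = 1, the exit system call number, and ebx = 0, the exit status)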
and eax, 0x10020
xchg eax, ebx
int 0x80

; Note: The original version doesn't put an ! at the end,
; which makes my version one byte longer. Oh well.
msg: db 'Hello, world!', 10
You can assemble this using NASM:
sudo apt-get install nasm
nasm -f bin -o a.out helloworld.asm
chmod +x a.out
It's neat because by looking at how this stuff works, you begin to understand how the operating system manages loading programs under the hood. If you're interested in this type of thing, check it out!

I'm not sure how well it would work on Mac or Windows, since a lot of the shortness of the program comes from exploiting Linux-specific behaviour: Linux ignores a lot of the ELF header info, it pads the header with zeroes if it is too short, and it clears all registers to zero when loading a program. I'm not sure if the Windows or BSD kernels would do the same; I guess the only way to find out is to try it!

Mar 4, 2010

On Challenges

Recently I've been working at redoing a challenge that I failed in the past. It wasn't really my choice, since it is a class required for my degree, but I find I am enjoying being forced into the opportunity, and I hope that in the future I will push myself to re-attempt past failures so that I can do them better.

The failure was a class called "Real Analysis", and while I didn't actually fail the class, I dropped it halfway through because it was an elective that was kicking my ass. It is a class where you analyze what the real numbers actually are, and how they can be used to build up the fundamental ideas of calculus. After getting my ass kicked on trying to do proofs about limits and continuity, I decided it might be better if I spent my time working on computer science classes instead of this class that, while interesting, had no real impact on what I wanted to do in life (which at the time was graduate and be a programmer).

This was a big regret of mine. At first I was happy to have free time again, but after a while I felt somewhat bad that there was something in a field that I was supposedly good at, yet couldn't do. So here and there I would re-tackle some of the problems in the textbook, with varying degrees of success. Some of them were easy this time around, others were still just as difficult. However now that I'm in a class where I basically have to relearn all this stuff, it isn't as bad as the first time around, and I find I'm actually enjoying it.

So what am I getting at here? Basically what happened to me was I was encountering new challenges. Sure, I've met challenges before in my academic and professional careers, but a lot of the time these challenges were programming or computer related challenges. This is stuff that I know reasonably well, and so the "challenges" aren't really all that challenging when you think about it, since I'm already thinking in the mindset required to solve these problems - although a good programming problem is one where thinking like a programmer isn't really the way to solve it. So if you're one of those programmers who loves to find challenges, why not start looking for challenges beyond your area(s) of expertise? One great thing that could happen is you begin to look at problems from a different perspective, which you can take back to your programming problems and maybe create something great. One thing that I've been thinking about now that I've been writing recipes in Ruby is real-time languages - basically ones where timing is important, even timing that makes your program run slower. A neat idea would be to add a "when" keyword that works like "if" but instead of just skipping when the condition is false, it hangs around until the condition is true. You could do this in Ruby (although you can't use "when", since it is already a Ruby keyword) by just making a function that takes a string of code and a block, evalling the code and sleeping until the evalled code evaluates to true.
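For what it's worth, here is a rough sketch of that idea in Python rather than Ruby (the when helper and the flag-file example are made up; the post describes doing it in Ruby with eval and a block):

import os
import time

def when(condition, action, poll_interval=0.1):
    """Like 'if', except it waits for condition() to become true instead of skipping."""
    while not condition():
        time.sleep(poll_interval)   # hang around until the condition holds
    return action()

# Example: block until a flag file shows up, then react to it.
when(lambda: os.path.exists("/tmp/ready"), lambda: print("ready!"))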

Then again, if you want programming problems you can always check out Rosetta Code.

Mar 3, 2010

FOOP Presentation

I'll be giving a presentation next Friday, March 12, at Bishop's University (this might become a semesterly tradition) on something I'm calling FOOP, which stands for functional object-oriented programming. I'm hoping to talk about what it is and the advantages it has over both functional programming and object-oriented programming on their own. Finally, I'll talk about how Scala accomplishes the combination of the two, and some of the neat features of Scala that I've discovered. I'm not really a Scala expert so I won't go into too much detail, but hopefully it will be enough to get some people interested in looking into it a bit more.

If you are in the Sherbrooke, Quebec area, feel free to come and check it out! I'll be posting the slides on here afterward for those who can't make it.

Mar 2, 2010

Javascript Profiling With Firebug

I realized something today when trying to figure out how to profile my Javascript game jarm. If you go into Firebug and go to the Console tab, there is a button there that says "Profile". For some reason I've never noticed it was there! It works just like a regular profiler: click "Profile" and let it do its thing, and when you're done, click "Profile" again to stop and it will give you a report.