Apr 13, 2009

Hungarian Notation in Ruby?

Way back when I was first learning to program, I was quickly introduced to the Win32 C API, which uses a lot of Hungarian notation in its variable names. This got really annoying really quickly. Back then there was still a difference between a "long pointer" and a "short pointer", so you'd get stuff like "g_lpasz_...", which means "long pointer to an array of zero-terminated strings, which happens to be a global variable". It can get nastier.
Eventually I dropped this, because in C/C++ you explicitly declare the type of a variable, and the IDE that I used at the time (Visual Studio) would display the type anyway. It was both redundant and verbose, so when I came to Java in university and saw that it did not use this notation, it didn't really come as a surprise that it might be worth ditching (it's one of the few things that Java ditched from C++ that I actually agreed with). So now I don't use it when I'm programming C++.

Most of the time these days, I'm not programming in C++. I still like the language, probably because I haven't found another one that is better at what it does well (execution speed combined with OOP and generics), but most of the projects I seem to do are better suited to Ruby. There are probably better languages for some of the things I do (OORegress would probably be better off in Scala since it does number crunching), but whatever.

As my apps and web apps get more complex however, I'm thinking some sort of very basic type system would be in order. Since in Ruby you can never be really certain what type a variable is (I've been bitten in the ass in Rails by Hash vs. HashWithIndifferentAccess), having some sort of hint to other programmers what type the variable should be would be quite helpful. Just a thought.
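A minimal sketch of the Hash side of that problem, using a plain Hash so that it runs without Rails. The method name lookup_id and the sample data are made up for illustration. A plain Hash treats "id" and :id as two different keys, whereas HashWithIndifferentAccess treats them as the same key, so the same code can behave differently depending on which class it is handed.

```ruby
# Hypothetical helper: reads the :id entry from a params-like Hash.
def lookup_id(params)
  params[:id]
end

string_keyed = { "id" => 42 }  # keys are Strings
symbol_keyed = { id: 42 }      # keys are Symbols

p lookup_id(symbol_keyed)  # the Symbol key matches, so the value is found
p lookup_id(string_keyed)  # no error, but the lookup misses: "id" is not :id
```

The second call fails silently by returning nil rather than raising, which is the kind of bug a naming hint is meant to head off.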

Like most things, it is best to use it sparingly. If your Hungarian notation portion has more than, say, 5 characters in it, it's getting a little long. Plus, since in Ruby a parameter may have more than one expected type (in Rails, ActiveRecord::Base.find can take an integer or a symbol for the first parameter) and lists can be heterogeneous, there are situations where the system may not be comprehensive enough for what you need. However, 90% of the time I think it would be an improvement.
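To make the multi-type case concrete, here is a rough sketch of a finder whose first parameter may be an integer or a symbol. It is not the ActiveRecord implementation; find_record, RECORDS and the :all selector handling are all assumptions for the sake of the example. No single type prefix describes the parameter honestly, which is where the notation runs out.

```ruby
# Made-up sample data standing in for a table; not a real ActiveRecord model.
RECORDS = { 1 => "alice", 2 => "bob" }

# Hypothetical finder: `which` is either an Integer id or the Symbol :all.
def find_record(which)
  case which
  when Integer
    RECORDS[which]        # look up a single record by id
  when Symbol
    raise ArgumentError, "unknown selector #{which.inspect}" unless which == :all
    RECORDS.values        # :all returns every record
  else
    raise ArgumentError, "expected Integer or Symbol, got #{which.class}"
  end
end

p find_record(1)
p find_record(:all)
```

The return value is just as hard to label as the parameter: one call gives back a single record, the other an array of them.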


Sung said...

This is terrible advice IMHO. A few points:

> "g_lpasz_..." which means "long pointer to an array of zero-terminated strings, which happens to be a global variable"

From what I've read, Hungarian notation was conceived as a way to note the intent of a variable, and not its type. The mass misinterpretation of this practice led to tremendous amounts of pain.

And all for naught: the compiler most certainly knows what type a variable is. If one turned on the compiler's warnings feature, it could have told you when a poor variable was about to be abused. You could even set it to stop the build if it found an implicit cast.

An implicit cast! Therein lies the problem. C and C++ programmers of the nineties were obsessed with variable types because C and C++ are weakly typed languages. They will make all sorts of horrible assumptions, all in the name of speed and simple compiler implementation. Declaring your variables as a *_t at compile time is a speed hack, not a protective measure.

> As my apps and web apps get more complex however, I'm thinking some sort of very basic type system would be in order. Since in Ruby you can never be really certain what type a variable is ...

Ruby has much more than a "basic type system". Since everything is an object, including numbers (which are primitives in most languages), you can always find out what type a variable is:

str = "foo"
str.class # => String
str.kind_of? String # => true

Furthermore, Ruby is a strongly typed language: 2 + "foo" is going to result in a TypeError exception, not a mangled string like it does in C.
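This is easy to check in a script. The method name below is only for illustration; it rescues the exception and returns the name of its class.

```ruby
# Tries to add an Integer and a String, and reports which exception
# class Ruby raises instead of silently converting either operand.
def add_int_and_string
  2 + "foo"
rescue TypeError => e
  e.class.name
end

puts add_int_and_string
```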

Ruby's main problems have more to do with object mutability; e.g. the global interpreter lock and unsafe threading issues. Ruby is a victim of its expressiveness, not its typing.

The application programmers of the nineties used corrupted Hungarian notation as a way to manage the shortcomings of their systems. It made a certain kind of sense to the programmer that was intimately manipulating the stack, mallocing this and freeing that. But it was an attempt to battle complexity with complexity, simulating register machines in order to think at a higher level.

It was a total failure. Please don't promote this to the Ruby community!

(Joel Spolsky on Systems Hungarian vs. Apps Hungarian)

Rob Britton said...

Re-reading that comment, I guess it gives the impression that I meant Ruby has a basic type system. That is far from the case, and I am well aware that Ruby has a strongly typed system. My point was more that Hungarian may be more useful in Ruby than in C/C++ because Ruby has a dynamic type system. A lot of the time when programming in Ruby a variable will only ever have a single type, and it is nice to put that little bit of documentation there to let other programmers know what the type of the variable should be.

Anyway, this was more of a passing thought than an actual recommendation, and I don't think it should be used to the same extent that it was in C/C++ (g_lpasz is overkill). For example, I was thinking more of a little note like aMyVar or sMyVar to denote arrays and strings. I've had problems once in a while where a method is expecting an array but gets a string, but since the two classes have such a similar set of methods you end up with strange/unexpected behaviour as opposed to errors. I don't think that Hungarian notation will in any way fix this, but I don't think it could hurt if used in moderation - although that comment there is spelling doom, when do programmers ever do something in moderation?
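A small sketch of the array-versus-string confusion, with a made-up method name and made-up data. Both classes respond to [] and length, so passing the wrong one raises nothing. The behaviour described is that of Ruby 1.9 and later, where indexing a String returns a one-character String.

```ruby
# Hypothetical method that expects an Array of names and returns
# the first entry together with the number of entries.
def first_and_count(names)
  [names[0], names.length]
end

p first_and_count(["alice", "bob"])  # first element and element count
p first_and_count("alice")           # no error: first character and character count
```

The second call quietly returns ["a", 5] instead of failing, which is exactly the strange/unexpected behaviour rather than an error.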