Sunday, March 06, 2011

The new premature optimization: lines of code

In the last 10 years, the number of widely available programming languages has exploded and, thanks to client-agnostic technology such as the web browser, HTML and HTTP, engineers have more freedom to choose their language. As a result, you can't find a thread about languages where lines of code aren't compared, both in the number of lines required to do something and in elegance/style. After all, when we choose a language, we then have to stare at it all day, so it's worth seeing how good or bad it looks.

Take, for example, this tidbit on the front page of the Haskell website:

3.1 Quicksort in Haskell

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

3.2 Quicksort in C

True quicksort in C sorts in-place:

// To sort array a[] of size n: qsort(a,0,n-1)
void qsort(int a[], int lo, int hi)
{
  int h, l, p, t;
  ...

My copy of their HTML is broken here, but the gist is this: Haskell lets you write a quicksort in just 2 lines of code, while the C version takes 28. Conclusion: Haskell must be that much more fun and expressive to work in.
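
For a sense of what "sorts in-place" means in practice, here is a minimal sketch of an in-place quicksort in C. It is not the listing from the Haskell site, which did not survive my broken HTML. It assumes the Lomuto partition scheme (last element as pivot), and the names quicksort_inplace and swap_ints are hypothetical, picked for illustration only.

```c
/* Swap two ints through pointers. */
static void swap_ints(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[lo..hi] (inclusive bounds) in place.
 * To sort an array of size n: quicksort_inplace(a, 0, n - 1).
 * Illustrative sketch using Lomuto partitioning, not the original listing. */
void quicksort_inplace(int a[], int lo, int hi)
{
    if (lo >= hi)
        return;                     /* zero or one element: already sorted */

    int pivot = a[hi];              /* assumption: last element is the pivot */
    int i = lo;                     /* next slot for an element < pivot */

    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            swap_ints(&a[i], &a[j]);
            i++;
        }
    }
    swap_ints(&a[i], &a[hi]);       /* move the pivot to its final position */

    quicksort_inplace(a, lo, i - 1);
    quicksort_inplace(a, i + 1, hi);
}
```

The C version rearranges the caller's array and allocates nothing beyond stack frames, whereas each filter call in the two-line Haskell version builds a new list. So the line counts are not comparing quite the same program.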

You'll find examples like this for every language out there except C, C++ and Java, because those are considered the old and busted languages that require too much cumbersome code. The new hotness (Ruby, Python, Erlang, Haskell, Scala) will all have examples like this.

Meanwhile, as we compare lines of code, performance is going to crap. I saw someone make the point today that Ruby 1.8.x provides just 2% ... TWO PERCENT ... of the performance of C. And it's true: the language shootout backs it up.

Furthermore, many dynamic languages allow you to write elegant code like this, but it's also unchecked. It requires far more unit testing in a dynamic language to get the same assurances that one can get from a statically typed or type-inferred language. So, long term, you are putting the onus on yourself to do checking that would otherwise be done by a robust compiler. (Haskell doesn't have this problem, of course.)

Could it be that the new form of "premature optimization" is lines of code? In the case of Ruby or Python, are we prematurely trading off compile-time checking and performance for what we consider more elegant code? This is not new; it's been going on for years. A C programmer will tell you that C++ was similar, and a C++ programmer will tell you the same of Java.

And I know someone is going to tell me that if they hadn't written it in Python or Ruby, they never would have shipped it quickly enough. I don't know how this claim can be proven valid or invalid. There are lots of websites out there written in Java that are successful and timely.

Also, when challenged, sometimes the rebuttal to this concern is that "you can always extend your [Ruby/Python/Perl] using C." That's true, you can. But if the requirements of your project include performance, or APIs or libraries that aren't supported by [some language here], then aren't we falling into the pit of premature optimization, just in terms of what we perceive as a better way to work? Why are we disregarding requirements just to fulfill the dream of reducing lines of code?

I'm not saying these languages are bad. I'm saying that requirements or realities of the project/world are being disregarded to use them. So let's call it what it is: premature optimization.