Criminal Overengineering

telemachos · on June 23, 2010

Argh.

Apparently there are two very popular types of article in the software blog world:

Type 1: YAGNI (like this example): Do less now. Refactor later as needed. It won't be needed, most likely. Chill out. (All driven by the question, "Dude, wtf? 100 lines of boilerplate for a 5 line case statement? Snap out of it.")

Type 2: Architect astronautics: Do more now. Build for the next version. You will need more then, so why not prepare now? Decouple that code. Use more patterns. Hoist that jib. (All driven by the question, "What will your code/software/app do if...?")

I read type 1, and it (often) sounds convincing. I read type 2, and it (often) sounds convincing. I get a fucking headache from the cognitive dissonance. I make more coffee and get back to work, no wiser than before.

edw519 · on June 23, 2010

Great comment. 2 thoughts:

1. You can read anything and it (often) sounds convincing. Quality writing that is incorrect often trumps poor writing that is correct. That's way we must always be vigilant readers, especially on the internet.

2. "ArchitectingForTheFuture" does not necessarily mean "MoreLinesOfCode". I'd like to read more articles of Type 3: How excellent design and proper use of tools gives you the best of both Type 1 and Type2.

[EDIT 1: stcredzero, in reference to cousin comment, my personal metric is n=2. I never want the same line of code more than once. I'm sure there are good arguments for other values of n, but this has always seemed to work well for me.]

[EDIT 2: I try to make a point to never say "Not Hacker News" or complain about content. Conversely I should be quick to claim "Not Not Hacker News". This is a great thread! About stuff near and dear to this programmer's heart. Keep 'em coming.]

stcredzero · on June 23, 2010

As I say in a cousin comment: an easy way to get the best of both worlds is to wait until you have an apparent problem. The 4th time you start writing that same switch statement, maybe you can tell you're going to be doing this a lot more and it's time to bring out Strategy. Doing it sooner is too likely to be premature.

The advantage of this approach: you never have to prognosticate. Hindsight is 20/20, so use it!

EDIT: I used to use n=2 as my threshold, but I found it much better to have a slightly higher n in Smalltalk. I am spoiled, though because I have such lightweight but powerful (syntax-aware) tools for searching such patterns.

a-priori · on June 23, 2010

As always, the best solution is somewhere in the middle of the two extremes. Everything has a cost. YAGNI has a cost of rework later if it turns out you do need it after all. Architecting For The Future has extra up-front and maintenance costs, which is wasteful if it turns out the future wasn't as you saw it.

And so, it becomes a cost-benefit analysis. C(YAGNI) = C(develop a simpler version) + C(rework) x p; C(AFTF) = C(develop) + C(maintain). Where p is the anticipated probability that you will actually need that feature (and realizing that you're probably going to guess higher than the actual probability).

anamax · on June 23, 2010

There's a C(rework) term in C(AFTF) as well because the "in advance" work never actually "just works" for the future that shows up.

jimbokun · on June 23, 2010

"Everything has a cost. YAGNI has a cost of rework later if it turns out you do need it after all. Architecting For The Future has extra up-front and maintenance costs, which is wasteful if it turns out the future wasn't as you saw it."

Isn't that why we write good unit tests and refactor? It gives you close to best of both worlds. Writing unit tests is helpful anyways, and gives you peace of mind when you introduce patterns later on when you actually need them.

lukev · on June 23, 2010

I feel your pain - I struggle with the same thing. But I think it's possible, to some extent, to have the best of both worlds.

An ideal of clean, logical design actually satisfies both #1 and #2. You don't write any more than you need to, but you architect it sensibly so if you need to modify it in the future, there's a natural way to do it.

A case study: you are responsible for maintaining a moderately sized mailing list.

Type 1: Maintains a comma+newline separated list of name/email pairs in a text file, parse it with a 3 line perl script that sends an email for each regex match. Done in 15 minutes. YAGN anything else.

Type 2: Builds a fully relational model in a database with seperate tables for names and addresses, backed by Hibernate with a complete class hierarchy including AbstractRecords (in case you want to store records other than name/email pairs), RecordFactories (In case we need to generate lists of records from another source), AbstractRecordDAO (in case we need to use a different Database or ORM framework), EmailFactories, AbstractMailerImpl, etc, etc, etc, planning in the architecture for anything anyone might ever want to do with a mailing list.

Both of these are wrong, IMO.

The correct solution is to write one database table, or one cleanly formatted file, with a program with perhaps two classes that abstract apart the data loading and the mailing tasks. One class loads the data into a simple Map structure, and the other that handles iterating the map and doing the mailing.

It only takes a bit longer to write than #1, and is infinitely simpler than #2. It doesn't anticipate every future need, but when one comes, there's a logical point to start adding the functionality. If I have to do something new with the list, I don't have to completely rewrite my program (like #1 does), I just write a new class that uses the existing data structure. Just as extensible as #2, with a tenth of the work. And easy to see what's going on.

bsaunder · on June 23, 2010

hmm...

If I have to do something new with the list, I don't have to completely rewrite my program (like #1 does)

You said #1 only took 15 minutes. Hardly seems like a big cost. If you never have to rewrite, #1's the right answer. Or if the rewrite includes functionality never even anticipated in your "correct solution". If it's that small, starting over with a clean slate may be faster than fitting in new functionality.

lukev · on June 23, 2010

This is a small example, so yeah, you're right, when the absolute times are so small.

But say instead that it's a larger problem, and that #1 takes one day to build, #2 takes a week, and the "preferred" solution (call it #3) takes two days.

Then along comes your change request. It will take most of a day to rewrite #1 to do something never anticipated, but it can be slotted into #2 or #3 in about an hour.

Now you've broke even between #1 and #3, and that's only the first change request. Any real system is going to get many more.

These numbers are pulled out of thin air, of course, but they're pretty accurate according to my development experience.

radu_floricica · on June 23, 2010

Sometimes the world just is as simple as it seems. Type 1 is right, type 2 isn't.

After 15 years of writing code I can only vouch for two metrics of code quality: clarity and length. For a long time I too thought that clarity means using "thisIsAStudentBirthDate" variable names - but it turns out there are very very few instances where the shortest code isn't also the most obvious at first glance.

I think the turning point was when I decided rewrites aren't a bad thing. I started seeing the code as alive - unchanged code isn't good, is dead. The myth of perfect code is just an illusion. Writing for the next version sounds very sensible - until you get enough years under your belt to realize that either there won't be a next version or it won't be anything like you expect.

j_baker · on June 23, 2010

Type 1 is right for you. A good type 2 knows they can't anticipate the future and prepares for the unknown. If overdone, this is bad, but the same can be said for type 1.

And rewrites do become difficult as a system grows if you don't think them through. It's easy to rewrite one component with no dependencies, but it's not so easy to rewrite something that lots of different pieces of code rely on.

radu_floricica · on June 23, 2010

A problem with a verbose approach is that when ver 2.0 comes you can't really know/remember which part of the code is actually used and which was just prepared for future development - so you end up maintaining and coding around pieces of code that will never be used.

Like I said, what eventually bites you is that you never ever ever know exactly what version 2.0 will require. So in the end you still have to adapt code, and adapting clean short code is easier.

j_baker · on June 23, 2010

If you have to explicitly remember which code was prepared for future development, you're doing something wrong. The point of having loosely coupled code is that it should be easily modifiable under a wide variety of circumstances.

jacabado · on June 23, 2010

I upvoted you for your braveness but I strongly disagree with you. (For me) Type 2 is wrong for everybody because: + Been there, done that, ended sadly; + We're all different but our limitations are pretty much the same, respect your colleagues! + Every successful achieving programmer I've worked with are persons with particularly good sense; + Knowing how programming related skills like abstraction and logic are highly related to cleverness you can be easily tricked into seeing programming as a cleverness demonstration or competition. That should be done in college or in controlled environments(no deadlines, no changes in team).

cianestro · on June 24, 2010

Definitely agree this juggling rests in experienced hands. A vet knows what balls can be dropped and which ones cannot. My thoughts on this are: 1) As a rule of thumb, the longer the project the more structured it should be--which should be a no brainer. 2) Having a clean dispatch module allows you to sweep the mess under loosely coupled rugs. 3) Programmers are code excavators first, curators second, developers third.

stcredzero · on June 23, 2010

They're just two sides of the same coin. YAGNI is the starting point. Be as minimal as possible. Code is just like inventory in a supply chain. Do things as cheaply as possible for as long as you can get away with it.

I think you're misinterpreting Type 2. You don't bring out Type 2 to prevent future problems. You bring it out to solve problems.

I have a rule: don't apply a pattern or other advanced technique until doing so will eliminate or reduce code in more than 3 places. If the same code starts popping up in more than 3 places, it may show up in a whole lot more, and it's going to start being a pain to find all of the occurrences. On the other hand, just finding 2 or 3 and correcting them is not really that hard, so leaving things until n=4 is not too bad.

Going back to the article, never apply a pattern in a language until doing so yields fewer lines of code than the original code. If your language requires N > 10 for this to be true for most patterns, then switch languages! (Someone should turn this into a metric!)

bmm6o · on June 23, 2010

It's worse than that. The whole article is attacking a straw man, which so many software blog articles do. I'm not sure I would take the example in a Wikipedia article to be the pinnacle of software engineering. Sure, it would be strange to see the sample code appear as-is in a code base. I would agree that a strategy for adding and subtracting might be overkill. But it's an example! It's purpose is to convey an idea. If you're going to critique it, you really have to do so on those grounds.

ecaradec · on June 24, 2010

Type 1 design is not taught anywhere, people don't even know this is an option. The only design strategy that is taught is architecture astronautic.

I had an argument with a teacher that thought that it would be useful to abstract things from a pop/imap email fetching service, in case other protocols show up ? I'd argue that was improbable and that would complexity the design and that possible other protocols might be so different that they wouldn't fit the abstraction. He dismiss it as industry specific way of doing (but I lacked examples a little bit, so I wasn't completely convincing ).

We have to be quite vocal because we cry against the mountain.

zackham · on June 23, 2010

My 2 cents: Go for (1), keep several potential future use cases in mind, and make damn sure you don't architect yourself out of those being natural directions for the code to go. Often this leads you to stub out an interface that looks like the beginning of a nice architecture, but not actually flesh out the code beyond what is necessary. This strategy generally requires more thinking than coding, but in my experience that is usually a good thing.

mechanical_fish · on June 23, 2010

My conclusion, as I watch my own code wander back and forth in the space between the two poles, is: One needs a lot of practice!

jokull · on June 23, 2010

"practice" or "experience" are the terms missing from this discussion! An experienced person can iterate less and has a better design right off the bat.

DannoHung · on June 23, 2010

Yeah, this is why some people get paid money to make decisions about software design. Finding a balance between what you ain't gonna need and what you should be prepared for involves experience, a knowledge of the problem space, and just plain smarts.

j_baker · on June 23, 2010

The reason is that neither view is correct or incorrect. They're just different. Reality is in the happy medium: preparing for the future without overengineering.

Tamerlin · on June 23, 2010

Experience WILL tell you that the best answer is ALWAYS #1. Don't make it complicated unless you absolutely have to. 99% of the time, you don't, and the architects are full of it, but if they ever admit that, then they lose their job security.

When in doubt, remember what Knuth said: "Premature optimization is the root of all evil."

Silhouette · on June 23, 2010

I think this one line sums up my views about a lot of trendy software development practices:

> If you're about to take a hundred lines to write what you could in ten, stop and ask yourself this: what the fuck?

TDD advocacy is my pet hate for this today. I read an article the other day that managed to turn Hello, world into about half a dozen source files and dozens of lines of code, all pulled together with a makefile several times longer than the whole effort should have been, and seemed to be claiming some sort of profound revelation from doing this!

The quote is also a succint analysis of why C++ and Java are becoming less useful every year, relative to the field of programming languages as a whole.

And of course, as in the example in the article, it's good for bashing people who think tools like design patterns must be used everywhere, regardless of whether they actually help to keep a design clean and maintainable.

noss · on June 23, 2010

Are you sure you didn't see the forest for the trees? As far as I know, there is not a huge industry demand out there for Hello World applications, but there is a demand for simple examples when explaining concepts.

ajdecon · on June 23, 2010

Out of curiosity, can you post a link to the "hello, world" article?

Silhouette · on June 23, 2010

Your wish is my command. :-)

http://blog.objectmentor.com/articles/2010/05/20/hello-world...

jimbokun · on June 23, 2010

I kept looking for the give away that this was some kind of sick parody. But no, it seems totally sincere.

Silhouette · on June 24, 2010

I'm afraid several of the guys at ObjectMentor lost the plot some time ago. It's a shame, because back in the day, some of them wrote about ideas that were thought provoking even if you didn't necessarily agree with all of them.

I don't often visit their blogs any more, but occasionally, I still come across their site after someone links to it. It's one of those things where you know it's bad and you should just look away, but somehow you can't help yourself. ;-)

ajdecon · on June 26, 2010

Really late, but thanks!

gfunk911 · on June 23, 2010

Part of the problem is that a language like Java:

1. Is verbose by its very nature 2. Has language characteristics that push the developer in the direction of using design patterns at every opportunity.

So what might be a 1000 line program in Ruby is a 3000 line program in Java (translated as directly as possible), but then the deficiencies of the language turns simple idioms in a powerful language into complex design patterns in Java. See: closures, lack of duck typing/deficient type system, etc.

Destruct1 · on June 23, 2010

I think the Java culture plays a big part in the whole mess.

It would be easily possible in other languages to write 4 implementations of a basic list or public setter/getter for every member or buffered stream interfaces for all kinds of I/O.

Its not done because long solutions are frowned upon.

billswift · on June 23, 2010

This is a large part of a post I made April of last year:

KISS: Keep It Simple and Succinct

Succinct means brief and concise, to the point. A succinct argument is one that more directly addresses the point under discussion.

The "traditional" meaning of KISS completely misses the point. The biggest problem KISS addresses is over-complication of plans - and it is not a problem of stupidity. Those most prone to over-complicate are the more intelligent, especially the highly intelligent and highly educated, but lacking in practical experience. Experience, especially wide experience, is the best prophylaxis for over-elaborate plans.

http://williambswift.blogspot.com/2009/04/kiss-keep-it-simpl...

herrherr · on June 23, 2010

In this context "The Evolution of a Python programmer": http://gist.github.com/289467

gadtfly · on June 23, 2010

I don't think I've ever read a rant about code smells which couldn't be trivially fixed with basic functional programming principles.

stcredzero · on June 23, 2010

It also depends on the language. I bought a book about functional programming in Java once. I found it painful.

tkahn6 · on June 23, 2010

You should seriously consider checking out Clojure. It's awesome.

clark · on June 23, 2010

I think frameworks have to be engineered carefully; the use of the strategy pattern in the example is a typical and is very soundly used in a framework. I think the problem is that developers write far too many frameworks than they should. Interestingly enough, it's code that isn't extensible that ends up being a framework -- mostly because the authors have time to focus on documentation, support, user interface and the like.

efsavage · on June 23, 2010

TLDR: Don't solve for problems that don't exist.

stcredzero · on June 23, 2010

I think a better way to put it is:

    1 - only solve problems that already exist
    2 - only accept solutions less painful than the problem

Really, if programmers could stick to this, this would be all the methodology we'd need!

sapphirecat · on June 23, 2010

> 2 - only accept solutions less painful than the problem

That sounds good, but it doesn't work unless you have the experience to tell how painful the solutions you thought of actually are--which you don't have if you haven't been hurt by some overengineering.

stcredzero · on June 23, 2010

Lacking that, lines of code isn't a bad place to start. Also, if you have coworkers, look at their facial expressions when you explain the thing to them.

Sandman · on June 24, 2010

I agree with the article... up to a point. Write your programs to be as simple as possible - but not simpler.

VBprogrammer · on June 23, 2010

I completely disagree with the example given in this blog. I don't have time to give a full explaination but I believe the google clean code talks gives a far better arguement than I ever could.

On the general principle I agree that overengineering is to be avoided, but I actually think the example shows a clear disregard for Object Orientated principles.

http://www.youtube.com/watch?v=4F72VULWFvc

arethuza · on June 23, 2010

So what's wrong with using a switch statement if all you have are 3 operations?

Even if more operations had to be added I'd probably let it grow to the point where the method the switch is in was getting a bit unwieldy then look at refactoring it using a pattern if I really thought it would be worth it.

jemfinch · on June 23, 2010

> So what's wrong with using a switch statement if all you have are 3 operations?

Because you won't always have only three operations. What about division? Exponentiation? Square root? Factorial? Arbitrary user-defined functions? What if you didn't anticipate an operation one of your clients needs? If you use standard OO principles, your client can rectify that problem; if you use a switch statement, they can't.

> Even if more operations had to be added I'd probably let it grow to the point where the method the switch is in was getting a bit unwieldy then look at refactoring it using a pattern if I really thought it would be worth it.

Why not just do it right from the start? It's extremely simple, it's a pattern every OO programmer is familiar with, it's more computationally efficient and has other advantages as well.

It's also not nearly as complicated as the linked Strategy code. The appropriate analogue to the author's switch statement (in Python, since I'm not a Java programmer):

    class Op(object):
      def eval(self, a, b):
        raise NotImplementedError

    class Add(Op):
      def eval(self, a, b):
        return a + b

    class Subtract(Op):
      def eval(self, a, b):
        return a - b

    class Multiply(Op):
      def eval(self, a, b):
        return a * b

It's twelve non-blank lines, more than half of them boilerplate; translated into Java it would probably gain a few keywords and a couple lines of ending braces, but would not grow significantly. Compare this to almost that many lines for the switch version (note that the OP left out the declaration of the enum, the function boilerplate, etc.) for something less maintainable, less extensible, less idiomatic, and with lower performance.

arethuza · on June 23, 2010

I would guess that 95% of extensible systems are never extended :-)

jemfinch · on June 23, 2010

By the clients, sure. But by the original authors?

This design (it's the basic "convert a switch statement to polymorphism" refactoring) turns a ball of mud switch statement into several independently comprehensible classes. Operations can be understood, verified and tested apart from the whole evaluation apparatus.

No doubt many programmers today massively (perhaps even criminally) overengineer their software, but the fact remains, the example chosen by the author is a really bad one.

anamax · on June 23, 2010

I'd guess that more than 5% are extended (maybe 6-8%), but that more than 95% are not extended as expected.

When something is not extended as expected, either you end up with something not as good as you'd have gotten by waiting or you have to rip out some of what you put in for the future without ever using it.

jemfinch · on June 23, 2010

The problem with this term "extended" is that it's unnecessarily limiting. Code needs to be maintained, whether or not it's "extended," and switch statements are by far less maintainable than the alternative polymorphism-based implementation. They don't as readily admit separate testing, and the implementation of specific operations are not isolated from each other, etc. 100% of code is maintained and the polymorphic method is far better for that than the switch statement. A programmer who chooses the latter over the former is being shortsighted.

neutronicus · on June 24, 2010

"switch statements are by far less maintainable than the alternative polymorphism-based implementation"

You know, everyone says this, and I've never quite gotten it. Why is writing another class and implementing another virtual function so much better than adding another clause to a switch statement? At least all the cases of the switch statement are in the same place instead of scattered across a bunch of files.

Polymorphism to me just seems like a switch statement you have to think harder about. Maybe that's why I've always preferred a functional style to an object-oriented one.

Disclaimer: I program in Fortran for a living, so "SELECT-CASE Stockholm Syndrome" is definitely a possibility.

jemfinch · on June 25, 2010

Polymorphism is more maintainable than a switch statement for a few reasons:

1. Polymorphic method implementations are lexically isolated from one another. Variables can be added, removed, modified, and so on without any risk of impacting unrelated code in another branch of the switch statement.

2. Polymorphic method implementations are guaranteed to return to the correct place, assuming they terminate. Switch statements in a fall through language like C/C++/Java require an error-prone "break" statement to ensure that they return to the statement after the switch rather than the next case block.

3. The existence of a polymorphic method implementation can be enforced by the compiler, which will refuse to compile the program if a polymorphic method implementation is missing. Switch statements provide no such exhaustiveness checking.

4. Polymorphic method dispatching is extensible without access to (or recompiling of) other source code. Adding another case to a switch statement requires access to the original dispatching code, not only in one place, but in every place the relevant enum is being switched on.

5. As I mentioned in my previous post, you can test polymorphic methods independent of the switching apparatus. Most functions that switch like the example the author gave will contain other code which cannot then be separately tested; virtual method calls, on the other hand, can.

6. Polymorphic method calls guarantee constant time dispatch. No sufficiently smart compiler is necessary to convert what is naturally a linear time construct (the switch statement with fall through) into a constant time construct.

Now, to answer the objections you offered. You said, "At least all the cases of the switch statement are in the same place instead of scattered across a bunch of files", to which I would reply that in the polymorphic method case, at least all the code related to a particular case is in the same place. In the switch statement case, you're spreading dispatch machinery and data-specific code all over your program, wherever you switch on an enum. In the polymorphism case, that dispatch machinery is abstracted by the compiler (and thus not present in your code at all) and all the code related to specific types of data is centralized in that type's class. The general code remains general, having no knowledge of the specific, per-type implementation.

You also said, "Polymorphism to me just seems like a switch statement you have to think harder about" to which I reply that on the contrary, polymorphism is great in that you don't have to think about it at all. A programmer attempting to demonstrate that a switch statement is correct must delve into the switch statement and show it to be correct for each case. A programmer attempting to demonstrate that a polymorphic call is correct need only ensure that the call's abstract preconditions are satisfied, and can consider the actual implementations of that call to be black boxes that he need not look into.

In response to your disclaimer, I would say that I mean no offense, but it's very possible that Sapir-Whorf is impacting your language preferences here. You find what you do most to be easiest to understand, and what you do most is switch statements, not polymorphism. I am no doubt afflicted with the same condition, but I think as I demonstrated above, there are many objective reasons why polymorphism is superior to a switch statement in most cases.

sprout · on June 23, 2010

Yes, but how much time does it save the other 5% of the time?

If the other 5% of the time stuff takes off and becomes standard throughout the projects using your code, then it can more than make up for it.

arethuza · on June 23, 2010

http://c2.com/xp/YouArentGonnaNeedIt.html

jedbrown · on June 23, 2010

Depends whether it's in a library where extensibility is a feature.

ErrantX · on June 23, 2010

One of my lecturers wrote the following at the top of our course notes:

Contrary to popular opinion using OOP does NOT mean "thou shalt make every last thing an object"

stcredzero · on June 23, 2010

Really, it depends on your environment. In Smalltalk, most things are objects. Not surprisingly, it turns out to be easiest to make most things objects. I find that Smalltalk is best when a program is mostly objects, there's a sprinkling of short-ish procedural methods whose workings are hidden by encapsulation, and perhaps a handful of long optimized algorithmic methods.

I suspect that in Self, it's easier to make more things objects. (jk - everything is an object in Self.) Objects aren't quite as easy to use in C++ and Java. The cost is higher, so the opportunities to use objects with a good cost/benefit payoff are fewer. That's all there is to it.

Does this generalize? In most Functional languages, functions are really easy to use, and can be used in flexible and powerful ways. What's the best way to program in them? Why, using functions! Yup, seems to work. Fancy that!

ErrantX · on June 23, 2010

A lot of stuff can work as objects; in fact in many cases an object can be the smallest (or simplest, or most logical etc.) piece of code.

But not everything must be abstracted :)

hboon · on June 24, 2010

In Smalltalk, everything is an object.

infinite8s · on June 23, 2010

Also, functions + closures = objects

jonsen · on June 23, 2010

In some sense that's what the second O says, isn't it?

It's not called Object Programming, but Object-Oriented Programming.

bediger · on June 23, 2010

But isn't that where it goes astray?

If we all called it "Class Oriented Programming" we'd come closer to an accurate name. "Class Oriented Programming" as a name might take away the emphasis on instances, and put emphasis on designing classes.

Or not. There appears to be no bottom to human stupidity.

stcredzero · on June 23, 2010

If we apply a design pattern from Smalltalk, we have Concepts. Every Concept has another concept which is its Misconception. This gives us an infinite regress of misconceptions, unless we can come up with a Metamisconception, which is a concept which is its own erroneous misconception. Then we can implement unbounded stupidity in a system of finite size.

edanm · on June 23, 2010

Tell that to Java developers :)