There is a good deal of background and analysis needed to understand just how the concept of Occamality applies to the practical details of building programs. But before we get to that, it would be good to review the benefits we can expect. The core benefit of paring down a program to its bare minimum information content is pretty simple:
To the extent that any semantic concept is repeated anywhere, in any form, in the specification of the program, it is a redundancy that, if discarded, would reduce the cost of implementing the concept, the cost of one or more of the downstream components of the program, and the cost of subsequent modifications.
In other words, an Occamal program is quicker to build than any other program that does the same thing, and is overwhelmingly the easiest to change, even though you did not think you were building “flexibility” into the program!
Let’s briefly review the lifecycle of software. At the beginning of the process for a whole new program or a change to an existing one, there is the need to create or change, embodied in a set of requirements. From there, with variations depending on the programming shop’s methodology, we have roughly specifications (business, functional, high level, low level, etc.), design (various levels of focus and detail), code (perhaps with prototypes, code walk-throughs, etc.), test (various levels, sometimes performed by different groups of people using different tools), document (internal and external), train (the people who operate, administer, install, support and use), learn and use the software. This sequence may be linear or it may have iterative cycles. Once the program is in operation, there is an extended support cycle, with support, maintenance and bug fixing. There is typically a demand for new or altered features; these go through a version of a similar chain from requirements through building and use. It is generally recognized that (1) most efforts to build new programs crash and burn prior to roll-out, and (2) most of the money that is spent on a program is spent during its extended “tail,” namely the support and modify cycle. The difficulty of building brand-new programs increases the extent to which existing programs are modified, sometimes through multiple technology cycles.
Today, because Occamality is not a familiar concept, it may take longer in practice to build programs that are Occamal. Even when it is understood, redundancy frequently does not cost much to build into a program. In fact, it may be quicker to build a program with extensive redundancy (think copy and modify) than with less. But every superficial gain from redundancy is paid for, over and over again. Once we’re past the learning curve, building Occamal programs should cost about the same as non-Occamal ones (ideally, they would cost less to build), and the stages of software ownership where most of the cost is incurred should see dramatic savings. The savings would come mostly from the lower cost and lower risk of testing, documenting, learning, using, maintaining, and changing programs.
What exactly is it that makes programs hard to change? Once you understand basically what you have to do, it comes down to finding all the places in the program that will be affected by your change. The side-effects are always the toughest.
What if the change you wanted to make could be accomplished by changing exactly one place in the program? What if, to take a trivial example, you wanted to change the classic
Print ("Hello, world");
by printing “bubba” instead of “world”? This would be easy, because you could just go to the single program line and make the change.
Now let’s take a well-known horrible example – the Y2K problem. This was the problem of modifying programs so they would continue to work when the year changed from 1999 to 2000. This was a problem because many programs did not use 4 digits to represent the year. For various reasons, usually involving an attempt to save space or time, the year was represented as a two-digit number in many programs, for example 99, with the leading two digits, 19, understood. If those programs had a single place where the number of digits in a year was represented, solving the Y2K problem would have been trivial – you would just go to each program, find the place where year was defined, change it, re-compile, check for problems, and you’re done. A quick change, low risk, no big deal. So Y2K was a problem not because of anything difficult or mysterious about dates – in fact, it could have been about any similar program change. Y2K was a problem simply and solely because the knowledge of the number of digits in the year of dates was expressed in many ways in many places, and it took a long time to find and fix them all, with a substantial risk that some places were missed.
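To make the "single place" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the constant YEAR_DIGITS, the three helper functions, and the idea that a record consists of a name followed by a year. It is not how any real Y2K-era system was written; it only shows what it would mean for the width of a year to be stated once and used everywhere else.

```python
# Hypothetical sketch: the number of digits in a year is stated exactly once.
# Changing this one value (for example from 2 to 4) is the whole "Y2K fix"
# for every function below, because none of them repeats the number.
YEAR_DIGITS = 2


def format_year(year: int) -> str:
    """Render a year using the single shared width, zero-padded."""
    return str(year % 10 ** YEAR_DIGITS).zfill(YEAR_DIGITS)


def is_valid_year_text(text: str) -> bool:
    """Accept only text that has exactly the defined number of digits."""
    return len(text) == YEAR_DIGITS and text.isdigit()


def record_width(name_width: int) -> int:
    """Width of an assumed fixed-width record: a name followed by a year."""
    return name_width + YEAR_DIGITS
```

With YEAR_DIGITS set to 2, format_year(1999) yields "99" and a record with a 20-character name is 22 characters wide. The non-Occamal version of the same program would have the literal 2 (or a hard-coded "99") scattered through each of these functions, which is exactly the situation that made Y2K expensive.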
I completely admit, and I remember because I participated in the madness, that there were bizarre variations that made Y2K particularly gruesome. One example is “overloading,” which means that sometimes a programmer would use the value of “99” in the year field to mean “no date was given.” Y2K was a problem before the year 2000 because some programmers used “9999” (which could mean September 9, 1999) to mean “no more records in this sequence.” But all such cases just multiplied the basic fact: the number of digits in a year was expressed in many ways in many places, and in some of those places, other values or indicators were also stored.
So we want to have exactly one place in which the number of digits in the year of a date is defined. Notice that doesn’t mean we are limited to a single date. The key concept here is distinguishing between definition and use. Let’s add another level to make it clear. We want to define date in one place. Then we define, say, several particular dates: birth date, marriage date, date of loan start, date of loan end. Each of these would be defined as a type of date. “Date” would have how many digits are in a year, and “birth date” would be a kind of date. We would define in “date” those things that are true of all dates, like the number of digits in the year. We would define in “birth date” only those things that are true of “birth date” in particular, for example, the label to use when displaying it.
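The distinction between definition and use described above can be sketched as follows, again in Python. The class names DateDefinition and DateField, the year-month-day display layout, and the labels are all assumptions made for this illustration; the text itself only says that “date” holds what is true of all dates and “birth date” holds what is particular to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateDefinition:
    """What is true of all dates, stated once (hypothetical example)."""
    year_digits: int = 4

    def format(self, year: int, month: int, day: int) -> str:
        # The year width comes only from this definition; the
        # year-month-day layout is an assumption of this sketch.
        width = self.year_digits
        return f"{year % 10 ** width:0{width}d}-{month:02d}-{day:02d}"


@dataclass(frozen=True)
class DateField:
    """A kind of date: adds only what is specific to it, such as a label."""
    label: str
    definition: DateDefinition

    def display(self, year: int, month: int, day: int) -> str:
        return f"{self.label}: {self.definition.format(year, month, day)}"


# One definition of "date", used by every particular kind of date.
DATE = DateDefinition(year_digits=4)
BIRTH_DATE = DateField("Birth date", DATE)
LOAN_START = DateField("Loan start", DATE)
```

Here BIRTH_DATE and LOAN_START say nothing about how many digits a year has; they only carry their own labels. Changing year_digits in the one DateDefinition changes how every kind of date is displayed.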
This is an elementary concept as I have illustrated it so far. What is different is the extent of the application of the concept that I propose. While it is (I hope) common practice for dates to be defined in this way within a program, it is certainly not common practice for all dates used anywhere in a program for any purpose to be defined in this way, as we saw big time in Y2K. For example, we have one way of defining dates in the database, another one for local storage in application programs, and yet another one for display and reports. While it is reasonable for there to be aspects of dates that some parts of programs care about that others don’t, for example the display label, it is not reasonable that all aspects that are common be defined redundantly – they should be defined exactly once.
More importantly, this concept applies to all aspects of the software lifecycle, not just the building phase. It even applies to documentation, training and using programs, because it means that there is less to document, train and learn, and when changes are made, the user’s common-sense notion that “something changed” actually applies – for example, if zip code is changed from 5 numbers to 5 numbers optionally followed by 4 numbers, that change is universal. The change, made in a single place, ripples through screens, applications, temporary variable definitions and databases.
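As a rough illustration of how the zip code change might ripple outward from one place, consider the sketch below. The class ZipCodeDefinition, its method names, the hyphen between the two groups of digits, and the VARCHAR column syntax are all assumptions of this example rather than anything prescribed in the text. The point is only that validation, screen field width, and database column size are each derived from the one definition instead of restating it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ZipCodeDefinition:
    """Single definition of what a zip code looks like (hypothetical)."""
    base_digits: int = 5
    extension_digits: int = 0  # 0 means there is no optional extension

    def pattern(self) -> str:
        """Regular expression derived from the definition."""
        base = rf"\d{{{self.base_digits}}}"
        if self.extension_digits == 0:
            return base
        # Assumption: the optional extension is separated by a hyphen.
        return base + rf"(-\d{{{self.extension_digits}}})?"

    def is_valid(self, text: str) -> bool:
        """Validation used by applications."""
        return re.fullmatch(self.pattern(), text) is not None

    def max_length(self) -> int:
        """Longest valid value; drives screen field width."""
        extra = 1 + self.extension_digits if self.extension_digits else 0
        return self.base_digits + extra

    def column_ddl(self, name: str) -> str:
        """Database column declaration derived from the same definition."""
        return f"{name} VARCHAR({self.max_length()})"
```

Moving from ZipCodeDefinition() to ZipCodeDefinition(extension_digits=4) is the single change; the validation rule, the input width, and the column size all follow from it without being touched individually.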
Occamality directly addresses a lament frequently heard from programmers, about not being given enough time to do a job “right,” to design it for future flexibility. One of the many reasons good managers refuse to support a “good, flexible design” is that you always pay the price of the additional effort, and you rarely actually experience the benefit. That is simply because the designer tries to anticipate the nature of the potential future change, and make provisions for it. But of course, what usually happens is the anticipated change doesn’t happen and something else does, and the fact that the program is longer than it could have been because of the additional code that provides for un-needed flexibility makes the unanticipated change harder to make than it had to be.
Now that we understand what makes change hard – the fact that what we want to change is expressed in many ways in many places, all of which have to be found, understood, and correctly changed – we can design programs that are both as quick as possible to build and as quick to change in the future, regardless of the new requirement; we know that if we need to change something, there’s always one place to go to make the change.
The answer to why you should build Occamal programs is simple:
Occamal programs cost less to specify, design and build (once you are past the learning and acceptance curves), cost much less in the later stages of the software lifecycle where most of the costs are incurred, and enable changes and other forms of support and maintenance to be made with the greatest speed and the lowest cost and risk. And this is true regardless of language, operating system, application environment, and anything else about your software environment and tools.