Category: Computer Fundamentals

  • Classifications of Software Programs

    Anyone who has programmed for a while knows that not all code is alike. There are important broad categories of code. I wish there were a generally agreed-upon categorization, but there doesn’t seem to be. Many years ago, there were these categories (from lesser to greater in terms of perceived expertise and prestige):

    • Operator (your hands actually touched a physical computer!) – low status!
    • Applications programmer (many variations here, from business analyst to debugging and maintenance). And even they are not all the same! While I was working in a COBOL shop, I discovered that there is an intense, sharp difference among COBOL programmers:
      • Batch COBOL. There is no direct human interaction here.
      • Interactive COBOL, the kind that paints the infamous “green screens.”
      • Guess which is more prestigious?
    • Systems programmer (these got the problems when the application programmers were stumped)
    • Computer vendor employee (these people wrote the operating systems, compilers, and other tools that systems programmers “maintained = got blamed for” at user sites)

    When I worked at a computer vendor, there were clear distinctions among those who were concerned with:

    • The operating system kernel (and then there are the drivers, the people who deal with them have to suffer from hardware dependencies)
    • The operating system utilities
    • Compilers
    • Database systems (“simple” file systems fell under quasi-core OS; add a serious index or mention “b-tree” and you moved here)
    • Networks

    These distinctions were necessary and real. The coding disciplines, received knowledge, experience, and so on were different enough for these things that people could really be good at one and not at another. Unlike in the dreaded and despised world of end users, there was no clear hierarchy. While the kernel guys felt themselves at the center of the world, as in some sense they were, there was no denying the fact that compilers involved all sorts of special knowledge, and this got really annoying once OS’s started getting written in languages requiring compilers.

    I suspect there are other reality-based classifications of code. But for these purposes, I would like to break all programming into some categories that cut across these and others I have heard of:

    • True algorithms
    • Abstract realizations of generic functions
    • Application and device specific definitions and logic

    Algorithms

    Algorithms are the blocks of code studied so well in Knuth’s multi-volume series, The Art of Computer Programming. They are things like sorting, searching, managing lists, etc. There are lots of other sources, but he focuses on the key factors for code that implements algorithms. In spite of the impression that may be given by his many volumes, there are a relatively small number of algorithms, and it is important that a relatively small number of people like Knuth study them intensely, understand everything about them, and make them available (I’ll gladly pay!) in standard, canonical, guaranteed implementations. Most of the metrics in this paper are not relevant to algorithms.

    Why is this? Because, for most algorithms, we continue to value the amount of space they use and the amount of time they take to execute. In increasing numbers of business applications, sub-optimal algorithms could be used without ill effect, at least in some places. But if an optimal algorithm is available, why not use it? Why make life complicated (not to mention error-prone) by introducing two classes of algorithms, the optimal ones and the good-enough ones – what’s to gain?

    The life and analysis of algorithms in practice is by no means complete. There are important advances to be made. But most algorithms are advanced by someone solving a previously unsolved problem (like Dijkstra and the shortest path) or by taking advantage of a newly altered constraint (disk memory is now much cheaper than it used to be, therefore we can reasonably do X, which was a terrible idea in the days of expensive disk). Because of their different goals (minimization of space and time within constraints), algorithms play by different rules than other programs.

    Abstract generic functions

    Abstract generic functions implement things useful to many applications, but not specific to any one of them. Application “frameworks” are a mother lode of such functions. The Rails framework in Ruby is a reasonable modern example.

    Any programmer who has written a variety of programs knows what I’m talking about here, and has his own candidates for inclusion. This has its origins in the FORTRAN FORMAT statement from Paleolithic times. Major categories of software products have emerged that started as application-specific code, moved on to be abstract generic functions, and then finally hardened into products. CRM and Content Management systems are recent examples. However, software products as currently typically designed and delivered aren’t nearly as useful or flexible as unified collections of abstract generic functions. The closest thing we have is application frameworks.

    Application-specific definitions and logic

    This is where everything else goes. This is the code and meta-data that makes your application your application, different from others.

    Even this can and should be broken into layers of generality! For example, there should be one place for the common aspects of your UI design, and one place that defines what is unique to each particular interface.

    The important thing is that, to the largest extent possible, you are mostly selecting and marshaling the general mechanisms available, and making them particular to the problem at hand.

    Of course, your problem may have some highly unique elements. These elements may involve writing some serious code. The code may resemble in depth and complexity that of abstract generic functions. But you’d better just write it to solve the problem at hand, and leave generalization to another day.

  • Summary: Software Fundamentals

    You might think that computer fundamentals would be the first things taught and often returned to in Computer Science. To the contrary, some of the most important things about software may be known to programmers if you ask them, but understood and practiced? Not so much.

    https://blackliszt.com/2012/02/fundamental-concepts-of-computing.html

    https://blackliszt.com/2012/04/where-does-software-come-from-what-is-software-.html

    What do you do with software — aside from wasting huge amounts of time? You automate!

    https://blackliszt.com/2020/01/the-fundamentals-of-computer-automation.html

    In any field you can think of, there are clear, agreed-upon measures of success. In software? Not so much.

    https://blackliszt.com/2013/03/software-postulate-the-measure-of-success.html

    If you have trouble measuring success, how can you measure productivity?

    https://blackliszt.com/2012/09/what-is-software-productivity.html

    If you have trouble measuring productivity, practices can become widely used that, if you step back from them, you can see obviously reduce productivity.

    https://blackliszt.com/2012/09/software-productivity-divisors.html

    Programming is taught as though all programming is similar. Nothing could be further from the truth. Is arguing a criminal case in a courtroom the same as acting in a Shakespeare play? They both use English, right? Programming has distinctions that are just as sharp.

    https://blackliszt.com/2023/09/classifications-of-software-programs.html

    One of the most obvious but least examined fundamental is that software is data. The fact that there is a memory that contains a mix of instructions and data … and that the instructions are data … is fundamental, with many implications.

    https://blackliszt.com/2014/04/fundamental-concepts-of-computing-software-is-data.html

    Diving into and understanding the possible relationships between data and instructions is bedrock core to being a real programmer.

    https://blackliszt.com/2021/08/the-relationship-between-data-and-instructions-in-software.html

    Everyone knows about numbers and counting. But how many times do people stand back from their work and apply some appropriate numbers to them? Not often enough.

    https://blackliszt.com/2012/10/fundamental-concepts-computing-counting.html

    Closely related to counting is speed. The speed of computing has increased unlike anything else in our existence. If cars got 10X faster, we would change lots of things, including whether to drive or fly. But as computing has gotten 100X or more faster, we too often fail to adapt the way we program.

    https://blackliszt.com/2013/12/fundamental-concepts-of-computing-speed-of-evolution.html

    When we walk or drive, we always look ahead and change details of our walking or driving to adapt to ever-changing conditions. Adapting what you do to what you see and learn is fundamental common sense. All too often, it is not part of programming.

    https://blackliszt.com/2013/02/fundamental-concepts-of-computing-closed-loop.html

    Some fundamental concepts of software aren’t taught and most people, including programmers, have the wrong ideas in their head about them. One of those concepts is that a program is a passive thing that needs to be built, instead of a set of actions that needs to be choreographed.

    https://blackliszt.com/2021/08/fundamental-concept-software-is-not-a-passive-thing-it-is-actions.html

    Flowcharts and workflow express the actions of a program. They have been around since the start of programming. They're still around, but largely ignored as a fundamental.

    https://blackliszt.com/2022/06/flowcharts-and-workflow-in-software.html

    You think of building a house, after which it’s mostly unchanged. Software is different. Being built is like being born – software spends most of its “life” being changed.

    https://blackliszt.com/2013/10/when-is-software-development-done.html

    https://blackliszt.com/2021/08/fundamental-concept-software-is-rarely-built-mostly-it-is-changed.html

    What we call “building” software works much better if we treat it as “growing a baby.”

    https://blackliszt.com/2011/06/software-how-to-move-quickly-while-not-breaking-anything.html

    It’s easy to think that most of software has been figured out, and that innovation is possible only at the nether reaches of the field. Kind of like how innovation in math isn’t needed or performed from arithmetic through algebra and calculus and way beyond. Not true in software! Some of the most fundamental things, like quality, are ripe for fundamental innovation.

    https://blackliszt.com/2015/06/fundamental-innovations-in-software.html

    Speaking of math, computing started as a way to do math and grew as part of math departments. The misconception that math and Computer Science are closely related is widespread. This is deeply harmful.

    https://blackliszt.com/2015/05/math-and-computer-science-vs-software-development.html

    https://blackliszt.com/2015/04/math-and-computer-science-in-academia.html

    The usual practice in computing is to dive right into the details. Gaining perspective by putting your work in a broader context is tremendously helpful if you aspire to be truly excellent in computer software. Here are some sample books that provide that context.

    https://blackliszt.com/2023/05/inspiring-books-for-writing-software.html

    Computing and software would greatly benefit by establishing the fundamental concepts of the field. Individual programmers can become vastly more productive by understanding the fundamentals and practicing them.

  • Fundamental Concept: Software is not a passive thing, it is actions

    Even lay people have heard that there are computer languages. They've heard that the languages process data — they take data in and put it out. They gather there are commands in the language to do stuff to data. So of course software is actions.

    In spite of this knowledge, something very different is in the forefront of people’s thinking about software, both in the industry and outside it. This different way of thinking results from the fact that software is invisible to nearly everyone, and that in order to think and communicate about it, language is used that assumes software is a thing.

    We talk about a body of software, the 2021 version of a piece of software, as though it were a car. Speaking of cars, we talk about how software crashes. We talk about how software has bugs, a term that goes back to an actual insect found inside an early computer that had broken down.

    We talk about creating requirements for a new piece of software, which someone will then architect, after which the build may be put out to bid. The rest of the software building process sounds remarkably like building a house. We think of a time when the building project is done and we can start using the house.

    The trouble is that a house just sits there. It’s a passive physical object. Software is NOT a physical object; it is mostly a set of actions to be taken by a computer, along with some built-in data. A piece of software is like a data factory – it takes data in, makes all sorts of changes, sometimes puts some of it in a storage room, and spits out data in all sorts of volumes and formats. A software program responds to each piece of data it gets, e.g. a mouse click or a data file; software is an actor, while a house is a thing that does nothing. Designing a piece of software is more like writing a play for an improv theater – instead of having a fixed script, the play can go in all sorts of directions depending on the input it gets from the audience. The improv script even has to tell the actor that when an audience member uses a swear word, the actor has to respond that those words aren’t permitted and ask whether he’d like to try again. This is NOTHING like designing a house! Here is more detail.

    Suppose you use an advanced architecture program and design a house in great detail. Modern programs go down to the level of the framing, the details of the steps, doors and windows, literally every object that will be part of the eventual house. When you're done with such a design, you can give it to the building department for approval and to a construction company to build. If they follow the instructions correctly, you'll get exactly what you saw in the 3-D renderings and the live-action walk-throughs. Here's the killer: the equivalent of such a detailed design for computer software is … a computer program! Or at least enough detail in pseudo-code, data definitions and screen designs so that coding is fairly mechanical. A detailed design for a computer program is like the results of extensive interactions by the script-writing team for a new episode of a series TV show, with recordings and lots of notes. Chances are a junior script-writer who has been intensely following everything with some participation is assigned the job of producing a clean draft of the script, ready for refinement and editing. That is the program, ready for actors to give it a studio reading, like you would run a test version of the software for the developers.

    Software isn't a thing, regardless of how often we talk about it as though it were. It isn't a thing, even though we nearly always use methods like project management developed for building things for organizing its creation. Software is a collection of actions, like an incredibly detailed script for improvisational theater with loads of conditional branching and the ability to bring things up that were talked about before in creative and useful ways.

    This matters because how you think influences what you do. If you're thinking about running a marathon while you're climbing a mountain, chances are you'll slip and fall. If you're thinking about baseball plays while you're trying to choreograph a ballet, I suspect you won't win any awards. But at least those are thinking about actions. If you're thinking about the design of a book while you're writing a comedy, I doubt you'll get many laughs.

    This fundamental concept is a core aspect of all of programming. The fact that people so often have the wrong things in their heads while they are dealing with software has profound effects.

    And … just as bad, we mostly don't "build" software — mostly we change it!

  • The Relationship between Data and Instructions in Software

    The relationship between data and instructions is one of those bedrock concepts in software that is somehow never explicitly stated or discussed. While every computer program has instructions (organized as routines or subroutines) and data, the details of how the data is identified, named and accessed vary among programming languages and software architectures. Those differences of detail have major consequences. That’s why understanding the underlying principles is so important.

    Programmers argue passionately about the supposed virtues or defects of various software architectural approaches and languages, but do so largely without reference to the underlying concepts. Only by understanding the basic concepts of data/instruction relationships can you understand the consequences of the differences.

    The professional kitchen

    One way to understand the relation between instructions (actions) and data is to compare it to something we can all visualize. An appropriate comparison for a program is a professional chef’s kitchen with working cooks. But in this kitchen, the cooks are a bit odd — only one of them is active at a time. When someone asks a cook to do something, that cook gets active, performing his own specialty. A cook may ask another cook for help, handing over ingredients or pointing him to places, and getting the results back. The cooks are like subroutines in that they get called on to do things, often with specific instructions like “medium rare.” The cook processes the “call” by moving around the kitchen to fetch ingredients and tools, brings them to a work area to process the ingredients (data) with the tools, and then delivers the results for further work or to the server who put in the order. The ingredients (data) can be in long-term storage, in working storage available to multiple cooks, or in a workspace undergoing prep or cooking. In addition, food (data) is passed to cooks and returned by them.

    The action starts when a server gives an order to the kitchen for processing.

    • This is like calling a program or a subroutine. Subroutines take data as calling parameters, which are like the items from the menu written on the order to the kitchen.

    The person who receives the order breaks the work into pieces, giving the pieces to different specialists, each of whom does his work and returns the results. Unlike in a kitchen, only one cook is active at a time. One order might go to the meat chef and another to whoever handles vegetables.

    • This is like the first subroutine calling other subroutines, giving each one the specifics of the data it is supposed to process. The meat subroutine would be told the kind and cut of meat, the finish, etc.

    In a professional kitchen there is lots of pre-processing done before any order is taken. Chefs go to storage areas and bring back ingredients to their work areas. They may prepare sauces or dough so that everything is mixed in and prepped so that it can be finished in a short amount of time. They put the results of their work in nearby shelves or buckets for easy access later in the shift.

    • This is like getting data from storage, processing it and putting the results in what is called static or working storage, which is accessible by many different subroutines.

    There is a storage area and refrigerator that stores meat and another that stores vegetables. The vegetable area might have shelves and bins. The cook goes to the storage area and brings the required ingredients back to the cook’s work space. Depending on the recipe, the cook may also fetch some of the partly prepared things like sauces, often prepared by others, to include.

    • This is like getting data from long-term storage and from working storage and bringing it to automatic or local variables just for this piece of work.

    The storage area could be nearby. It could be a closet with shelves containing big boxes that have jars and containers in it. A cook is in charge of keeping the pantry full. They go off and get needed ingredients and put them in the appropriate storage area as needed. They could also deliver them as requested right to a chef.

    • This is like having long-term storage and access to it completely integrated with the language, or having it be a separate service that needs to be called in a special way.

    The chef does the work on the ingredients to prepare the result.

    • This is like performing manipulations on the data that is in local variables until the desired result has been produced. In the course of this, a chef may need to reach out and grab some ingredient from a nearby shelf.

    The chef may need extra space for a large project. He grabs some empty shelves from the storage area and uses them to store things that are in progress, like dough that needs time to rise. Later a chef might call out “grab me the next piece of dough” or “I need the dough on the right end of the third shelf.”

    • This is like taking empty space and using it. Pointers are sometimes used to reference the data, or object ID’s in O-O systems.

    The cook delivers the result for plating and delivery.

    • This is like producing a return variable. It may also involve writing data to long-term storage or working storage.

    I’m not a cooking professional, but I gather that the work in professional kitchens and how they’re organized has evolved towards producing the best results in the least amount of elapsed time and total effort. As much prep work as possible is done before orders are received, to minimize the time and work needed to deliver orders quickly and well. The chefs have organized work and storage spaces to handle original ingredients (meat, spices, flour, etc.) and partly done results (for example, a restaurant can’t wait the 45 minutes it might take to cook brown rice from scratch).

    In the next section, this is all described again somewhat more technically. If you’re interested in technology or have a programming background, by all means read it. The main points of this post and the ones that follow can be understood without it.

    Instructions and data in a computer program

    The essence of a computer program is instructions that the computer executes. Most of the instructions reference data in some way – getting data, manipulating it, storing results. See this for more.

    In math, from algebra on up, variables simply appear in equations. In computer software, every variable that appears in a statement must be defined as part of the program. For example, a simple statement like

              X = Y+1

    means “read the value stored in the location whose name is Y, add the number 1 to it, and store the result in the location whose name is X.” Given this meaning, X and Y need to be defined. How and where does this happen? There are several main options:

    • Parameters. These form part of the definition of a subroutine. When calling a subroutine, you include the variables you want the subroutine to process. These are each named.
    • Return value. In many languages, a called routine can return a value, which is defined as part of the subroutine.
    • Automatic or local variables. These are normally defined at the start of a subroutine definition. They are created when the subroutine starts, used by statements of the subroutine and discarded when the subroutine exits.
    • Static or working storage variables. These are normally defined separately (outside of) subroutines. They are assigned storage at the start of the whole program (which may have many subroutines), and discarded at the end.
    • Allocated variables. Memory for these is allocated by a subroutine call in the course of executing a program. Many instances of such allocated variables may be created, each distinguished by an ID or memory pointer.
    • File, database or persisting variables. These are variables that exist independent of any program. They are typically stored in a file system or DBMS. Some software languages support these definitions being included as part of a program, while others do not. See this for more.

    There are a couple concepts that apply to many of the places and ways variables can be defined.

    • Grouping. Groups of variables can be in an ordered list, sometimes with a nesting hierarchy. This is like the classic Hollerith card: you would have a high-level definition for the whole card and then a list of the variables that would appear on the card.
      • There might be subgroups; for example, start-date could be the name of a group consisting of the variables day, month, year.
      • Referring to such a variable might look like “year IN start-date IN cust-record” in COBOL, while in other languages it might be cust-record.start-date.year.
    • Multiples. Any variable or group can be declared to be an array, for example the variable DAY could be made an array of 365, so there’s one value per day of a year.
    • Types or templates. Many languages let you define a template or type for an individual variable or group. When you define a new variable with a new name like Y, you could say it’s a variable of type X, which then uses the attributes of X to define Y.
    • Definition scope. Parameters, return values and local variables are always tied to the subroutine of which they are a part. They are “invisible” outside the subroutine. The other variables, depending on the language, may be made “visible” to some or all of a program’s subroutines. Exactly how widely visible data definitions are is the subject of huge dispute, and is at the core of things like components, services and layers.

    When you look at a statement like X = Y+1, exactly how and where X and Y are defined isn’t mentioned. X could be a parameter, a local variable or defined outside of the subroutine in which the statement appears. Part of the job of the programmer is to name and organize the data definitions in a clean and sensible way.

    The variety of data-instruction organization and relationships

    Most of the possibilities for defining variables listed above were provided for by the early languages FORTRAN, COBOL and C, each of which remains in widespread use. Not long after these languages were established, variations were introduced. Software languages and architectures were created that selected and arranged the way instructions related to data definition. Programmers and academics decided that some ways of referencing and organizing data were error-prone and introduced restrictions that were intended to reduce the number of errors that programmers made when creating programs. In software architecture, the idea arose that all of a program's subroutines should be organized into separate groups, usually called "components" or "services," each with its own collection of data definitions. The different components call on each other or send messages to ask for help and get results, but can only directly operate on data that is defined as part of the component.

    The most extreme variation of instruction/data relationship is a complete reversal of point of view. The view I've described here is "procedural," which means everything is centered around the actor, the chef who does things. The reversal of that point of view is "object-oriented," called O-O, which organizes everything around the data, the acted upon, the ingredients and workspaces in a kitchen. Instead of following the chef around as he gets and operates on ingredients (data), we look at the data, called objects, each of which has little actors assigned to it, mini-chefs, that can send messages for help, but can only work on their own little part of the world. It's hard to imagine!

    The basic idea is simple: instead of having a master chef or ones with broad specialties like desserts, there are a host of mini-chefs called "methods," each of which can only work on a specific small group of ingredients. A master chef has to know so much — he might make a mistake! By having a mini-chef who is 100% dedicated to dough, and never letting anyone else create the dough, we can protect against bad chefs (programmers) and make sure the dough is always perfect! Hooray! Or at least that's the theory…

    Conclusion

    Computers take data in, process it and write data out. Inside the computer there are instructions and data. Software languages have evolved to make it easier for programmers to define the data that is read and created and to make it easier to write the lines of code that refer to the data and manipulate it. As bodies of software have grown, people have created ways to organize the data that a computer works on, for example putting definitions for the in-process data of a subroutine inside the subroutine itself or collecting a group of related subroutines into a self-contained group or component with data that only it can work on.

    Understanding the basic concepts of instruction/data relationships and how those relationships can be organized and controlled is the key to understanding the plethora of approaches to language and architecture that have been created, and making informed decisions about which language and architecture is best for a given problem. The overall trend is clear: programming’s self-declared elites decide that this or that restriction should be placed on which variables can be accessed by which instructions in which way, with the goal of reducing the errors made by normal programming riff-raff. Nearly all such restrictions make things worse!

  • Fundamental Concept: Software is rarely Built, mostly it is Changed

    Nearly everything we hear about the construction of software is that it is built. This is a fundamental misconception that pervades academia and practice. This misconception is a major underlying cause of why software has so many problems getting created in a timely way and functioning with acceptable speed and quality.

    The software-is-built misconception is woven into every aspect of software education and practice. You are first taught how to write software programs; all the exercises start with a blank screen or sheet of paper. You’re taught how to test and run them. In software engineering you’re taught how to perform project management, from requirements to deployment. You may learn how to automate software quality testing. Everything about these processes assumes that software is built, like a house or a car.

    The idea that software is built isn’t wrong – it just applies to a tiny fraction of the total effort that goes into a body of software from birth to its final resting place in the great bit-bucket in the sky. The vast majority of the effort goes into changing software – making it do something different than the version you’re working on does. Maybe you’re fixing a bug, maybe you’re adding a feature or adapting to a new environment. Sometimes the changes are extensive, like when you’re changing to a new UI framework. For a typical body of software, however much effort is put into the first production version, a minimum of 10X and often over 100X of the work goes into changing it.

    We don't think of software as ever-changing, ever-growing partly because we think in terms of physical metaphors. We use the same project management techniques that are used for houses that we use for software. While sometimes houses are changed, mostly they're just built and then used for years. It's an extremely rare house that is endlessly extended the way software is. See this for more.

    When it comes to enterprise software, the fraction of effort that goes into change skyrockets even more. A good example is the kind of software that runs hospitals and health systems. The software already exists, and it’s already had 100X the effort put into change that went into the original build. For example, standard software from the leading industry supplier was installed in the NYC hospital system. The original contract — just to buy and install existing production software — was for over $300M and 4 years! A couple of years later, the cost of the carefully planned project had more than doubled. All because of changes. Here are details.

    Why does thinking software is built rather than changed matter?

    The idea that software is about building rather than changing is responsible for most of the processes, procedures and methods that are overwhelmingly common practice, whether by custom or by regulation. The mismatch between those practices and the fact that software is nearly always changed rather than created from scratch results in the never-ending nightmare of software disasters that plague us. The never-ending flow of new tools, languages and methods is a series of futile attempts to solve these problems. No one would find the new things attractive if there weren’t a problem to be fixed – a problem everyone knows about and that no one wants to explicitly discuss!

    Example: Project Management, Architecture, Design and Coding

    Mostly what happens when you get a job in software is that you join a group that works with a substantial existing body of code. You may be a project manager, a designer, a customizer, any number of things, all of which revolve around the body of code.

    Yes, you have to start any project from an understanding of what you want to accomplish. Here's the thing: what you want to get done is in the context of the existing code, which means you have to understand as much of that code as possible. If what you want to do is fairly circumscribed within the existing code, you may get away with ignoring the rest — which is where comprehensive, nothing-left-out regression testing comes in: you'd better make darn sure you didn't break anything.

    In this context, lots of skills are relevant. But the overwhelming number one thing is … knowledge of the ins and outs of the existing code base!

    I've worked in a couple places where there were large code bases of this kind. Forget about departmental hierarchy — the go-to person in every case of doing anything important was the person who best knew that particular aspect or section of the code.

    In this context, what's the point of all the fancy project management stuff people think is the cornerstone of effective, predictable work? When you're traveling through an unknown land with lots of mountains, jungles, rivers and swamps, what do you want above all else (besides nice paved roads that don't exist)? You want an experienced GUIDE, someone who's been all over the place and knows the best way to get there from here.

    Kind of turns all the project management stuff on its head, doesn't it?

    Example: Software QA

    Most software QA revolves around writing test scripts of one kind or another that mirror the new code you write: if the code does X, the test should assure that the code does X correctly. This is true whether you're doing test-driven development or whether QA people write scripts. As you build the code, you're supposed to build the test code. When you make a new release, you run integration tests that include running all the individual test scripts.

    Everyone with experience knows the problems: there are never enough tests, they don't cover all the conditions, there are omissions and errors in the tests, with the result that new releases put into production often have problems.

    The root cause of this is the never-spoken assumption that software is about building, when the reality is that it's mostly changed. What this means is that the focus should be on the existing code and what it does — do the changes that you made break anything that used to work? When you make changes, you should be thinking champion/challenger, like in data science: the new code is the challenger. Did it make things better, and did it (probably unintentionally) make anything worse? The best way to find out is to run the same data through the production program and the challenger program in parallel, and have a tool that compares the output. Looking at the output will tell you what you want to know. Even better, running the new code in live parallel production — with lots of real customer data going through both versions but only the current champion's output going to the customer — provides a safe way to confirm the customers are OK, while checking the challenger's output to make sure the new functionality does what you wanted.
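
    The champion/challenger comparison can be sketched in a few lines. This is a hypothetical illustration: the two functions and the truncation-to-rounding change are invented, and a real harness would feed in production data and log the divergences.

```python
def compare_parallel(champion, challenger, inputs):
    """Run the same inputs through both versions and collect divergences."""
    diffs = []
    for item in inputs:
        old, new = champion(item), challenger(item)
        if old != new:
            diffs.append((item, old, new))
    return diffs

# Invented example: the challenger changes truncation to rounding.
champion = lambda x: int(x)      # current production behavior (truncates)
challenger = lambda x: round(x)  # proposed new behavior (rounds)

diffs = compare_parallel(champion, challenger, [1.2, 2.7, -1.5, 3.0])
# Each diff is an input whose behavior changed -- exactly what you want
# to inspect before promoting the challenger to champion.
```

    Every diff is either an intended improvement or an unintended break; sorting the diffs into those two piles is the whole review.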

    Who needs test scripts, TDD and the rest? They're backwards! See this for more.

    Conclusion

    Software is literally invisible to the vast majority of people. Therefore, we all talk and think about software in terms of metaphors. That would be OK if the metaphors matched reality; unfortunately, the metaphors don’t match reality, and play a major role in software dysfunction. In real life, we build houses, bridges, and other structures — we mostly don't change them.

    So what are the methods that are appropriate when you recognize that the main thing you do with software is change it instead of building it? I've done my best to abstract the methods I've seen the best people use, calling them (for want of a better term) war-time software. Instead of creating a whole adult human at once, like Frankenstein, it's like starting with a baby that can't do much yet, but is alive. The software effort is one of growing the baby instead of constructing a Frankenstein.

     

  • Software programming languages: the Declarative Core of Functional Languages

    What is most interesting about functional languages is that they strive to be declarative, in contrast to the imperative orientation of conventional programming languages.

    In a prior post I described the long-standing impulse towards creating functional languages in software. Functional languages have been near the leading edge of software fashion since the beginning, while perpetually failing to enter the mainstream. There is nonetheless an insight at the core of functional languages which is highly valuable and probably has played a role in their continuing attractiveness to leading thinkers. When that insight is applied in the right situations in a good way it leads to tremendous practical and business value, and in fact defines a path of advancement for bodies of software written in traditional ways.

    This is one of those highly abstract, abstruse issues that seems far removed from practical values. While the subject is indeed abstract and abstruse, the practical implications are far-reaching and result in huge software and business value when intelligently applied.

    Declarative and Imperative

    A computer program is imperative. It consists of a series of machine language instructions that are executed, one after the other, by the computer's CPU (central processing unit). Each instruction performs some little action. It may move a piece of data, perform a calculation, compare two pieces of data, jump to another instruction, etc. You can easily imagine yourself doing it: pick up what's in front of you, take a step forward, if the number over there is greater than zero jump to this other location, etc. It's tedious! But every program you write in any language ends up being implemented by imperative instructions of this kind. It's how computers work. Period.

    Computers operate on data.  Data is passive. Data can be created by a program, after which it's put someplace. The locations where data is held/stored are themselves passive; those locations are declared (named and described) as part of creating the imperative part of a computer program.

    Data is WHAT you've got; instructions are HOW you act. WHAT is declarative; HOW is imperative. WHAT is like a map; HOW is like an ordered set of directions for getting from point A to point B on a map.
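
    To make the imperative picture concrete, here is a toy interpreter for a made-up instruction set (nothing like any real machine, but with the same step-by-step HOW character): each instruction does one little action, and a jump changes which instruction runs next.

```python
def run(program):
    """Execute a list of (op, arg) instructions against a single accumulator."""
    acc, pc = 0, 0                  # accumulator and program counter
    while pc < len(program):
        op, arg = program[pc]
        if op == "load":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "jump_if_neg":   # a test instruction: jump if acc < 0
            if acc < 0:
                pc = arg
                continue
        elif op == "halt":
            break
        pc += 1                     # otherwise: on to the next instruction
    return acc

# Count -3 up to 0 by jumping back to the "add" instruction while negative.
result = run([("load", -3), ("add", 1), ("jump_if_neg", 1), ("halt", 0)])
```

    Four instructions, a loop, and a termination condition just to count to zero: that's the HOW orientation in miniature.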

    The push towards declarative

    From early in the use of computers, some people saw the incredible detail involved in spelling out the set of exacting instructions required to get the computer to do something. A single bit in the wrong position can cause a computer program to fail, or worse, arrive at the wrong results. This detailed directions approach to programming is wired into how computers work. Is there a better way?

    There is in fact no avoiding the imperative nature of the computer's CPU. As high-level languages began to be invented that freed programmers from the tedious, error-prone detail of programming in machine language, some people began to wonder whether they could write a low-level program (necessarily imperative) that would let programs be written in some higher-level language that was declarative in nature.

    Some people who were involved with early computers, nearly all with a strong background in math, proceeded to create the declarative class of programming languages, the most distinctive members of which are functional programming languages as I described in an earlier post.

    The ongoing attempt to create functional languages and use them to solve the same problems for which imperative languages are used has proven to be a fruitless effort. But there are specific problem areas for which a declarative approach is well-suited and yields terrific, practical results — so long as the declarative approach is implemented by a program written in an imperative language, creating a workbench style of system for the declarations. The current fashion of "low code" and "no code" environments is an attempt to move in that direction. But I'd like to note that there's nothing new in those movements; they're just new names for things that have been done for decades.

    The Declarative approach wins: SQL

    DBMS's are ubiquitous. By far the dominant language for DBMS's is SQL. In a relational DBMS, data is defined by a declarative schema, written in DDL (data definition language). Data is operated on by a few kinds of statements such as SELECT and INSERT.

    SELECT certainly sounds like a normal imperative keyword in any language, like COBOL's COMPUTE statement. But it's not. It's declarative. A SELECT statement defines WHAT data you want to select from a particular database, but says nothing about HOW to get it.

    This is one of the cornerstones of the value of a relational DBMS. A SELECT statement can be complicated, referencing columns in multiple rows of various tables joined in various ways. The process of getting the selected data from the database can be tricky. Without query optimization, a key capability of a DBMS, a query could take thousands of times longer than it takes on a modern DBMS that implements it. Furthermore, table definitions can be altered and augmented, and so long as the data referenced in the query still exists, the SELECT statement will continue to do its job.

    If all you're doing is grabbing a row from a table (like a record from an indexed file), SQL is nothing but a bunch of overhead, and you'd be better off with simple 3-GL read statements. But the moment things get complicated, your program will require many fewer lines of complex code, while being easier to write and maintain, if you have SQL at your disposal. A win for the declarative approach to data access, which is why, decades after it was created, SQL is in the mainstream.
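
    The WHAT-versus-HOW distinction is easy to see with Python's built-in sqlite3 module (a minimal sketch; the orders table and its contents are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)])

# Declarative: this states WHAT we want (a total per customer) and says
# nothing about HOW to scan, sort, group, or use indexes -- the DBMS's
# query optimizer decides all of that.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
# rows == [("alice", 17.5), ("bob", 5.0)]
```

    Add an index or restructure the table later and the same SELECT keeps working; only the HOW chosen by the optimizer changes.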

    The Declarative approach wins: Excel

    I don't know many programmers who use Excel. Too bad for them; it's a really useful tool for many purposes, as its widespread continuing use makes clear.

    Excel is a normal executable program written in an imperative language, but it implements a declarative approach to working with data. Studying Excel is a good way to understand and appreciate the paradigm.

    An Excel worksheet is a two-dimensional matrix of cells. A cell can be empty or hold a value of any kind (text, number, currency, etc.). What's important in this context is that you can put a formula into a cell that defines its value. The formula can reference other cells, individually or by range, absolutely or relatively. A simple formula could be the sum of the values in the cells above the cell containing the formula; a formula can also be arbitrarily complex, and it can have conditionals (if-then-else). Going beyond formulas, you can turn ranges of cells into a wide variety of graphs.

    If you're not familiar with them, you should look at Pivot tables, and when you've absorbed them, move on to the optimization libraries that are built in, with even better ones available from third parties. Pivot tables enable you to define complex summaries and extractions of ranges of cells. For example, I have an Excel worksheet in which I list each day of the year and the place where I am that day. A simple Pivot table gives me the total days spent in each location for the year, something that simple formulas could not compute.

    The key thing here is that even though there are computations, tests and complex operations taking place, it's all done declaratively. There is no ordering or flow of control. If your spreadsheet is simple, Excel updates sums (for example) when you enter or change a value in a cell. For more complex ones, you just click re-calc, and Excel figures out all the formula dependencies and evaluates them in the right order. This makes Excel a quicker way to get results than programming in any imperative language, assuming the problem you have fits the Excel paradigm.
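
    The recalculation behavior can be sketched as a toy dependency-aware evaluator (nothing like Excel's real engine, and the cell names and formulas are invented): the user supplies values and formulas in no particular order, and the evaluator discovers the right order from the dependencies themselves.

```python
def recalc(cells):
    """Evaluate a dict of plain values and (function, dependencies) formulas."""
    cache = {}
    def value(name):
        if name not in cache:
            cell = cells[name]
            if isinstance(cell, tuple):          # a formula cell
                fn, deps = cell
                cache[name] = fn(*[value(d) for d in deps])
            else:                                # a plain value cell
                cache[name] = cell
        return cache[name]
    return {name: value(name) for name in cells}

sheet = {
    "A1": 10,
    "A2": 32,
    "A3": (lambda a, b: a + b, ["A1", "A2"]),    # like =A1+A2
    "B1": (lambda s: s * 2, ["A3"]),             # like =A3*2
}
out = recalc(sheet)
# out["A3"] == 42 and out["B1"] == 84 -- evaluated in dependency order,
# though no flow of control was ever declared.
```

    Change A1 and re-run, and everything downstream updates: that is the re-calc button in miniature.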

    The Declarative approach wins: React.js

    One of the most widely-used frameworks for building UI's is React.js. Last time I looked, the header page of react.js included this:

    [screenshot of the React homepage, which leads with the heading "Declarative"]

    It says right out that it's declarative! And that makes it easy! I have found a number of places (example, example) that have nice explanations of how it works and why it's good.

    The Declarative approach wins: Compilers

    The best computer-related course I took in college, by far, was a graduate course on the theory and construction of compilers. The approach I was taught all those decades ago remains the main method for compiler construction. First a lexical analyzer turns the text of the program into a stream of tokens. Then a grammar parser turns the tokens into a structured graph of semantic objects. Then a code generator turns the objects into an executable program.

    The key insight is that each stage in this process is driven by a rich collection of meta-data. The meta-data contains all the details of the input language, its grammar, and the output target. A good compiler is really a compiler-compiler, i.e., imperative code that reads and acts on the lexical definitions, the grammar, and the code generation tables. The beauty is that you write the compiler-compiler just once. If you want it to work on a new language, you give it the grammar of that language without changing any of the imperative code in the compiler-compiler! If you want to generate code for a new computer for a language you've already got, all you do is update the code generation tables! Once you've got such a compiler, you can write the compiler in its own language, "boot-strapping" it.
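
    The meta-data-driven idea can be sketched with a table-driven tokenizer (a toy using Python regular expressions, not LEX; the token table is invented): to handle a different language you edit the table, never the scanning code.

```python
import re

# Meta-data: the token definitions. Changing the language means editing
# this table -- the imperative scanning code below never changes.
TOKENS = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

def tokenize(text, table):
    """Generic, table-driven scanner: one pass, driven entirely by `table`."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in table)
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, text)
            if m.lastgroup != "SKIP"]

tokens = tokenize("x = 42 + y", TOKENS)
# [("IDENT", "x"), ("OP", "="), ("NUMBER", "42"), ("OP", "+"), ("IDENT", "y")]
```

    The same pattern extends to the later stages: generic imperative code for parsing and code generation, driven by declarative grammar and target tables.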

    While they didn't exist at the time I took the course, tools to perform these functions, YACC (Yet Another Compiler Compiler) and LEX, were later built at Bell Labs by the group of pioneers who created Unix and the C language. I didn't have those tools, but I used the concepts to build the FORTRAN compiler I wrote in 1973 at my first post-college job. Since the only tool I had was assembler language, taking the meta-data, compiler-compiler approach saved me huge amounts of time and effort compared to hard-coding the compiler without meta-data.

    This is meta-data at its best.

    The Declarative spectrum

    Excel and SQL are 100% all-in on the declarative approach. But it turns out that it doesn't have to be all one way or the other. If you're attacking an appropriate problem domain, you can start with just programming it imperatively, and then as you understand the domain, introduce an increasing amount of declaration into your program, by defining and using increasing amounts of meta-data. This is exactly what I have described as climbing up the tree of abstraction. Each piece of declarative meta-data you introduce reduces the size of the imperative program, and moves information that is likely to be changed or customized into bug-free, easily-changed meta-data.
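
    A tiny invented example of one step up that tree: the same logic written imperatively, then with the changeable facts pulled out into meta-data.

```python
# Imperative version: the facts are buried in the control flow.
def shipping_cost_v1(region):
    if region == "domestic":
        return 5.00
    elif region == "europe":
        return 15.00
    else:
        return 25.00

# Declarative version: the facts are meta-data. Adding a region or
# changing a rate is now a data edit, not a code change.
RATES = {"domestic": 5.00, "europe": 15.00}
DEFAULT_RATE = 25.00

def shipping_cost_v2(region):
    return RATES.get(region, DEFAULT_RATE)
```

    Both behave identically, but only the second can be re-priced or extended without touching its control flow, which is exactly the kind of change that tends to be requested over and over.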

    Conclusion

    Functional languages as a total alternative to imperative languages will perpetually be attractive to some programmers, particularly those of a math theory bent. Except in highly limited situations, it won't happen. No one is going to write an operating system in a functional language.

    Nonetheless, the declarative approach with its emphasis on declaring facts, attributes and relationships is the powerful core of the functional approach, and can bring simplicity and power to otherwise impossibly complicated imperative programs.

  • Fundamental Concepts of Computing: Subroutine Calls

    Subroutine calls are glossed over as yet another thing to learn when software is taught or described. The fact is that subroutine calls are one of the most amazing, powerful aspects of computer programming. They are the first step up the mountain of abstraction, the path towards creating software that solves problems with optimal efficiency, effectiveness and ease.

    What's a subroutine? In common-sense terms, it's like a habit. I hope you have the habit of brushing your teeth. Calling a subroutine is like deciding you should brush your teeth. The subroutine itself is like the steps you take to brush your teeth: open your medicine chest, get out your toothbrush and toothpaste (probably always in the same order), etc. When you're done (you've rinsed your mouth and put stuff away for the next time), the subroutine is done and it "returns" to whatever called it. You might have to think, "what was I doing when I decided it was time to brush my teeth?" A computer subroutine, though, always returns to exactly where it was before making the subroutine call.

    In computer subroutines this calling behavior can get more elaborate. Subroutines can call (start) other subroutines, which can call yet others, without end. This is called "nested" calling.

    The teeth-cleaning subroutine ends with (we hope) clean teeth. Computer subroutines can end with (return) lots more. Suppose you're researching political revolutions and are making lots of notes. Then you decide it's important to study the 1917 Russian revolution. You put your general revolution notes to the side and dive into Russia. You start taking notes and then realize how important the Mensheviks and Bolsheviks are. You put the notes you've taken so far on 1917 on top of the general notes and dive into the next subject. When you've gone as deep down the rabbit hole as you can stand, you take the top notes from your stack on the side and fill in what you've just learned. Eventually you pick up the next notes from the stack, and so on until you're back to your revolution notes. It's also a bit like clicking a link on a web page, then another and another, and finally hitting the back arrow until you're back on the page where you started. Each new subject you dive into is like calling a subroutine, from which you can make further calls, eventually returning from each to the place you left off, only now with additional knowledge and information.

    Just like in real life, when you call a subroutine you pass it information (parameters). The information you pass tells the subroutine exactly what you wish done. When the subroutine is done it can pass information back to the caller. You might pass dirty plates to the dishwasher subroutine, for example, which passes clean plates back when it returns. Or more elaborately you could pass a bunch of ingredients to the bake-a-cake routine and it would return a cake.

    Some Technobabble

    Computer software people aren't as clear as you'd like them to be with the names they use. The term I'm using here, "subroutine," is out of fashion these days, though still in use. In different software languages or programming communities a subroutine could be called a routine, subprogram, function, method or procedure.

    Let's start with the basics of programming languages. Here's an overview of the bedrock on which everything is built, machine language, and the higher-level languages that have been built on that foundation. When you look at the memory of a computer, what you see is data. It's all ones and zeros. But some of that data can be understood by the machine to be instructions. An instruction may include a reference to a memory location, an address. The instruction could ask that the data at the given address replace the value in a register, be added to it, or undergo other supported operations. There is always a "jump" instruction, which tells the machine not to execute the next sequential instruction as it normally would, but to execute instead the instruction at the address given in the jump. There are test instructions that cause a jump only if a specified condition is met.

    The Subroutine call

    Amongst the rich jumble of instruction types is one that stands out with amazing power: the subroutine call. A subroutine call, sometimes implemented directly as a single machine language instruction and sometimes as multiple machine language instructions, does several things at once.

    • It saves the address of the location after the subroutine call somewhere, usually on a stack.
    • It puts some parameters someplace, often on a stack.
    • It transfers control to the starting address of the subroutine.

    The subroutine then:

    • Executes its instructions, getting parameters from wherever they were passed to it.
    • Returns to the saved address on the stack (the instruction after the call).
    • Optionally gives a "return value" to the calling code.
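
    Those steps can be simulated with an explicit stack (a sketch in a high-level language; a real CPU does this with dedicated call and return instructions, and the return address used here is invented):

```python
# A toy model of the machinery behind a subroutine call: the stack holds
# the return address and, in this sketch, the parameters too.
stack = []

def call(address_after_call, params):
    stack.append(address_after_call)   # 1. save where to resume
    stack.append(params)               # 2. pass the parameters
    return subroutine()                # 3. transfer control to the subroutine

def subroutine():
    params = stack.pop()               # get parameters from the stack
    result = sum(params)               # do the actual work
    resume_at = stack.pop()            # recover the saved return address...
    return (resume_at, result)         # ...and hand back a return value

resume_at, result = call(0x1044, [2, 3, 4])
# result == 9, and control "resumes" at the saved address 0x1044
```

    Because each call pushes its own return address, calls can nest arbitrarily deep and still unwind to exactly the right places, which is the whole trick.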

    Call by value vs. call by reference!

    It's a bit esoteric, but there's an important distinction between the two basic ways parameters can be passed to a subroutine: pass by value and pass by reference. Some languages can handle both kinds, while others can only handle one. This sounds abstract, but it's important and easy to understand. Passing by value is like putting the ingredients for a cake into a box and passing the box through an opening to another room, where a baker turns the ingredients into a cake and passes the cake (the value) back through the opening. Passing by reference is like working in a larger kitchen where one of the people is a baker. You ask the baker to bake a cake, and the baker goes to the various storage places, gets the ingredients she needs, and turns them into a cake on the spot. In passing by value, the subroutine only gets what you give it. In passing by reference, you tell the subroutine (by address pointers) what it should work with. In the peculiar world of object-oriented thinking, the equivalent of calling by reference is sacrilegious. But that's a subject for another time.
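
    The cake analogy can be sketched directly. Python ordinarily passes references to objects, so pass-by-value is simulated here with an explicit copy; the two baker functions are invented for illustration.

```python
import copy

def bake_by_value(ingredients):
    """Pass by value: works on its own copy; the caller's box is untouched."""
    ingredients = copy.deepcopy(ingredients)   # simulate passing a copy
    ingredients.append("heat")
    return "cake"

def bake_by_reference(pantry):
    """Pass by reference: works directly on the caller's data."""
    pantry.remove("flour")                     # uses up the caller's flour
    return "cake"

box = ["flour", "eggs"]
bake_by_value(box)       # box is still ["flour", "eggs"]
bake_by_reference(box)   # box is now ["eggs"]: the subroutine changed it
```

    In C the same contrast shows up as passing a struct versus passing a pointer to it: the value version can't touch the caller's data; the reference version can.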

    Subroutine Power

    Subroutines are the first step towards reducing the size and complexity of applications, increasing their level of abstraction, and making them easier to change and enhance with minimal trouble and error. Subroutines are high on the list of fundamental concepts of computing.

  • The Progression of Abstraction in Software Applications

    This post describes a little-known concept for understanding and creating software architecture that small groups use to defeat large, powerful incumbents and nimble competitors. It is one of a small number of powerful, repeating patterns that help us understand and predict the evolution of software. Understanding these patterns can help entrepreneurs direct their efforts; if they do it well, they greatly enhance their chances of success. Understanding the patterns can also help investors choose to invest in groups that are walking a path to success.

    Evolution of Applications Towards Abstraction on a Platform

    One of these patterns is the sequence of stages that applications naturally evolve through on a technology platform. Each stage brings a big increase in the power of the software, decreasing the effort, and increasing the speed and effectiveness, with which it can be deployed to meet customer needs.

    A category of software applications may well get “stuck” at a particular stage for a long time, sometimes even decades. During this time, the software may appear to move forward, and of course the marketing people and managers will always put things in the best possible light. But it’s always vulnerable to being supplanted by a next-stage version of the functionality.

    While there aren’t clear lines of delineation between the stages, it’s nonetheless useful to understand them roughly as:

    • Prototype. A hard-coded body of code.
    • Custom Application. Does a job reliably, but most changes require changing source code.
    • Basic Product. The code now has parameters, maybe user exits and API’s. Real-life implementations tend to require extensive professional services, and the cost of upgrading to new versions tends to be high.
    • Parameterized Product. The level of parameterization is high, with interface layers to many things outside the core code, so that many implementations can be done without changing source code. There may be some meta-data or editable rules.
    • Workbench Product. A large portion of the product’s functionality has migrated from code to editable meta-data, so that extensive UI, workflow, interface and functionality changes can be accomplished via some form of workbench, which could be just a text editor. The key is that the details of application functionality are expressed as editable data, meta-data, instead of code. Nonetheless, all fundamental capabilities are expressed in highly abstract code.

    As a body of code goes through this sequence of abstraction, it is increasingly able to meet the needs of new customers and changing needs of existing customers, with decreasing amounts of effort, risk and changes to source code. At the same time, the more abstract a program, the more functionality is expressed as data that is not part of the software itself, a.k.a. meta-data, and the more the software implements generic capabilities, as directed by the meta-data.

    The pattern applies both to individual bodies of code and to collections of them. It applies to code built internally for an organization and to code that is sold as a product in any way.

    I defined the stages above as a convenience; in reality, the categories are rarely hard-and-fast. A body of code could be given a big transformation and leap along the spectrum, or it could take a long series of small steps. One body of code could remain stuck with little change in abstraction, while other bodies of code doing similar things could be ahead, or progress rapidly towards abstraction.

    The Driver of Abstraction Evolution

    In biological nature, competitive pressures and external change appear to drive evolutionary changes. Similarly, when we look at categories of software: if there is little pressure on the software to change, it doesn’t change – why take the trouble and expense to change software that meets your needs or the needs of your customers?

    In reality, someone always seems to want changes to an application. A prospective user would gladly use the software if it did this, that or the other thing. A current user complains about something – it’s too slow, too hard to use, too complicated, or it just doesn’t work for X, Y or Z. How often does a piece of software not have a “roadmap?” If it doesn’t, it’s probably slated for retirement before long. Brand-new software is rare. The vast majority of software effort goes into making changes to an existing piece of software.

    How much time and effort is needed to change a particular body of software? That is the key touch-point between techies and business people. This is the point at which the level of abstraction of the application comes into play. Regardless of the level of abstraction of the application, the “change” required either has been anticipated and provided for or it has not.

    • If the change has been anticipated, the code can already do the kind of thing the user wants – but not the particular thing. Doing the particular thing requires that a parameter be defined, a configuration file changed, a template created or altered, workflow or rule changes made, or something similar that is NOT part of the program’s source code. This means that the requirement can be met quickly, with little chance of error.
    • If the change has not been anticipated, then source code has to be changed in some way to make the change. The changes required may be simple and localized, or complex and extensive – but the source code is changed and a new version of the program is created.

    This is what the level of abstraction of a software application is all about: the more abstracted the application, the more ways it can be changed without altering the source code and making a new version of the program.

    This is the fundamental driver of applications towards increasing abstraction: as changes have to be made to the application, at some point the technical people may decide to make that kind of change easier to make in the future, and create the appropriate abstraction for it. This may happen repeatedly. As more complex changes are required, the program may just get more complicated and more expensive to change further, or more sophisticated abstractions may be introduced.

    While historically it appears that outside forces drive applications towards increasing abstraction, smart programmers can understand the techniques of abstraction and build an application that is appropriately abstract from the beginning. Similarly, the abstracting methods can be applied by smart programmers to existing bodies of code to transform them, just because it's a way to build better code and meet ever-evolving business requirements.

    Conclusion

    The hierarchy of abstraction in software is one of the most important concepts for understanding a specific piece of software or a group of related software. Over time, software tends to become more abstract because of competitive and business pressures, or because of smart programmers working to make things better. The more abstract a piece of software is, the more likely it can respond to business and user demands without modifications to the source code itself, i.e., quickly and with low risk.

    The hierarchy of abstraction is certainly a valuable way of understanding the history of software. But it is more valuable as a framework for understanding a given piece of software and the way to evolve that software so it becomes increasingly valuable. It is most valuable to software developers as a framework for understanding software, helping them direct their efforts to get the greatest possible business impact with the smallest amount of time and effort.

  • The Fundamentals of Computer Automation

    The principles underlying computer automation are clear and strong. They account for most of the automation we see. They tell us clearly what will happen, why it will happen, and what the benefits will be. What the principles do NOT tell us is who will first apply automation to what process in what sector.

    Understanding the principles lets you predict the rough order of the sequence of automation.

    The principles are extremely simple. Perhaps that's why they're rarely stated and appear not to be taught in schools or understood by practitioners. So much the better for people who want to innovate and win by automating!

    The Principles

    You can probably guess the principles of automation from the definition of the word: "automate" means to do by machine or computer what a human would otherwise do.

    At core, there's really a single simple principle:

    • Replace time that a person would have spent with work done by a computer.

    The core principle is demonstrated by a before-and-after comparison: the principle requires that the human time spent on whatever activity is being automated, net of the investment in the computer hardware and software, is reduced after the computer and software are added. Lots of words, same meaning: apply the computer stuff, and there's less total human time and effort.
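    The comparison is simple arithmetic. A minimal sketch in Python, with every number hypothetical:

    ```python
    # Hypothetical before-and-after comparison for one automated activity.
    hours_before = 100.0      # human hours per month before automation
    hours_after = 20.0        # human hours per month after automation
    investment_hours = 400.0  # one-time cost of the hardware and software,
                              # expressed in equivalent human hours
    months = 12               # evaluation horizon

    # Net human time saved over the horizon, net of the investment.
    net_saved = (hours_before - hours_after) * months - investment_hours
    print(net_saved)  # 560.0 hours: positive, so the automation paid off
    ```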

    Expanding on the replace theme, "replace" means anything that reduces human time to do something:

    • Replace time that a human would have spent with work done by computer; REPLACE!
    • Reduce the amount of time spent by a human; REDUCE!
    • Arrange the human's time more efficiently, eliminating waste; RE-ARRANGE!
    • Decide Who does What work, How and Where they do it; RE-ORGANIZE!

    That's it! 

    This notion of what software was REALLY about struck me quite early in my career. The thought was simple: the ultimate purpose of most of the software I wrote is to replace humans. The thought made me uncomfortable. But after a while, I connected it with all the rest of the mechanization and industrialization in society for hundreds of years. A new machine is valuable only to the extent that it somehow reduces the total amount of human labor to reach a given result, everything taken into account! This was true for the Jacquard looms to which the Luddites violently objected in the early 1800's, and the same principles are at work today with computers, resisted by modern-day Luddites.

    In other words, computers are the next step in a long sequence of people figuring out how to get stuff done with less effort by themselves and/or other people.

    Here are some simple examples:

    REPLACE

    Replacing what humans do with computers doing it is the core principle of automation. E-mail is a super-simple example, since computers automate the entire process of delivering the result of typing to another human being, eliminating the many steps involved in doing it physically. The replacement can be low-level and mechanical, like delivering email, or it can be high-level, like the work of deciding who should do what work in what order.

    REDUCE

    Electric light is a pre-computer example, since it reduces all the time required to, for example, deal with the oil lamp, and replaces it with flicking a switch. A modern example is spreadsheets, which automate the time people used to spend making calculations with adding machines.

    RE-ARRANGE

    Introducing chat into customer call centers eliminated huge amounts of wasted waiting time, so that phone operators could handle chat when they weren't on the phone.

    RE-ORGANIZE

    Routing certain kinds of calls to the optimal customer service person (Who does What work), and giving them scripts and hints.

    The Principles apply to Manual and Mental work

    At the start of automation, the computer and software take over work that people are plainly doing. As automation gets more advanced, a couple of interesting things happen.

    1. Once the work is done by computer, it sometimes happens that better ways of getting the work done become possible. This is the same principle we see in mechanical flight: even though all birds have wings that flap, airplane wings don't flap. Instead of flying the way birds do, engineers figured out the core principles of flight and reflected those in airplane design. Similarly, human work is often first translated to computer automation simplistically, and later rebuilt without reference to the human origins.
    2. Once computers are a significant part of the landscape, computer people figure out valuable things to do that have little to do with what humans have done in the past, though this is uncommon.

    The Principles are recursive

    At the beginning, getting computers to do things is incredibly labor-intensive. As time goes on, some people look at the work humans do to get computers to do things — and automate the automation process! Yes, the very programmers whose job is to put people out of work can be put out of work by computers. And on and on.

    The Principles applied to human interaction

    We can see the principles in action in the sequence of changes that have taken place in normal interactions between human beings.

    Prior to automation, the only way humans could communicate was by voice, and by being in each other's physical presence. Here are the major automation steps that have taken place:

    • Writing a letter.
      • Automation is making and acquiring paper, pen and ink, and transporting letter
      • Eliminates the travel time of the sender and/or receiver
    • Publishing books, newspapers.
      • Automation is printing and distributing the books or papers
      • Eliminates the travel time of all the receivers
    • Telephone
      • Eliminates the travel time for the parties to get together
      • Eliminates the time spent writing and reading, and the delay in transmission of prior methods
    • Mobile phone
      • Eliminates the need to travel to a phone to make and/or receive calls
    • Voice mail
      • Eliminates the requirement that the receiver be available when the caller calls, for those cases when it's applicable
    • E-Mail
      • Writing a letter with near-real-time delivery and the ability to read when ready, combining the characteristics of letters and voice mail.
    • Chat
      • Like email, only with a different interface, making it better for real-time, written exchanges, with minor delays acceptable.
    • Scripting
      • In customer service centers, reduces training, reduces human error, and reduces the time to produce an optimal answer to a question.
    • VRU, chatbot
      • A recorded and/or machine-generated party in a conversation.
      • Applicable to all of the above methods.
      • Eliminates a human giving repetitive responses or messages.

    This all seems obvious, right? Like we know all this stuff, what's new?

    For the steps above that are computer-based, what's interesting is that the technical capabilities were often in place considerably before they were brought into production. The gap was usually not decades-long, as it has been with more esoteric technologies. And in each case, knowing the technology and this fundamental principle could enable you to predict that the automation would be built and deployed.

    Conclusion

    If you want to predict the future evolution of technology, study the evolution of software and the fundamental principles that drive its application. The principles are obvious! But they are nonetheless rarely discussed or applied.


  • The Fundamentals of Speed-optimized Software Development

    How do you write software more quickly? How do you engage and win with Wartime Software? There is a set of techniques that enable dramatic speed-ups in producing effective, high-quality software, and they share a set of underlying principles. These fundamental principles of software development are practically never discussed, yet they are almost painfully simple. Understanding and applying them can help you adapt existing speed-optimized methods to your needs and evolve new ones.

    Mainstream thinking about speed

    Very few groups optimize for speed in software development. By far the most common thing optimized for is expectations. Given that everyone involved in software knows about, believes in, and practices expectation-based development, the options for speeding things up aren't great.

    The most widely used method for speeding things up is simple: pressuring everyone involved to go faster. Do whatever it takes, darn it, we're going to bring this project in on time! What this amounts to is:

    • People work harder, longer hours and extra days. Result: sloppiness, lack of care, increasing number of mistakes and omissions, greater risk.
    • People squeeze or omit essential steps, particularly later in the project. Result: poor quality and even disasters encountered during roll-out.
    • Fashionable new methods are introduced that re-arrange things or add steps. Result: see above.

    The fundamentals of achieving speed

    The fundamental principles of achieving speed in software are extremely simple: take fewer, more effective steps to the goal and eliminate overhead. When framed in all the stuff we know about software development, this can be abstract and difficult to apply in practice. But it's truly simple. Think about walking to a goal. The principles amount to:

    Walk to the destination as directly as possible; avoid stopping and re-tracing your steps.

    That's not so hard, is it?

    I know, it is hard, because standard software development methods are worlds away from thinking about things in these terms. But that's because they're optimized for expectations and not for speed!

    Replacing lots and lots with a little

    Most of the steps in mainstream software development are there for a good reason. While concepts like project management may have been developed outside of software, they have been adapted to it. So if you're going to toss out some standard expectations-based activity, you'd better be sure there is speed-optimized activity that gets the same job done — at least as well, and with lots less time and effort.

    With all the effort and work that's gone into software development methods over the years, there's not a lot of improvement to be made by tweaking this or enhancing that. Radical methods are required to accomplish radical change. So it's important to ignore all the intermediate steps and focus on the only goal that matters: customer satisfaction. Period. Everything else is a stand-in, an intermediary, or supposed to contribute to that end.

    If this is the goal, one of the fundamental principles of Wartime Software, of optimizing for speed and results, is this:

    DO globally optimize for the goal.

    DO NOT sub-optimize for any intermediate goals.

    Once you globally optimize for reaching your goal, then you realize that there is a HUGE amount of redundancy in the normal "rigorous" software development process. The redundancy isn't obvious or even visible because mostly it's many repetitions with slight variations, each confined in a silo, performed by different people in different environments using different tools. So it doesn't look redundant, when examined in isolation. But when viewed globally? MASSIVE redundancy, and an opportunity to replace lots and lots of overlapping stuff with a little bit, done right.

    Conclusion

    Wartime software is not achieved by doing a re-organization or tweaking a process. It's achieved by a total re-conceptualization of the optimal way to get from Point A to Point B in software. Once you understand the fundamental principle of global optimization, with the goal of eliminating overhead and redundancy, all the detailed speed-optimized techniques follow like Q.E.D.

  • Innovations that aren’t: data definitions inside or outside the program

    There are decades-long trends and cycles in software that explain a great deal of what goes on in software innovation – including repeated “innovations” that are actually more like “in-old-vations,” and repeated eruptions of re-cycled “innovations” that are actually retreating to bad old ways of doing things. Of the many examples illustrating these trends and cycles, I’ll focus on one of the most persistent and amusing: the cycle of the relationship between data storage definitions and program definitions. This particular cycle started in the 1950’s, and is still going strong in 2015!

    The relationship between data in programs and data in storage

    Data is found in two basic places in the world of computing: in “persistent storage” (in a database and/or on disk), where it’s mostly “at rest;” and in “memory” (generally in RAM, the same place where the program is, in labelled variables for use in program statements), where it’s “in use.” The world of programming has gone around and around about what is the best relationship between these two things.

    The data-program relationship spectrum

    At one end of the spectrum, there is a single definition that serves both purposes. The programming language defines the data generically, and has commands to manipulate the data, and other commands to get it from storage and put it back.
    At the other end of the spectrum, there is a set of software used to define and manipulate persistent storage (for example a DBMS with its DDL and DML), and a portion of the programming language used for defining “local” variables and collections of variables (called various things, including “objects” and “structures”). At this far end of the spectrum, there are a variety of means used to define the relationship between the two types of data (in-storage and in-program) and how data is moved from one place to the other.

    For convenience, I’m going to call the end of the spectrum in which persistent data and data-in-program are as far as possible from each other the “left” end of the spectrum. The “right” end of the spectrum is the one in which data for both purposes is defined in a unified way.

    People at the left end of the spectrum tend to be passionate about their choice and driven by a sense of ideological purity. People at the right end of the spectrum tend to be practical and results-oriented, focused on delivering end-user results.
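    As a concrete sketch of the two ends (every name, table, and the mini-framework below is invented for illustration; the “unified” half only imitates the spirit of RAILS-style declarations):

    ```python
    import sqlite3
    from dataclasses import dataclass

    # Left end: the storage schema (DDL) and the in-program structure are
    # defined separately, with hand-written mapping code between them.
    DDL = "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"

    @dataclass
    class Customer:
        id: int
        name: str

    def load_customer(conn, cid):
        row = conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (cid,)
        ).fetchone()
        return Customer(*row)  # the mapping step, written by hand

    # Right end: a single declaration serves both purposes; the storage
    # definition is derived from the in-program one.
    class Model:
        fields = ()

        @classmethod
        def ddl(cls):
            cols = ", ".join(f"{f} TEXT" for f in cls.fields)
            return f"CREATE TABLE {cls.__name__.lower()} ({cols})"

    class Invoice(Model):
        fields = ("customer_name", "item")  # one definition drives both uses

    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    print(load_customer(conn, 1).name)  # Ada
    print(Invoice.ddl())  # CREATE TABLE invoice (customer_name TEXT, item TEXT)
    ```

    Real ORMs and frameworks add migrations, types, and relationships, but where the definition lives is the essence of the spectrum.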

    Recent example: Ruby and RAILS

    The object-oriented movement tends, for the most part, to be at the left end of the spectrum. Object-oriented people focus on classes and objects, which are entirely in-program representations of data. The problem of “persisting” those objects is an annoying detail that the imperfections of current computing environments make necessary. They solve this problem by various means; one of the most popular is using an ORM (object-relational mapper), which makes the work of moving objects between a DBMS and where they “should” be nearly automatic. It can also automate the process of creating effective but really ugly schemas for the data in the DBMS.

    A nice new object-oriented language, Ruby, appeared in 1995. It was invented to bring a level of O-O purity to scripting that alternatives like Perl lacked. About ten years later, a programmer working in Ruby at Basecamp realized that the framework he'd created was really valuable: it enabled him to build and modify the things his company needed really quickly. He released it as open source, and it became known as the RAILS framework for Ruby, the result being Ruby on RAILS. Ruby on RAILS was quickly adopted by results-oriented developers, and is used by more than half a million web sites. While "pure" Ruby resides firmly at the left end of the spectrum, the main characteristic of the RAILS framework is that you define variables that serve for both program access and storage, putting it near the right end of the spectrum.

    Rhetoric of the ends of the spectrum

    Left-end believers tend to think of people at the other end as rank amateurs. Left-enders tend to claim the cloak of computer science for their style of work, and claim that they're serious professionals. They stress the importance of having separation of concerns and layers in software.

    Right-end practitioners tend to think of people at the other end as purists, ivory-tower idealists who put theory ahead of practice, and who put abstract concerns ahead of practical results. They stress the importance of achieving business results with high productivity and quick turn-around.

    It goes in cycles!

    There have always been examples of programming at both ends of the spectrum; neither end disappears. But the pendulum tends to swing from one end to the other, and so far it shows no sign of stopping.

    In 1958, Algol was invented by a committee of computer scientists. Algol 60 was purely algorithmic — it only dealt with data in memory, and didn't concern itself with getting data or putting it back. Its backers liked its "syntactic and semantic purity." It was clearly left-end oriented. But then practical business people, in part building on earlier efforts and in part inspired by the need to deliver results, agreed on COBOL 60, which fully incorporated data definitions that were used both for interacting with storage and for performing operations. It was clearly right-end oriented. All the big computer manufacturers, anxious to sell machines to business users, built COBOL compilers. Meanwhile, Algol became the choice for expressing algorithms in academic journals and text books.

    There was an explosion of interest in right-end languages during the heyday of minicomputers, with languages like PICK growing in use, and niche languages like MUMPS having their day. The rise of the DBMS presented new problems and opportunities. After a period of purism, in which programmers struggled to get jobs done with supposedly "easy" left-end languages like BASIC, along came Powersoft's PowerBuilder, which dominated client-server computing because of the productivity win of integrating the DBMS schema with the language. The pendulum swung back with the rise of the internet and the emergence of "pure" Java, with its arm's-length relationship to persistence, putting it firmly in the left-end camp. Since Java wasn't pure enough for some people, "pure" Ruby appeared and grew; then a Ruby user under pressure to deliver results invented RAILS, which spread to similarly-minded programmers. And "amateurs" who liked Java but also liked productivity invented and spread Grails, a derivative of Java and RAILS that combines advantages of each.

    So is RAILS an innovation? Java? I don't think so. They're just minor variants of a recurring pattern in programming language design, fueled by the intellectual and emotional habits of groups of programmers, some of whom push towards the left end of the spectrum (with disdain for the amateurs) and others of whom push towards the right end of the spectrum (with disdain for the purists).

    Conclusion

    This spectrum of the relationship between data and program has persisted for more than 50 years, and shows no signs of weakening. Every person or group who finds themselves in an environment that emphasizes the end of the spectrum they find distasteful tries to get to the end they like. Sometimes they write a bunch of code, and everyone thinks they're innovative. Except it would be more accurate to say they're "inno-old-ive." This is one of the many patterns to be found in software history, patterns that account for so much of the detail that is otherwise hard to explain.

  • Fundamental Innovations in Software

    As a result of the decades I have spent working on, in and around computers, I have learned many things from other people, from books and talks, from studying the results of other peoples' work, from trying to accomplish many things in various ways myself, and from following the course of many projects and products over time. During this period of time, the computer industry has changed dramatically in many ways, and not much in some ways.

    Most of the knowledge and insight I have gained over time matches well with that of the industry as a whole. However, I have observed major subject areas that don't get much attention, or that need major innovation. Here are some of those areas.

    Quality

    A good deal of attention has been paid to the quality that results from software development efforts. Products have been built to automate various aspects of the quality process, and there are techniques frequently incorporated into the software development process intended to assure good quality results.

    However, it is clear that there is a tremendous opportunity to enhance the quality process. There are conceptual and technical advances that can be applied to the quality process that greatly improve the results of software development and reduce the time and effort to attain those results. While it is likely that there are situations to which the optimal techniques do not apply in part or in full, it appears that they are applicable to most software development projects.

    Optimal results

    There are a few areas in computers where people focus on measures of goodness and generally agree on what those measures are, for example, total cost of ownership. But the concept of the best possible result in theory, comparable to Shannon’s result in communications, is rarely applied in computing. Yet, there are a number of areas where it is applicable and useful.

    Similarly, in computer hardware, people frequently reach a consensus concerning the “best” way to implement a certain feature, whereas in software development tools and processes, the thoughts about the optimal way of doing things evolve slowly, but rarely reach resolution. Moving beyond advocacy and thinking about what is truly optimal and how to attain it is very fruitful.

    History

    Software development is a field that pays remarkably little attention to history; everything is now and the next best thing. But in fact, a study of history in this field is very rewarding, because, just like in real history, you find that some things truly change, some of them extremely slowly and a few rapidly, and that other things go through recurring cycles. Knowledge of this history is interesting in and of itself, just like “real” history, and also enables you to predict the future within reasonable limits by extrapolating the patterns.

    Application and systems software

    If you look at every line of code that is executed in order to run a program, the lines fall into various categories, including systems software, standard libraries and applications. The “line” between these has been moving “up” very slowly over the last few decades. This glacial trend has impacts on operating systems, databases, application development tools, and related subjects. Understanding and exploiting this trend is a target-rich environment for innovation.

    Abstraction levels

    When we notice things repeating in computing, we build a level of abstraction to encapsulate the repetition and then work at the level of abstraction. Each abstraction is something that has to be built, adopted and learned, and because of these obstacles (and in spite of the benefits), abstractions propagate slowly. Some are so hard for most people, like those involving real math, that they can only be used by hiding the complications from practically everyone. Exploiting abstractions can lead to huge advances.

    Closed loop systems

    The concept of running an automation system “open loop” vs. “closed loop” is widely understood. But I find that few computer systems are run closed loop. Even though this is not exactly a novel concept, most people who work on building or operating the systems seem to be unfamiliar with it.
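    A minimal sketch of the idea (the scaling policy and all numbers are invented): an open-loop system would run a fixed number of workers; a closed-loop system measures the backlog and feeds that measurement back into the decision.

    ```python
    # Closed-loop sketch: the observed queue length (the feedback signal)
    # drives the worker count, one step at a time to avoid thrashing.
    def adjust_workers(current_workers, queue_length, target_per_worker=10):
        needed = max(1, round(queue_length / target_per_worker))
        if needed > current_workers:
            return current_workers + 1
        if needed < current_workers:
            return current_workers - 1
        return current_workers

    workers = 4
    for queue_length in [100, 80, 60, 30, 10]:  # simulated measurements
        workers = adjust_workers(workers, queue_length)
    print(workers)  # 4: the loop tracked the load up and back down
    ```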

    Workflow systems

    The concept of workflow has been around for many years, and many systems have been built that embody the concepts. However, most people that I encounter seem not to understand the abstraction, and no good tools have appeared to ease the path to implementation.
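    The abstraction itself is small. A sketch (the states and transitions are invented for illustration): a workflow is a set of allowed state transitions, enforced in one place, with the history kept as a by-product.

    ```python
    # Minimal workflow sketch: allowed transitions in one table,
    # enforcement and history in one small class.
    TRANSITIONS = {
        "draft": {"submitted"},
        "submitted": {"approved", "rejected"},
        "rejected": {"draft"},
        "approved": set(),  # terminal state
    }

    class WorkflowItem:
        def __init__(self):
            self.state = "draft"
            self.history = ["draft"]

        def advance(self, new_state):
            if new_state not in TRANSITIONS[self.state]:
                raise ValueError(f"illegal transition: {self.state} -> {new_state}")
            self.state = new_state
            self.history.append(new_state)

    doc = WorkflowItem()
    doc.advance("submitted")
    doc.advance("approved")
    print(doc.history)  # ['draft', 'submitted', 'approved']
    ```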

    Application Building Methods

    The biggest, fattest target of all is the project management style of organizing and managing software projects. I have written a long book dissecting the theory and practice of applying otherwise reasonable project management techniques to software, and another one outlining the alternative approach. The larger the organization, the more likely software is going to be built the bad way. When software is built quickly and well, it is most often built in smaller organizations that are working under some kind of severe constraint. And of course, there are people here and there who simply have figured out better ways of building software, and just do it.

    Understanding the People

    Athletes are special people — they're people like everyone else, but the outstanding ones are different in important ways. To encourage them to do their best, you have to understand those differences and act in appropriate ways. Same thing for software. HR people, general managers and everyone else applies the same template to software people they apply to everyone else. They emphasize the commonality and ignore the differences. This is why, in general, management of software people is inexcusably bad. I've written a book about this.

    Summary

    Each of the subjects mentioned here could be a book; some of them are the basis of whole innovative companies. They're not just theoretical. Exploiting some of these subject areas can lead to rapid tactical execution benefits in organizations that build or use software.

  • Math and Computer Science vs. Software Development

    In a prior post, I demonstrated the close relationship between math and computer science in academia. Many posts in this blog have delved into the pervasive problems of software development. I suggest that there is a fundamental conflict between the perspectives of math and computer science on the one hand, and the needs of effective, high-quality software development on the other. The more computer science you have, the worse your software is; the more you concentrate on building great software, the more distant you grow from computer science.

    If this is true, it explains a great deal of what we observe in reality. And if true, it defines and/or confirms some clear paths of action in developing software.

    A Math book helped me understand this

    I've always loved math, though math (at least at the higher levels) hasn't always loved me. So I keep poking at it. Recently, I've been going through a truly enjoyable book on math by Alex Bellos.

    [Image: cover of Alex Bellos's book]

    It's well worth reading for many reasons. But this is the passage that shed light on something I've been struggling with literally for decades.

    [Image: quoted passage from the book]

    When we learn to count, we're learning math that's been around for thousands of years. It's the same stuff! Likewise when we learn to add and subtract. And multiply. When we get into geometry, which for most people is in high school, we're catching up to the Greeks of two thousand years ago.

    As Alex says, "Math is the history of math." As he says, kids who are still studying math by the age of 18 have gotten all the way to the 1700's!

    These are not new facts for me. But somehow when he put together the fact that "math does not age" with the observation that in applied science "theories are undergoing continual refinement," it finally clicked for me.

    Computers Evolve faster than anything has ever evolved

    Computers evolve at a rate unlike anything else in human experience, a fact that I've harped on. I keep going back to it because we keep applying methods developed for things that evolve at normal rates (i.e., practically everything else) to software, and are surprised when things don't turn out well. The software methods that highly skilled software engineers use are frequently shockingly out of date, and the methods used for management (like project management) are simply inapplicable. Given this, it's surprising, and a tribute to human persistence and hard work, that software ever works.

    This is what I knew. It's clear, and seems inarguable to me. Even though I'm fully aware that the vast majority of computer professionals simply ignore the observation, it's still inarguable. The old "how fast do you have to run to avoid being eaten by the lion" joke applies to the situation. In the case of software development, all the developers just stroll blithely along, knowing that the lions are going to eat a fair number of them (i.e., their projects are going to fail), and so they concentrate on distracting management from reality, which usually isn't hard.

    What is now clear to me is the role played by math, computer science and the academic establishment in creating and sustaining this awful state of affairs, in which outright failure and crap software is accepted as the way things are. It's not a conspiracy — no one intends to bring about this result, so far as I know. It's just the inevitable consequence of having wrong concepts.

    Computer Science and Software Development

    There are some aspects of software development which are reasonably studied using methods that are math-like. The great Donald Knuth made a career out of this; it's valuable work, and I admire it. Not only do I support the approach when applicable, I take it myself in some cases, for example with Occamality.

    But in general, most of software development is NOT eternal. You do NOT spend your time learning things that were first developed in the 1950's, and then, if you're good, get all the way up to the 1970's, leaving more advanced software development from the 1980's and on to the really smart people with advanced degrees. It's not like that!

    Yes, there are things that were done in the 1950's that are still done, in principle. We still mostly use "von Neumann architecture" machines. We write code in a language and the machine executes it. There is input and output. No question. It's the stuff "above" that that evolves in order to keep up with the opportunities afforded by Moore's Law, the incredible increase of speed and power.

    In math, the old stuff remains relevant and true. You march through history in your quest to get near the present in math, to work on the unsolved problems and explore unexplored worlds.

    In software development, you get trapped by paradigms and systems that were invented to solve a problem that long since ceased being a problem. You think in terms and with concepts that are obsolete. In order to bring order to the chaos, you import methods that are proven in a variety of other disciplines, but which wreak havoc in software development.

    People from a computer science background tend to have this disease even worse than the average software developer. Their math-computer-science background taught them the "eternal truth" way of thinking about computers, rather than the "forget the past, what is the best thing to do NOW" way of thinking about computers. Guess which group focuses most on getting results? Guess which group would rather do things the "right" way than deliver high-quality software quickly, whatever it takes?

    Computer Science vs. Software Development

    The math view of history, which is completely valid and appropriate for math, is that you're always building on the past, standing on the shoulders of giants.

    The software development view of history is that while some general things don't change (pay attention to detail, write clean code, there is code and data, inputs and outputs), many important things do change, and the best results are obtained by figuring out optimal approaches (code, technique, methods) for the current situation.

    When math-CS people pay attention to software, they naturally tend to focus on things that are independent of the details of particular computers. The Turing machine is a great example. It's an abstraction that has helped us understand whether something is "computable." Computability is something that is independent (as it should be) of any one computer. It doesn't change as computers get faster and less expensive. Like the math people, the most prestigious CS people like to "prove" things. Again, Donald Knuth is the poster child. His multi-volume work solidly falls in this tradition, and exemplifies the best that CS brings to software development.
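    The Turing machine abstraction is small enough to sketch (this particular machine, which adds one to a binary number, is invented for illustration):

    ```python
    # A tiny Turing machine simulator: a tape, a head position, a state,
    # and a rule table mapping (state, symbol) -> (write, move, next state).
    def run_tm(tape, rules, state="start", pos=0, max_steps=1000):
        cells = dict(enumerate(tape))
        for _ in range(max_steps):
            if state == "halt":
                break
            symbol = cells.get(pos, "_")  # "_" is the blank symbol
            write, move, state = rules[(state, symbol)]
            cells[pos] = write
            pos += {"R": 1, "L": -1, "N": 0}[move]
        return "".join(cells[i] for i in sorted(cells)).strip("_")

    # Rules that add 1 to a binary number: scan right, then carry left.
    rules = {
        ("start", "0"): ("0", "R", "start"),
        ("start", "1"): ("1", "R", "start"),
        ("start", "_"): ("_", "L", "carry"),
        ("carry", "1"): ("0", "L", "carry"),
        ("carry", "0"): ("1", "N", "halt"),
        ("carry", "_"): ("1", "N", "halt"),
    }

    print(run_tm("1011", rules))  # 1100
    ```

    Notice what the sketch does not mention: speed, memory size, cost. That is exactly the point of the abstraction, and exactly why it never goes out of date.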

    The CS mind wants to prove stuff, wants to find things that are deeply and eternally true and teach others to apply them.

    The Software Development mind wants to leverage the CS stuff when it can help, but mostly concentrates on the techniques and methods that have been made possible by recent advances in computer capabilities. By concentrating on the newly-possible approaches, the leading-edge software person can beat everyone else using older tools and methods, delivering better software more quickly at lower cost.

    The CS mind tends to ignore ephemeral details like the cost of memory and how much is easily available, because things like that undergo constant change. If you do something that depends on rapidly shifting ground like that, it will soon be irrelevant. True!

    In contrast, the Software Development mind jumps on the new stuff, caring only that it is becoming widespread, and tries to be among the first to leverage the newly-available power.

    The CS mind sits in an ivory tower among like-minded people like math folks, sometimes reading reports from the frontiers, mostly discarding the information as not changing the fundamentals. The vast majority of Software Development people live in the comfortable cities surrounding the ivory towers doing things pretty much the way they always have ("proven techniques!"). Meanwhile, the advanced Software Development people are out there discovering new continents, gold and silver, and bringing back amazing things that are highly valued at home, though not always at first, and often at odds with establishment practices.

    Qualifications

    Yes, I'm exaggerating the contrast between CS and Software Development. Sometimes developers are crappy because they are clueless about simple concepts taught in CS intro classes. Sometimes great CS people are also great developers, and sometimes CS approaches are hugely helpful in understanding development. I'm guilty of this myself! For example, I think the fact that computers evolve with unprecedented speed is itself an "eternal" (at least for now) fact that needs to be understood and applied. I argue strongly that this fact, when applied, changes the way to optimally build software. In fact, that's the argument I'm making now!

    Nonetheless, the contrast between CS-mind and Development-mind exists. I see it in the tendency to stick to practices that are widely used, accepted practices, but are no longer optimal, given the advances in computers. I see it in the background of developers' preferences, attitudes and general approaches.

    Conclusion

    The problem in essence is simple:

    Math people learn the history of math, get to the present, and stand on the shoulders of giants to advance it.

    Good software developers master the tools they've been given, but ignore and discard the detritus of the past, and invent software that exploits today's computer capabilities to solve today's problems.

    Most software developers plod ahead, trying to apply their obsolete tools and methods to problems that are new to them, ignoring the new capabilities that are available to them, all the while convinced that they're being good computer science and math wonks, standing on the shoulders of giants like you're supposed to do.

    The truly outstanding people may take computer science and math courses, but when they get into software development, figure out that a whole new approach is needed. They come to the new approach, and find that it works, it's fun, and they can just blow past everyone else using it. Naturally, these folks don't join big software bureaucracies and do what everyone else does. They somehow find like-minded people and kick butt. They take from computer science in the narrow areas (typically algorithms) where it's useful, but then take an approach that is totally different for the majority of their work.

  • Math and Computer Science in Academia

    Math and music are incredibly interrelated, as has been understood at least since Pythagoras; arguably, they are more intimately bound than math and computer science. Yet math and music are never studied in a single academic department, while math and computer science frequently are. Hmmm….

    Math and Computer Science are joined at the hip in Academia

    Math and Computer Science are so intimately related in academia that they are frequently part of the same department. This is true at elite institutions like Caltech.

    Math and Computer Science are in the same department at private liberal arts schools, too, like Wesleyan.

    They're a single department at major state universities, like Rutgers.

    Same thing at lesser state schools, like Cal State East Bay.

    I make no argument that this is universal. Don't need to. If you search like I did, you'll find that putting math and computer science in a single department is a common practice.

    Why are Math and Computer Science so Academically Intimate?

    Most people seem to think that math and computer science are pretty much the same thing. Consider this:

    • Most "normal" people who try either of them don't get very far.
    • The people who are way into either of them are really nerdy.
    • If you're good at one of them, there's a good chance you'll do well at the other.
    • They are incredibly detail-oriented. They're full of symbols and strange languages.
    • What you do doesn't seem to be physical at all. What are you doing while programming or doing math? Mostly staring into space or scribbling strange symbols, it seems.
    • You can write programs that do math, and math applies broadly to computing.

    Meanwhile, there are other remarkably similar things that don't end up in the same department. Consider the "life sciences." They all have loads of things in common. Everything they study comes to life, develops, lives for a while, maybe has offspring, and dies. DNA is intimately involved. Oxygen and carbon dioxide play crucial roles. But when have you ever seen a department of botany and zoology? Like never, right? In the humanities it's just as extreme. Ever hear of a department of French and German? Academics already fight enough among themselves without that…

    Academics clearly think that math and computer science aren't just similar or highly related. If they were, they'd treat them the way they do languages or life sciences. A broad spectrum of academics think they're so interwoven that there are compelling reasons for studying them together. Thus a single department that has them both.

    Math and Computer Science, a Marriage made in ????

    It's a common practice for math and computer science to be studied together. Obviously, most people have no trouble with the concept. Of all the things to question or worry about in the world, this seems pretty low on the list.

    I would like to change this. I'd like to cause trouble where there is none today — or rather, I'd like to EXPOSE the deep-seated, far-reaching, trouble-causing consequences of the fact that everyone thinks it's quite all right that math and computer science are thought of as pretty much two sides of the same coin. In fact, I will argue that the math-computer-science-marriage is just fine for math — but the root cause of a remarkable variety of intractable problems that plague software development.

    Note that I did a quick shift there. I have no problem with math and computer science being together. They kinda belong together. My problem is that everyone thinks that you study computer science in school so that you're qualified to do software development after graduating. And that software development shops require CS degrees, and pay more for advanced degrees in CS, on the theory that if some is good, more must be better.

    I will flesh this out and explain why it's the case in future posts. But I thought throwing down the gauntlet was worth doing. Or at least fun!

  • Fundamental Concepts of Computing: Software is Data!

    The fact that what computers operate on (data) and the instructions for how to operate on it (software) are the same kind of stuff (i.e., data) is obvious, simple, profound, not well-understood, and has huge implications. The relatively small number of programmers who take advantage of the fact that software is data are levels and levels better and beyond "normal" programmers. The fact that software is data is well qualified to be a fundamental concept of computing, along with counting, closed loop and a few other things.

    Software is Data

    All software is data; some data is software.
    Everyone who sorta knows what programming is knows this, the same way that everyone knows that air is lighter than water. But who thinks about it? Who cares? What difference does it make?

    First let me spell out how software is data. You've got a bunch of files on your computer. Some of them may be text files or spreadsheets (data). Some of them may be the kind of text that will make sense when opened with a programmer's editor (still data); you edit it and save it back (data all the way). Then you run the compiler. It takes as input the text representation of the program (data) and writes out a new file, which is an "executable" file, still data. Then you "run" the program, i.e., a program loads the executable file data into memory and then, by one means or another, gets the machine's instruction address pointer to point to the first byte of the file in memory. At which point, the program is "executed."

    The file was data when it was on disk, regardless of its format. It was data when it was loaded into memory, not much different than a text file loaded by a text editor. It was data when the processor started executing instructions, and it was data once the program ceased being executed. It was data before, during and after execution. It just happened to be (hopefully) data in the format of sensible instructions the machine knows how to execute. Even if it's crap, the machine will do its best to execute it until it somehow loops or crashes out. Then you get your machine back. The point is: the machine doesn't know the difference!

    To a computer, everything is data. If we set the instruction address pointer to data that happens to be nicely formatted instructions, good things will happen. But it's still just data.
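    Here's the idea in miniature, sketched in Python (an analogy, not a real loader: `compile` and `exec` stand in for the loader and the machine's instruction address pointer):

```python
# A program is data: here it's just a string, no different from any text.
source = "result = 6 * 7"

# We can treat it like any other data: inspect it, measure it, store it.
assert isinstance(source, str)
assert len(source) == 14

# Or we can point the "instruction pointer" at it: compile and execute.
code = compile(source, "<data>", "exec")   # still data: a code object
namespace = {}
exec(code, namespace)                       # now it runs
print(namespace["result"])                  # 42
```

    Until the `exec` call, nothing distinguishes `source` from any other string floating around in memory. The machine doesn't know the difference; we just chose to execute it.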

    Software wasn't always data!

    It's an amazing, unprecedented leap forward to make the control function of a machine out of the same stuff, stored in the same way and processed in the same way, as the stuff the machine works on. No other machine is like this! Yet faced with this unique facet of computers, most people act the way they always have with machines and ignore it, just as they do with computers' unprecedented speed of evolution.

    Any machine you can think of has the control part and the "business" part, where the machine does what it does, according to the control. This is true for a lawn tractor, and it's true for a vehicle. It's true for the calculating machines of the late 1940's such as the IBM 402, whose "program" was a removable plug board laboriously wired by hand. It's true for that famous early computer, the ENIAC, whose plug boards (sadly not removable) were wired by the ladies who programmed it. It's even true for the much-lauded Turing machine, whose famous endless tape contains only data, while the control is somewhere else.

    It was a true great leap forward, breaking with the strict separation of control and action that exists everywhere in human experience, to use some of a computer's data for control purposes, and the rest for data that isn't software. This is called a von Neumann architecture, or a "stored-program" computer.

    Because software is data, a program can act on itself, since "itself" is data; i.e., a program can modify itself. This characteristic is unique among machines — only stored-program computers can do this. Sound familiar?
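    A minimal sketch of that self-referential power, in Python (an analogy only; true self-modifying code patches machine instructions in memory, but the principle is the same: the program's own text is just data to be edited):

```python
# Because a program is data, a program can rewrite a program,
# including one it is about to run. Here a function's source is
# held as data, edited like any other data, and re-executed.
source = "def double(x):\n    return x * 2\n"

ns = {}
exec(source, ns)
print(ns["double"](10))          # 20

# Modify the "software" the same way we'd modify any other string:
patched = source.replace("* 2", "* 3")
exec(patched, ns)
print(ns["double"](10))          # 30
```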

    Interpreters and other levels

    The first software was the binary data that the machine recognized as instructions. The next step was a more readable, text version of the machine language: assembler language. The key thing with assembler language is that each line translates to exactly one machine instruction. Next come compiled languages like FORTRAN and COBOL, which compilers turn into machine language, typically with multiple machine instructions for each line of source. Next come interpreted languages, which are "executed" by an interpreter program; instead of generating instructions, the interpreter just does what's intended by the program right away.

    From this we gather that programs can take programs (in some format and language) as input, and either execute them directly or generate other programs as output, whether machine instructions or code in some other language.
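    Both paths can be shown with a toy language (invented here purely for illustration) whose programs are plain data: an interpreter executes the program directly, while a tiny "compiler" takes the same program as input and emits another program as output:

```python
# A toy "language": a program is a list of (op, value) pairs -- pure data.
program = [("add", 5), ("mul", 3), ("add", 1)]

# An interpreter executes the program directly...
def interpret(program, x=0):
    for op, val in program:
        x = x + val if op == "add" else x * val
    return x

# ...while a "compiler" takes the same program as input and emits
# another program (Python source) as its output.
def compile_to_python(program):
    body = "".join(f"    x = x {'+' if op == 'add' else '*'} {v}\n"
                   for op, v in program)
    return "def run(x=0):\n" + body + "    return x\n"

print(interpret(program))        # (0 + 5) * 3 + 1 = 16
ns = {}
exec(compile_to_python(program), ns)
print(ns["run"]())               # 16, same answer via generated code
```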

    Consequences of software being data

    This is a BIG subject. For starters, though, doesn't it make sense that when a machine is "self-operating," as computers are, and no other machine in our experience is, that effectively utilizing the self-referential power of the machine would lead to interesting things? It certainly has for humans!

    Let's take the classic issue of customization. Once a programming environment has been chosen, the tendency is to model your problem in terms of the programming environment, and code away. For example, there's the whole field of object-oriented analysis and design, in which you're supposed to use this style of thinking to put everything in terms of objects; then you proceed to build your classes and away you go. Life is great. But now something has to get changed. This requires that you examine the entire body of code, make the changes, test, migrate data as needed, etc. And again and again.

    Eventually, you might realize that certain classes of changes are often required. If you really get that software is data, you will realize that you could have modeled the entire application in the simplest possible terms and built an interpreter. Changes then fall into one of two categories: the new thing is a variant on the kinds of things the interpreter already does, in which case you just change the model; or the new thing is a new kind of thing, in which case you extend the interpreter and use its new capability in the extended model.
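    A minimal sketch of the two categories of change, in Python (the model and its actions are made up for illustration): the application is a data model plus a small interpreter, so most changes are edits to data, and only genuinely new capabilities touch code:

```python
# The application is modeled as data in the simplest possible terms...
model = [
    {"action": "discount", "percent": 10},
    {"action": "tax", "percent": 8},
]

# ...and a small interpreter gives the model its meaning.
actions = {
    "discount": lambda price, p: price * (100 - p) / 100,
    "tax":      lambda price, p: price * (100 + p) / 100,
}

def run(model, price):
    for step in model:
        price = actions[step["action"]](price, step["percent"])
    return price

print(round(run(model, 100.0), 2))   # 97.2

# Change of the first kind: a variant of what the interpreter already
# handles -- just edit the model (data), no code changes:
model.append({"action": "discount", "percent": 5})

# Change of the second kind: a genuinely new capability -- extend the
# interpreter, then use the new capability in the extended model:
actions["surcharge"] = lambda price, p: price + p
model.append({"action": "surcharge", "percent": 2.50})
print(round(run(model, 100.0), 2))
```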

    This is just one example. You can have mixed models, you can generate code, you can mix in classic parameters, etc.

    The point is that, if you are a software-is-data-aware person, you aren't "stuck" in any programming model or environment. Moreover, you are more likely to come up with effective and efficient approaches, which can easily be orders of magnitude better than naive, single-level ones.

    Conclusion

    "Software is data" should be one of the most obvious statements you can make in the computer world. It's like going to the New York Yankees and declaring that one of the most important things about baseball is that bats are involved. Everyone would avert their eyes and you would be quietly shown the door. Yet I find that the norm in software groups is to demonstrate, through their actions, no awareness that software is data. As far as you can tell by what they do, software is a whole separate thing. To them, it's as though software were like every other kind of machine control panel encountered in normal life. They don't seem to even consider programming paths that involve software that modifies or interprets software. As a result, they are vulnerable to massive embarrassment and humiliating defeat by groups that take advantage of this fundamental concept of computing.

  • How to Evaluate Programming Languages

    Programmers say language A is "better" than language B. Or, to avoid giving offense, they'll say they "like" language A. Sometimes they get passionate, and get their colleagues to program in their new favorite language.

    This doesn't happen often; inertia usually wins. When a change is made, it's usually passion and energy that win the day. If one person cares a WHOLE LOT, and everyone else is, "whatever," then the new language happens.

    Sometimes a know-nothing manager (sorry, I'm repeating myself) comes along and asks the justification for the change. The leading argument is normally "the new language is better." In response to the obvious "how is it better?" people try to get away with "It's just better!" If the manager hangs tough and demands rationality, the passionate programmer may lose his cool and insist that the new language is "more productive." This of course is dangerous, because the rational manager (or have I just defined the empty set?) should reply, "OK, I'll measure the results and we'll see." Mr. Passion has now gotten his way, but has screwed over everyone. But it usually doesn't matter — who measures "programmer productivity" anyway?

    But seriously, how should we measure degrees of goodness in programming languages? If there's a common set of yardsticks, I haven't encountered them yet.

    The Ability of the Programmer

    First, let's handle the most obvious issue: the skill of the programmer. Edgar Allan Poe had a really primitive writing tool: pen (not even ball-point!) and paper. But he still managed to write circles around millions of would-be writers equipped with the best word processing programs that technology has to offer. I've dealt with this issue in detail before, so let's accept the qualification "assuming the programmer is equally skilled and experienced in all cases."

    Dimensions of Goodness

    Programmers confined to one region of programming (i.e., most programmers) don't often encounter this, but there are multiple dimensions of goodness, and they apply quite differently to different programming demands.

    Suppose you're a big fan of object-orientation. Now it's time to write a device driver. Will you judge the goodness of the driver based on the extent to which it's object-oriented? Only if you're totally blindered and stupid. In a driver you want high performance and effective handling of exception conditions. Period. That is the most important dimension of goodness. Of all the programs that meet that condition, you could then rank them on other dimensions, for example readability of the code.

    Given that, what's the best language for writing the driver? Our fan of object orientation may love the fact that his favorite language can't do pointer arithmetic, and may grudgingly admit that, yes, garbage collection does happen; but with today's fast processors, who cares?

    Sorry, dude, you're importing application-centric thinking into the world of systems software. Doesn't work, terrible idea.

    It isn't just drivers. There is a universe of programs for which the main dimensions of goodness are the efficiency of resource utilization. For example, sort algorithms are valued on both performance and space utilization (less is better). There is a whole wonderful universe of work devoted to this subject, and Knuth is near the center of that universe.

    Consistent with that thinking, Knuth made up his own assembler language, and wrote programs in it to illustrate his algorithms. Knuth clearly felt that minimum space utilization and maximum performance were the primary dimensions of goodness.

    The largest Dimension of Goodness

    While there are other important special cases, the dimension of goodness most frequently relevant is hard to state simply, but is simple common sense:

    The easier it is to create and modify programs, quickly and accurately, the more goodness there is.

    • Create. Doesn't happen much, but still important.
    • Modify. The most frequent and important act by far.
    • Quickly. All other things being equal, less effort and fewer steps is better.
    • Accurately. Anything that tempts us to error is bad, anything that helps us to accuracy is good.

    That, I propose, is (in most cases) the most important measure of goodness we should use.

    Theoretical Answer

    Someday I'll get around to publishing my book on Occamality, which asks and answers the question "how good is a computer program?" Until then, here is a super-short summary: among all equally accurate expressions of a given operational requirement, the best one has exactly one place where any given semantic entity is expressed, so that for any given thing you want to change, you need only go to one place to accomplish the change. In other words, the least redundancy. What Shannon's Law is for communications channels, Occamality is for programs. Given the same logic expressed in different languages, the language that enables the least redundancy is the best, the most Occamal.

    Historical Answer

    By studying a wide variety of programs and programming languages in many fields over many years, it's possible to discern a trend. There is a slow but clear trend towards Occamality, which demonstrates what it is and how it's best expressed. The trend is the result of simple pressures of time and money.

    You write a program. Somebody comes along and wants something different, so you change the program. Other people come along and want changes. You get tired of modifying the program, you see that the changes they want aren't too different, so you create parameters that people can change to their heart's content. The parameters grow and multiply, until you've got loads of them, but at least people feel like they're in control and aren't bugging you for changes. Parameters rule.

    Then someone wants some real action-like thing just their way. You throw up your hands, give them the source code, and tell them to have fun. Maybe they succeed, maybe not. But eventually, they want your enhancements in their special crappy version of your nice program, and it's a pain. It happens again. You get sick of it, analyze the places they wanted the changes, and make official "user exits." Now they can kill themselves customizing, and it all happens outside your code. Phew.

    Things keep evolving, and the use of user exits explodes. Idiots keep writing them so that the whole darn system crashes, or screws up the data. At least with parameters, nothing really awful can happen. The light bulb comes on. What if I could have something like parameters (it's data, it can't crash) that could do anything anyone's wanted to do with a user exit? In other words, what if everything my application could do was expressed in really powerful, declarative parameters? Hmmm. The users would be out of my hair for-like-ever.
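    The contrast can be sketched in Python (hypothetical names throughout): a user exit hands control to arbitrary code, while a declarative parameter is data the application interprets, so nothing a customizer writes there can crash the system:

```python
# A "user exit": the application calls out to arbitrary customer code.
def process(record, user_exit=None):
    if user_exit:
        record = user_exit(record)   # anything can happen here, crashes included
    return record

# A declarative parameter: the same customization expressed as data.
# The application interprets it; a bad rule is rejected, not executed.
RULES = [("uppercase", "name"), ("default", ("city", "unknown"))]

def process_declarative(record, rules=RULES):
    for rule, arg in rules:
        if rule == "uppercase":
            record[arg] = record[arg].upper()
        elif rule == "default":
            field, value = arg
            record.setdefault(field, value)
    return record

print(process_declarative({"name": "smith"}))
# {'name': 'SMITH', 'city': 'unknown'}
```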

    What I've just described is how a new programming "level" emerges historically. This is what led to operating systems, except that applications are one giant user exit. UNIX is chock full of things like this. This history is the history of SQL in a nutshell — a powerful, does-everything system at the level of declarative, user-exit-like parameters!

    The Answer

    In general, the more Occamal the language (and its use), the better it is. More specifically, given a set of languages, the best one has

    • semantics that are close to the problem domain
    • features that let you eliminate redundancy
    • a declarative approach (rather than an imperative one)

    Let's go through each of these.

    Problem Domain Semantics

    A great example of such a language is the Unix utility AWK. It's a language whose purpose is to parse and process strings. Period. You want an accounting system, don't use AWK. You want to generate cute-looking web pages, don't use AWK. But if you've got a stream of text that needs processing, AWK is your friend.

    From the enterprise space, ABAP is an interesting example. While it's now the prime language for writing SAP applications, ABAP was originally Allgemeiner Berichts-Aufbereitungs-Prozessor, German for "general report creation processor." In other words, they saw the endless varieties of reporting and created a language for it; then having seen the vast utility of putting customization power in the hands of users, generalized it.

    Features that let you eliminate redundancy

    This is what subroutines, classes and inheritance are supposed to be all about. And they can help. But more often, creating another program "level" is the most compelling solution, i.e., writing everything that the program might want to do in common, efficient ways, and having the application-specific "language" just select and arrange the base capabilities. This is old news. Most primitively, it's a subroutine library, something that's been around since FORTRAN days. But there's an important, incredibly powerful trick here. In FORTRAN (and in pretty much all classically-organized subroutine libraries), the library sits around passively waiting to be called by the statements in the language. In a domain-specific language, it's the other way round!
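    The inversion can be sketched in Python (toy functions invented for illustration): in the classic arrangement the program calls the library; in the domain-specific arrangement an engine is in charge, and the application-specific part is nothing but data:

```python
# Classic subroutine library: the program calls the library.
def send_email(to): return f"email->{to}"
def send_sms(to):   return f"sms->{to}"

log_classic = [send_email("ann"), send_sms("bob")]   # imperative calls

# Domain-specific "language": the engine drives, and the
# application-specific part is just data that selects and
# arranges the base capabilities.
spec = [("email", "ann"), ("sms", "bob")]
engine = {"email": send_email, "sms": send_sms}

log_dsl = [engine[kind](to) for kind, to in spec]

assert log_classic == log_dsl
print(log_dsl)    # ['email->ann', 'sms->bob']
```

    Changing what the application does now means editing `spec`, not rewriting call sites.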

    A related approach, which has been implemented over and over, is based on noticing that languages are usually designed independent of databases, but frequently used together. The result is monster redundancy! Wouldn't it be a hoot if database access were somehow an integral part of the language? Well, that explains how and why ABAP evolved from a reporting language (necessarily intimate with the database) into "Advanced Business Application Programming." And it explains why Ruby, when combined with the database-leveraging Rails framework, is so popular and highly productive.

    A declarative approach

    In the world of programming, as in life, the world divides between imperative (commands you, tells you how to do something, gives you directions for getting to point B) and declarative (tells you what should be done, identifies point B as the goal). In short, "what" is declarative and "how" is imperative. A core reason for the incredible success of SQL is that it is declarative, as Chris Date has described in minute detail. Declarative also tends to be less redundant and more concise. It also tends not to crash.
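    Here's the same selection expressed both ways, using Python and its built-in sqlite3 module (a toy table, purely for illustration): the imperative version spells out how, the declarative SQL states what:

```python
import sqlite3

rows = [("ann", 42), ("bob", 17), ("cid", 99)]

# Imperative: spell out HOW -- loop, test, accumulate.
result_how = []
for name, score in rows:
    if score > 40:
        result_how.append(name)

# Declarative: state WHAT -- the engine decides how to do it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT, score INT)")
db.executemany("INSERT INTO t VALUES (?, ?)", rows)
result_what = [r[0] for r in
               db.execute("SELECT name FROM t WHERE score > 40 ORDER BY name")]

print(result_how)    # ['ann', 'cid']
print(result_what)   # ['ann', 'cid']
```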

    Conclusion

    I'm sorry if you're disappointed I didn't name Erlang or whatever your favorite is as the "best programming language," but I insist that it's far more novel and useful to decide on what basis we are to judge "goodness" in programming languages, and in programs. In general and in most domains (with important exceptions, like inside operating systems), non-redundancy is the reigning virtue, and languages that enable it are superior to ones that don't. Non-redundancy is nearly always best achieved with a declarative, problem-domain-centric approach. And it further achieves the common-sense goal of fast, accurate creation and modification of programs.

    Occamality is rarely explicitly valued by programmers, but the trend to it is easy to see. There are widespread examples of building domain-specific languages, meta-data and other aspects of Occamality. Many programmers already act as though Occamality were the primary dimension of goodness — may they increase in numbers and influence!

  • Fundamental Concepts of Computing: Speed of Evolution

    Nothing we encounter in our daily lives changes or evolves as quickly as computing. All our habits of thinking are geared towards things that evolve slowly, compared to computing. This is a simple concept. It is disputed by no one. But it has implications that are vast and largely undiscussed and unexplored. It clearly deserves to be a fundamental concept of computing, along with a few corollaries.

    Normal evolution speeds

    Our planet evolves. Most of the changes are slow, with peak events mixed in. Life forms evolve slowly, over tens of thousands of years. Human culture evolves more quickly — too fast for some people, and not nearly fast enough for others. Human capabilities also evolve, but for something like speed to double over a period of decades is astounding.

    Take people running as an example. Here are the records for running a mile over the last 150 years or so.
    The time has been reduced by roughly 25% over all those years. Impressive for the people involved, and amazingly fast for human change.

    Once you shift to things made by humans, the rate increases, particularly as science and technology have kicked in. We've invented cars and they've gotten much faster since the early ones. Here is a lady in a race car in 1908. She was called the "fastest lady on earth" for driving at 97 mph in 1906.

    (Dorothy Levitt in a 26 hp Napier at Brooklands, 1908)

    In 2013, Danica Patrick won pole position during qualifying rounds at the Daytona Speedway by averaging over 196 mph.


    In this case, speed roughly doubled over about a hundred years.

    Computer evolution speeds

    Computers are different. Moore's Law is widely known: power doubles roughly every 18 months. And then doubles again. And again. And again. Every time someone predicts an end to the doubling, someone else figures out a way to keep it going. This is a fact, and it's no secret. It's behind the fact that my cell phone has vastly more computational power and storage than the room-sized computer I learned on in 1966.

    This chart should blow anyone's mind, even if you've seen it before. It shows processor transistor count (roughly correlates with power) increasing by a factor of 10, then another, then another. In sum, it shows that power has increased about one million times over the last forty years.


    It would be mind-blowing enough that we have something in our lives that increases in speed at such an incomprehensible rate. But that's not all! Everything about computers has also gotten less expensive! For example, the following chart shows how DRAM storage prices have gone down over the last twenty years, from over $50,000 per GB to around $10. In other words, to about 2% of 1% of the price twenty years ago.


    That's faster and cheaper in a nutshell.
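    A quick sanity check of the arithmetic behind those figures (just the math on the numbers quoted above, no new claims): a factor of a million over forty years works out to a doubling roughly every two years, and the DRAM drop really is 2% of 1%:

```python
import math

# How many doublings make a factor of one million?
doublings = math.log2(1_000_000)
print(round(doublings, 1))        # 19.9

# Spread over forty years, that's a doubling roughly every two years:
print(round(40 / doublings, 1))   # 2.0

# The DRAM price drop, from $50,000 per GB down to about $10:
ratio = 10 / 50_000
print(f"{ratio:.2%}")             # 0.02%
```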

    So what? Everyone knows that things change quickly with computers, what's the big deal? It's just the way things are!

    Here's what: this simple fact has profound implications.

    Everything about human beings is geared for people and things that evolve at "normal" speed. Our patterns of thought, the things we do, most of our behaviors were developed for "normal-speed" evolvers, i.e., everything but computers (EBC). A surprising number of them break, are wrong or yield crappy results when applied to computers. There is no reason at all to be surprised by this; in fact, anything else would be surprising. What is surprising and interesting is that the implications are rarely discussed.

    Computers evolve more quickly: it matters!

    When you apply patterns of thought and behavior that may be appropriate for EBC to fast-evolving computers, those thoughts and behaviors typically fail miserably. This "impedance mismatch" explains failure patterns that persist for decades. Of course everyone knows that computers are different from other things — just as they know that Indian food is different from Chinese food. What they tend not to know is that computers are different in a different way (because of the speed of evolution) from EBC.

    Here are a couple examples.

    The mainstream "wisdom" for software project management is essentially the same, with minor modifications, as managing anything else, from building a house to a new brand of toothpaste. It's not! That's one of the many reasons why the larger organizations that depend on those techniques fail to build software effectively.

    We treat software programming skills like any other kind of specialized knowledge, like labor law. They're not! Thinking that they are is one of the many reasons why great software people are 10 times better than really good ones, who are themselves 10 times better than average ones.

    The normal ways people go about hiring software engineers are crap. They think that hiring folks who deal with a subject matter that changes so dramatically is the same as hiring anyone else. It's not! They also think that extended, in-depth experience with a particular set of technologies is really valuable. It's not! It's actually a detriment!

    Software engineers tend to learn to program in a certain way, using a given set of tools, techniques and thought patterns. Those tools were designed to solve the set of problems that existed with computers at a certain point in their evolution. But computers evolved from that point! Quickly! The programmers are doing the equivalent of hunting for rabbits with weapons designed for hunting Mastodons, blissfully unaware of what's appropriate for computers the way they are today!

    Software is hard and isn't getting easier

    Software isn't a problem that gets solved — oh, now we know how to do it, finally. It doesn't get solved because the underlying reality (computers) evolves more quickly than anything else. The examples I mentioned are things that "everyone" is sure must apply to software. How could they not? The fact that they yield consistently horrible results seems not to break the widespread faith in the mainstream approach to software. 

    These and many other broadly accepted falsehoods explain why so many things about software are broken and (worse) don't seem to get fixed, decade after decade, when superior methods have been proven in practical application. Why is everyone so resistant to change? Is everyone stupid?

    Aw shucks, I admit it: "everyone is stupid" was my working hypothesis for explaining things like this. But I now have a more satisfying hypothesis: everyone is so used to dealing with "normal speed things," EBCs, that they just can't help themselves from applying to computers the methods and patterns of thought and behaviors that work reasonably well in most of their lives. Since nearly everyone gets the same lousy results, the conclusion everyone draws is that there's something about computers that's just miserable.

    Conclusion

    There is nothing comparable to computers in the rest of our human experience. Nothing evolves at anywhere close to the speed of computers, getting more powerful while getting cheaper at hard-to-comprehend rates. We apply methods and patterns of thought that work well with practically everything, and those methods fail when applied to computers. But they fail for everyone! The conclusion everyone draws from this is that computers are just nasty things, best to stay away from them and avoid blame. It's the wrong conclusion.

    Computers are understandable. The typical failures are completely explained by the mismatched methods we bring to them, like trying to catch butterflies with a lasso. When people use methods that are adapted to the unprecedented evolutionary speed of computers, things go well.

  • Software Development: the Relationship between Speed and Release Frequency

    There is a deep, fundamental relationship between the velocity of software development and the frequency of releases. I hope this relationship will be studied in detail and everything about it understood, but the basics are clear: with minor qualifications, the more frequently you release your software, the more rapidly it will advance by every relevant measure. It will advance not only in feature/function, but in quality!

    Mainstream thinking on Releases and Development Speed

    The relationship I propose, "more releases = more features & better quality," is counter to the vast majority of mainstream thinking in software. In fact, in those terms, it's counter-intuitive. Here's why.

    Think about software development in the simplest possible terms. You've got to define it, plan it, do it, check it and release it. Five basic steps, which apply across a wide variety of process methodologies. Each step takes some time, right? After you do the work, you've got to check it and then release it. And you can't just check what you did — you also have to make sure you didn't break anything that used to work, the "keep it right" part of quality, which grows ever larger as your software evolves.

    This "check and release" process is a kind of necessary evil, the way most people think of it, and as quality failures hit you, it tends to get bigger and longer. A clever project manager (an oxymoron if there ever was one, except when intended ironically, as it is here) will naturally think, gee, let's go from 6 releases a year to 4. By cutting the overhead of the two extra releases, we'll be able to buy some development time back.

    Yup, that really is how people think! Fewer releases = more time to do other stuff = we get more done.

    Not!

    A Real-life Example

    A good example of a company that illustrates the proper relationship between release frequency and development speed is RebelMouse. RebelMouse is a next-generation, socially-fueled publishing platform. It can be used to turn boring-appearing blogs like BlackLiszt from this:


    [Screenshot: BlackLiszt in its standard blog layout]

    to this, a snapshot from my RebelMouse page:

    [Screenshot: my RebelMouse page]

    Increasingly, they are used by big-media places, for example for Glee:


    [Screenshot: Glee's RebelMouse page]

    and the recently released real-time publishing curation features were used for The Following to create a social firestorm:


    [Screenshot: The Following's RebelMouse page]

    RebelMouse — the Facts

    The CTO/founder of RebelMouse is Paul Berry. Here he is below explaining something to his fellow nerds at the nerdfest I held a while ago.


    [Photo: Paul Berry at the nerdfest]

    RebelMouse has grown like crazy in its short life. Currently there are about 280,000 websites powered by RebelMouse, and that number is growing over 100% month-to-month. Their sites have over 2 million unique visitors a month.

    Does RebelMouse have just a handful of releases a year? Duhhh. Try over 10 a day. A day! And there are more than 30 developers, who are not all in the same location.

    Digging in

    There is a lot to be said on this subject. For now, I'm just going to keep it to a single simple but important observation.

    The relationship between development speed and frequency of releases does not hold up at a fine-grained level. For example, given two organizations, one of which has a release every 10 weeks and the other every 11 weeks, any difference in speed will be random. Similarly, if one organization releases 5 times a day and the other 10 times a day, any difference in speed will also be random. But at a coarse-grained level, I observe large differences. HUGE differences.

    Conclusion

    RebelMouse is far from the only example, but they show the relationship between development speed and release frequency very nicely. They move much more quickly than most development organizations of their size — in fact, they manage to push hundreds of releases in the time most organizations would have been able to limp through an "agile" (heh) development cycle or two.

     

  • Software Postulate: the Measure of Success

    There is a Postulate of software development that, like all postulates, has huge impact on much of what goes on in software. This postulate concerns what is the measure of success in software. There are many ways to formulate it, but at heart, it's simple: success is measured by meeting expectations that have been set. Just as getting to modern physics requires changing the parallel postulate in geometry, so does getting to modern software require changing the "meet expectations" postulate in software development.

    Expectations in Software

    Two typical CIOs are talking with each other. They get together to share experiences because, while they don't compete with each other, their groups manage technologies of similar size and complexity.

    CIO A is real happy today. "I guess they finally listened last time. They had been late once too often. I dished out a pretty blistering speech about how awful it was, and I added on a couple of threats about what was going to happen to careers if it happened again. We just had our project review meeting, and for the first time in memory, most items were green, with just a smattering of yellows and a couple reds. I breathed such a sigh of relief."

    CIO B isn't so happy. "That's where I was a couple months ago. Why can't these guys just keep it up? What is it with programmers? We were mostly green, but now green projects are a fading memory. Mostly we're in the yellow and red. Yuck."

    What are these guys talking about? Project Management. They're talking about whether the expectations set by their staff have been met (green) or not (yellow and red).

    The green, yellow and red are at the end of a road that starts with requirements, moves on to the crucial, notoriously difficult art of estimation, and then proceeds to implementation. Green says that the estimates are being met, and yellow and red say, well, maybe not.

    Introducing Absolute Measurement in Software

    The CIOs get over their griping and start to compare notes on some recent projects. As it turns out, they're both building a Data Warehouse. They're in the same industry, and the projects are similar in nearly every way, at least from the outside. Common sense tells them that the internals of their projects should be pretty similar. So they compare notes.

    CIO A (the happy one): "My project sounds about the same as yours. It's such a relief that we're on track. We've got a lean team of just 10 working on it (at one point I thought it might take 20 people), and we're just 6 months from the end of the 18 month project."

    CIO B (the unhappy one): "What? I've got 2 people working on what sounds like the same project as yours. I'm 4 months into the project, and instead of finishing in 2 more months, they're telling me it's going to stretch out 2 more weeks, a 25% overrun of the remaining time, which is why I'm so annoyed."

    CIO A: "Are you kidding me? You've got just 20% of the staff with a target of 1/3 of my timeline, and you're mad? Your whole project is a rounding error compared to mine."

    CIO B: "I guess my guys are doing OK after all. I just wish they could set expectations better."

    On this rare occasion, the software managers confronted the reality of absolute measurement in software — but only by chance, and only by comparing two projects to each other. They're not really even approaching absolute measurement — if their two projects had been run equally incompetently, there would have been no surprise!
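    To make the absolute comparison concrete, here is a sketch in numbers (the staffing and schedule figures come from the dialog above; person-months is my illustrative yardstick, not a measure the CIOs used):

```python
# Absolute comparison of the two Data Warehouse projects from the dialog.
# Person-months serves as a crude absolute yardstick of total effort.
cio_a_effort = 10 * 18        # 10 people on an 18-month project
cio_b_effort = 2 * (4 + 2.5)  # 2 people: 4 months done, ~2.5 months left
                              # (2 months planned plus the 2-week overrun)

ratio = cio_b_effort / cio_a_effort
print(f"CIO A: {cio_a_effort} person-months")
print(f"CIO B: {cio_b_effort} person-months")
print(f"B's project is ~{ratio:.0%} the size of A's")
```

    Roughly 13 person-months against 180: the "failing" project costs about 7% of the "successful" one.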

    What is the Measure of Success?

    What this dialog reveals is the near-universality of measuring success by comparing results to expectations. The CIO who was mad was spending very little money and getting results in a fraction of the time of his compatriot, while the CIO who was happy was doling out the money like confetti and taking the slow boat — but since his team had started by giving him even worse estimates, he thought he was doing great.

    Neither CIO was measuring the success of the projects by what was actually delivered for the time and money spent. They started with estimates. If the work was coming in better than the estimate, it was judged successful; if the work was delivered worse than the estimate, it was not successful.

    This is the way it works in software. It's an unspoken assumption, a postulate that underlies nearly everything that is done in software. People don't propose alternatives, just as no one proposed an alternative to the parallel postulate in geometry for more than a thousand years. It's considered the one and only way to do things.

    There are other measures of success

    Groups that do an outstanding job of producing software often achieve it with a different measure of success. Just as you can optimize your work for setting expectations and meeting them, you can optimize your work to achieve maximum velocity. Estimates are less important than maximum speed. For example, there's the well-known answer to the question of how to avoid being eaten by a hungry tiger. It has nothing to do with expectations. It's simple: run faster than the other guys.

    Conclusion

    This is a point of deep theoretical interest, and also great practical application. It's related to building bridges in war and peace. If you're under no time or budget pressure, then maybe the meet-expectations assumption is the way you should measure your software efforts. But if you are under competitive pressure, then you might want to think about organizing your software efforts according to a different measure of success: the velocity method.

  • Postulates of Software Development

    A great deal of what we do in software is a direct consequence of a couple of fundamental assumptions we make: postulates of software development. Only by questioning and changing those assumptions can we bring about fundamental change in the way we build software.

    Postulates or axioms are rarely discussed or thought about. We just accept them, like breathing air or walking on the ground. Changing a postulate or assumption normally results in a cascade of consequences that changes a great deal.

    Geometry: The Parallel Postulate

    We can understand postulates in software by seeing how they work in geometry. In Euclidean geometry, there are four fundamental postulates, and a pivotal fifth one, the parallel postulate. This is the one that says, basically: if a line crossing two other lines makes angles on one side that add up to less than 180 degrees, those two lines will eventually cross on that side; if the angles add up to exactly 180 degrees, the lines are parallel and never meet.


    [Diagram: parallel lines]

    What's important about this postulate (and the others) is that all the rest of Euclidean geometry is derived from them. Given the postulates, all the theorems are implied.

    For example, the famous Pythagorean Theorem is one of the many theorems whose truth grows out of the small seeds of the postulates.


    [Diagram: right triangle with legs a, b and hypotenuse c]

    In the diagram above, the theorem states that a² + b² = c².

    Non-Euclidean Geometries

    What if parallel lines can meet? Think it's impossible? Well, think about lines on a globe.


    [Diagram: a triangle drawn on a sphere]

    Lines that are parallel end up meeting — and this is business as usual in Elliptic Geometry. What's worse, the Pythagorean Theorem does not hold in non-Euclidean geometries in general, and spherical geometry in particular.
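    The failure is easy to check numerically. On a unit sphere, a right triangle with legs a and b has a hypotenuse c given by the standard spherical identity cos(c) = cos(a)·cos(b); this sketch compares that with what the flat-space theorem would predict (the specific leg lengths are my choice for illustration):

```python
import math

# On a unit sphere, a right triangle with legs a and b (arc lengths in
# radians) has hypotenuse c satisfying cos(c) = cos(a) * cos(b), the
# spherical counterpart of the Pythagorean Theorem.
a = b = math.pi / 4

c_spherical = math.acos(math.cos(a) * math.cos(b))
c_euclidean = math.hypot(a, b)  # what a² + b² = c² would predict

print(f"spherical hypotenuse: {c_spherical:.4f}")  # pi/3, about 1.0472
print(f"Euclidean prediction: {c_euclidean:.4f}")  # about 1.1107
```

    The two answers disagree: on the sphere, the hypotenuse comes out shorter than a² + b² = c² says it should.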

    This isn't just textbook stuff. For example, Einstein's General Theory of Relativity is based on non-Euclidean geometry. In fact, questioning the Parallel Postulate and devising ways of thinking about and describing non-Euclidean spaces was essential to the development of modern physics. So long as geometry was Euclidean and only Euclidean, progress was impossible.

    The Postulates of Software

    So what are the software equivalents of the Euclidean Postulates? There are few questions that are more important, because only when the foundation is questioned and changed is rational, constructive, internally consistent change possible. Only with new postulates can we derive a whole new set of theorems to define software practice. Only then is fundamental change and improvement possible.

     
