Category: Software Programming Languages

  • Software Programming Language Evolution: Beyond 3GL’s

    In prior posts I’ve given an overview of the advances in programming languages, described the major advances in detail and defined just what is meant by “high” in the phrase high-level language. In this post I’ll dive into the amazing advances made in expanding programmer productivity beyond the basic 3-GL’s. What’s most interesting about these advances is that they were huge and market-proven – and that they have subsequently been ignored and abandoned by academia and industry in favor of a productivity-killing combination of tools and technologies.

    From the Beginning to 3-GL's

    The evolution of programming languages has been different from most kinds of technical evolution. In most tech development, there’s an isolated advance. The advance is then copied by others, sometimes with variations and additions. There follows a growing number of efforts concentrating on some combination of commercialization, enhancement and variation. This resembles biological evolution in that once mammals were “invented” there followed a growing number of varied mammalian species with ever-growing variations and enhancements.

    If you glance at the evolution of programming languages, it can easily seem as though the same kind of evolution has taken place. It makes sense: software languages are for computers, and don’t computers get faster, smaller and cheaper at an astounding rate?

    Let’s start by reviewing the evolution of programming languages up to what are commonly called 3-GL’s. For details see this.

    First generation languages are all the native language of a particular computer, expressed as the computer executes the language: in binary. A program in a 1-GL, normally called machine language, is a big block of 1’s and 0’s. If you understand it, you can break the numbers up into data and instructions, and the instructions into command codes and arguments. Necessary for a machine, but a nightmare for humans.

    A program in a 2-GL, normally called assembler language, is a text version of a 1-GL program, with nice additions like labels for locations of instructions and data. 2-GL’s were a night-and-day advance over machine language.

    A program in a 3-GL, for example COBOL, FORTRAN or C, is a text language that is independent of the computer that can be translated (compiled or interpreted) to run on any machine. There are statements for defining data and for defining the actions that will be taken on the data. The action statements normally constitute the vast majority of the lines. For many programs, 3-GL’s were 5 to 10 times more productive than assembler language, with the added advantage that they could run on any machine.

    We’re done, right? In some sense, we are – there are still vast bodies of production code written in those languages. No later language can create a program that has greater execution speed. But maybe we’re not done. As I’ve described, the “high” of high-level isn’t about the efficiency of the computer; it’s about the efficiency of the human – the time and effort it takes a human programmer to write a given program using the language. There have been a host of languages invented since the early days of 3-GL’s that claim to do this.

    Let’s look at a couple of languages that no one talks about and that don’t even have a category name – languages that were wildly popular in their day and that live on today, unheralded and ignored. I’ll use two examples.

    The Way-beyond-3-GL's: MUMPS

    The first of these languages I’ll describe is MUMPS, developed at Mass General Hospital for medical data processing. Have you ever heard of it? I didn’t think so.

    In modern terms, MUMPS is definitely a programming language; it has all the capability and statement types that 3-GL’s have. But MUMPS goes way beyond the boundaries of all 3-GL’s to encompass the entire environment needed for building and running a program. Normally with a 3-GL someone needs to pay lots of attention to things “outside” the language to achieve an effective solution, particularly in the areas of data access, storage and manipulation, but also in the operating system. A MUMPS program is inherently multi-user and multi-tasking. It has the ability to reference data without the potential danger of pointers. It has the power and flexibility of modern DBMS technology built in – not just relational DBMS but also key-value stores and array manipulation features that are still missing from most subsequent languages. In other words, you can build a comprehensive software application in a single programming environment without external things like databases, etc.
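    To give a flavor of what “built in” means here: MUMPS stores its data in “globals,” sparse hierarchical arrays that persist automatically, with no separate database to install or administer. Here’s a loose sketch in Python – an in-memory dict standing in for what MUMPS does natively and persistently; the patient data is invented purely for illustration:

    ```python
    # A loose Python imitation of MUMPS "globals": sparse, hierarchical,
    # key-value arrays. In real MUMPS these persist to disk automatically
    # and are shared by all users; a dict keyed by subscript tuples
    # stands in for that here.
    patients = {}

    # In actual MUMPS this would read something like:
    #   SET ^PATIENT(123,"name")="Smith, Jane"
    patients[(123, "name")] = "Smith, Jane"
    patients[(123, "lab", "2020-01-19", "glucose")] = 95
    patients[(123, "lab", "2020-01-19", "sodium")] = 140

    # Subscripts are ordered, so scanning a subtree is natural:
    labs = {k: v for k, v in sorted(patients.items())
            if k[:3] == (123, "lab", "2020-01-19")}
    print(labs)
    ```

    The ordered-subscript scan is exactly the kind of key-value and array capability the post alludes to: storage and retrieval as part of the language itself, not as a bolted-on external system.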

    The result of this wide variety of powerful features, all available in one place and implemented as an integral part of the language, was definitely outside the mainstream of programming languages – but wildly productive. MUMPS had strong uptake in the medical community. For example, the hospital software with dominant market share today is Epic, which was originally written in MUMPS (whose leading implementation is now called Caché) and remains so today. An amazing number of other leading medical systems are written in the language, as are major systems in the financial sector.

    Net-net: MUMPS is truly a beyond-3-GL high level language in that the total amount of human effort required to reach a given programming result is much less. Even better, all the skills are normally in a single person, while modern languages require outside skills to achieve a given result, for example a database expert.

    The Way-beyond-3-GL's: PICK

    PICK is another beyond-3-GL that delivered a huge up-tick in programmer productivity. PICK, like MUMPS, is largely forgotten today. It’s an afterthought in any discussion of programming language history, ignored by academics, and generally erased. The title of its entry in Wikipedia is even wrong – it’s called an operating system! Of course, it is an operating system – AND a database AND a dictionary AND a full-featured programming language AND a query system AND a way to manage users and peripherals — everything you need to build and deliver working software, all in one place. PICK was a key driving factor in fueling the minicomputer industry during its explosive growth in the 1970’s and 80’s, while also running on mainframes and PC’s.
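    To make that concrete, PICK’s database was “multivalue”: a single record could hold repeating fields, so an entity and its detail lines lived together instead of being spread across normalized tables. Here’s a rough Python sketch of the idea (a hypothetical record, not real PICK, which stores each item as a delimited dynamic array):

    ```python
    # Hypothetical sketch of a PICK-style multivalue record.
    # Real PICK separates attributes and multivalues with special mark
    # characters inside one stored item; Python lists stand in for that.
    customer = {
        "id": "C1001",
        "name": "Acme Hardware",
        # A multivalued field: one record carries all the order numbers...
        "order.no": ["O-17", "O-23", "O-41"],
        # ...with parallel multivalues holding each order's amount.
        "order.amt": [250.00, 75.50, 1200.00],
    }

    # PICK's built-in query language worked directly against such
    # records, in the spirit of: LIST CUSTOMERS WITH TOTAL ORDER.AMT
    total = sum(customer["order.amt"])
    print(customer["name"], total)
    ```

    One record, one place to look, no joins – which is a large part of why a lone PICK programmer could deliver a whole working application.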

    Wikipedia says: "By the early 1980s observers saw the Pick operating system as a strong competitor to Unix. BYTE in 1984 stated that 'Pick is simple and powerful, and it seems to be efficient and reliable, too … because it works well as a multiuser system, it's probably the most cost-effective way to use an XT.'"

    PICK was the brainchild of a modest guy named Dick Pick. During the early 1980’s I worked at a small company in the Boston area that attempted to build a competitor to PICK, which seemed to be everywhere at the time. As you might imagine, programmer humor emerged on the subject, including such gems as

    If Dick Pick picked a pickle, which pickle would Dick Pick pick?

    PICK lived on in many guises and with multiple names. But it has zero mind-share in Computer Science and among most people building new applications today.

    Conclusion

    All-encompassing programming environments like MUMPS and PICK should have become the dominating successors to the 3-GL languages, particularly as the total effort to develop working systems based on 3-GL’s like Java exploded with the arrival of independent DBMS’s, multi-tier hardware environments and the orthodoxy of achieving scalability via distributed applications. Yet another step on the peculiar path of software evolution.

    I remember the frenzy during the internet explosion of the late 1990’s and early 2000’s of money flowing in and the universal view among investors and entrepreneurs about how applications must be written in order to be successful. I encountered this personally when my VC partner introduced me to what appeared to be a promising young medical practice management system company that was having some trouble raising money because investors were concerned that the young programmer doing much of the work and leading the effort wasn’t using Java. I interviewed the fellow, Ed Park, and quickly determined that he was guided in his technical decision-making by smart, independent thinking rather than the fashionable orthodoxy. I endorsed investing. The company was Athena Health, which grew to become a public company with a major market share in its field. And BTW it achieved linear scalability while avoiding all the productivity-killing methods everyone at the time insisted were needed.

    The history of amazing, beyond-3-GL's like MUMPS and PICK that deliver massive programmer productivity gains demonstrates beyond all doubt that software and all its experts are driven by fashion trends instead of objective results, and that Computer Science is a science in name only.

  • What is high about a high level language in software?

    I’ve talked in detail about the supposed progress in computer software languages, and explained the two major advances that have been made. All modern software languages are called “high-level languages.” As is typical in the non-science of Computer Science, no one bothers to define exactly what is meant by “high.” This is hilarious, since a single character being missing or out of place can cause a giant piece of software to crash – if there’s anything in this world in which precision is important, it’s in programming computers. But somehow the academic field and the vast majority of the practitioners don’t bother with little details like defining precisely what is “high” about a high-level language.

    There was no such lack of precision by the person who invented the first widely used high-level language. John Backus first proposed FORTRAN to his superiors at IBM in late 1953.  A draft specification was completed in 1954 and the first working compiler was delivered in early 1957. Wikipedia has it clear, simple and correct:

    While the community was skeptical that this new method could possibly outperform hand-coding, it reduced the number of programming statements necessary to operate a machine by a factor of 20, and quickly gained acceptance. John Backus said during a 1979 interview with Think, the IBM employee magazine, "Much of my work has come from being lazy. I didn't like writing programs, and so, when I was working on the IBM 701, writing programs for computing missile trajectories, I started work on a programming system to make it easier to write programs."

    According to the creator of the first successful high level language (echoed by his buddies and collaborators), “high level” in the context of software languages means “takes fewer statements and less work” to write a given program. That’s it! My blog post on the giant advances in software programming languages explains and illustrates this in detail.

    The whole point of FORTRAN was to make writing numeric calculation programs quicker and easier. It succeeded. But it didn’t enable just ANY assembler language program to be written in it. As I explain here, the invention of COBOL filled the gaps in FORTRAN for business programming, and C filled the gaps for systems programming.

    How did the HLL’s achieve their productivity?

    The early HLL’s were centered on wonderful, time-saving statements like assignments and if-then-else that both saved time writing and increased readability. In addition, the early creators were thoroughly grounded in the fact that the whole point of software was … to read data, do some tests and calculations and produce more data, a.k.a. results. Each of the amazing new languages therefore included statements to define the data to be read and written, and other statements to perform the actions of reading and writing. COBOL, for example, had and has two major parts to each program:

    • Data Division, in which all the data to be used by the program is defined. When data is used by multiple programs, the definitions are typically stored in one or more separate files and copied by reference into the Data Division of any program that uses them. These are usually called “copy books.”
    • Procedure Division, in which all the action statements of the program are defined. These include Read and Write statements, each of which includes a reference to the data definitions to be read or written.

    This way of thinking about things was obvious to the early programmers. They had data; they had to make some calculations based on it and make new data. Job one was define the data. Job two was, with reference to the data defined, perform tests and calculations. For example, reading in deposits and withdrawals, and updating current balances.
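    A minimal sketch of that shape in Python, using the deposits-and-balances example (a hypothetical record layout, not actual COBOL): the data definitions come first, then the procedure that acts on them:

    ```python
    # "Data Division": all record layouts defined up front, in one place.
    from dataclasses import dataclass

    @dataclass
    class Transaction:
        account: str
        kind: str      # "DEPOSIT" or "WITHDRAWAL"
        amount: float

    # "Procedure Division": read the transactions, update the balances.
    def apply_transactions(balances, transactions):
        for t in transactions:
            if t.kind == "DEPOSIT":
                balances[t.account] += t.amount
            else:
                balances[t.account] -= t.amount
        return balances

    balances = {"1001": 500.0}
    txns = [Transaction("1001", "DEPOSIT", 200.0),
            Transaction("1001", "WITHDRAWAL", 50.0)]
    print(apply_transactions(balances, txns))
    ```

    Job one, define the data; job two, with reference to that data, perform the actions – the same two-part discipline COBOL enforces by its very structure.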

    After the Great Start…

    Of course, things were not sweetness and light from thence forth. Huge amounts of noise and all the attention of software language people were devoted to minor variations on the basic theme of high level languages. No one EVER measured how many fewer statements or how much less work a new language actually required. I guess a little birdie deep inside most of the participants would chirp “don’t go there” whenever one of the language fashionistas was tempted to actually measure what difference their new language made.

    I’m not going into the claims of virtue made during the last 50 years of pointless language invention work in detail, any more than I would go into the contents of garbage trucks to count and measure the differences in what people throw out. In the end, it’s all garbage. I’ll just mention these:

    There’s a large group of inventors who claim that use of their new wonder-language will prevent programmers from making as many errors. This claim is a favorite of the people who make programs HARDER to write by eliminating pointers. Any studies or measurements? Anyone notice even anecdotally a big uplift in software quality? Anyone?

    Advocates of the O-O cult like to talk about how objects keep things private and prevent bugs. They suppress all talk of the hoops that programmers have to jump through to make programs conform to O-O dogma, with the resulting increase of lines of code, effort and bugs.

    Conclusion

    The starting years of computer software language invention were incredibly productive. The original high level languages achieved the vast majority of the "height," i.e., reduction in lines of code and effort to write a given program, that could possibly be achieved. The subsequent history of language invention includes a couple minor improvements and filling a small but important gap in coverage (in systems software). But mostly it's a story of wandering around in the wilderness spiced up by things being made worse, as I'll show in subsequent posts.

  • The giant advances in software programming languages

    There have been two gigantic advances in programming languages, each contributing huge advances in programmer productivity. Since then, thousands of times more attention has been given to essentially trivial changes than has been given to the two giant advances. I described these 50 years of non-advances here. In this post I’ll go back in history to describe the two giant advances that were made in the early days of computing, back in the 50’s, and I’ll tack on the one important corrective that’s been made since then.

    First, the pre-history of computer languages

    The famous early computer was the ENIAC.

    https://en.wikipedia.org/wiki/ENIAC

    It cost the equivalent of about $7 million to build and took about 3 years. It was big, occupying about 1,800 sq ft. It was roughly 1,000 times faster than electro-mechanical machines, so you can see why people were excited about it. Its purpose was to perform complex calculations on scientific/engineering data.

    [Image: Glen Beck and Betty Snyder program the ENIAC in Building 328 at the Ballistic Research Laboratory]

    While the excitement was justified, the method of programming was gruesome.

    “ENIAC was just a large collection of arithmetic machines, which originally had programs set up into the machine by a combination of plugboard wiring and three portable function tables (containing 1,200 ten-way switches each). The task of taking a problem and mapping it onto the machine was complex, and usually took weeks. Due to the complexity of mapping programs onto the machine, programs were only changed after huge numbers of tests of the current program. After the program was figured out on paper, the process of getting the program into ENIAC by manipulating its switches and cables could take days. This was followed by a period of verification and debugging, aided by the ability to execute the program step by step”

    Engineers (all men) would figure out what they needed the machine to calculate. It then fell to the “programmers,” all women, to perform the excruciatingly detailed work with the knobs and switches.

    [Image: Two women operating ENIAC]

    From this it is extremely clear that while the machine was an astounding achievement in speed, the speed of programming the calculation steps was the huge bottleneck. How can we make this faster?

    Answer: instead of physical plugs and switches, let's invent a language for the machine — a "machine language!" That's what happened with the invention of the stored-program computer, which took place in 1948. Instead of moving plugs and flipping switches, programmers wrote the program out in machine language on paper, loaded it into the computer via punched cards, and then had the computer execute it. Figuring out the detailed steps was still hugely difficult, but at least getting the program into the computer was faster.

    From machine language to assembler

    Every machine has one and only one “machine language.” This is a stream of binary data that the machine’s processor “understands” as instructions. Each instruction causes the machine to perform a single action on a single piece of data.

    I learned to read and write machine language for a couple different machines. It’s not easy. From early on, machine language was usually shown as hexadecimal instead of raw binary, so that instead of looking at “0100110000101111” you see the hexadecimal equivalent: 4C2F. A big plus, but you’re still looking at a pile of hexadecimal data. A nice editor will at least break it up into lines of code, but you still have to read the hex for the instruction, registers and storage locations.
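    The binary-to-hex relationship is purely mechanical – each group of four bits becomes one hex digit – which a few lines of Python can confirm:

    ```python
    # Each 4-bit group of a machine instruction maps to one hex digit.
    bits = "0100110000101111"
    value = int(bits, 2)             # interpret the bit string as a number
    print(format(value, "04X"))      # the hexadecimal rendering: 4C2F

    # Split into nibbles to see the digit-by-digit correspondence.
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    print(nibbles)                   # ['0100', '1100', '0010', '1111']
    ```

    Four times fewer characters to read, but still just numbers: the hex form compresses the notation without adding any meaning.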

    Moving from machine language to assembler language was HUGE. Night and day in terms of readability and programmer productivity. Assembler language is a body of text that closely corresponds to the machine language. Each line of text typically corresponds to exactly one machine language instruction. Assembler language transformed programming. If you understand any machine language, you can easily write and read the corresponding assembly language. For example, here's the binary of a simple instruction:

    [Image: the instruction in binary]

    Here's the equivalent in hexadecimal:

    [Image: the same instruction in hexadecimal]

    And here it is in assembler language, including a comment:

    [Image: the same instruction in assembler language]

    Just as important as making instructions readable was using text labels for locations of data and addresses of other instructions. In machine language, an address is expressed as a “real” address, for example as a hexadecimal number. If you inserted a command after a jump command and before its destination, you would have to change the address in the jump. You can see that with lots of jumps this would quickly become a nightmare. By labeling program lines and letting the assembler generate the “real” addresses, the problem disappears.
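    The way an assembler pulls this off is simple enough to sketch: one pass to record where each label lands, a second pass to substitute the real addresses. A toy illustration in Python, with a made-up three-column instruction format:

    ```python
    # Toy two-pass label resolution: the trick that freed programmers
    # from hand-patching jump addresses. Instruction format is invented.
    program = [
        ("LOOP:", "LOAD", "B"),
        ("",      "ADD",  "C"),
        ("",      "JUMP", "LOOP"),   # symbolic target, not a real address
    ]

    # Pass 1: record the address (here, line number) of every label.
    labels = {line[0].rstrip(":"): addr
              for addr, line in enumerate(program) if line[0]}

    # Pass 2: emit instructions with symbolic targets turned into addresses.
    machine = [(op, labels.get(arg, arg)) for _, op, arg in program]
    print(machine)   # [('LOAD', 'B'), ('ADD', 'C'), ('JUMP', 0)]
    ```

    Insert a new instruction anywhere and re-assemble: pass 1 recomputes every label's address, so no jump ever has to be fixed by hand.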

    From assembler to high-level languages

    Assembler made writing code in the language of the machine incredibly easier. It was a huge advance. But people quickly noticed two big problems. The solution to both problems was the second giant advance in software programming, high-level languages.

    The first problem was that writing in assembler language, while worlds easier than machine language, still required lots of busy-work. For example, adding B and C together, dividing by 2 and storing the result in A should be pretty easy, right? In nearly any high-level language it looks something like this:

                    A = (B+C)/2

    Putting aside messy details like data type, this would be the following in pseudo-assembler language:

                    Load B into Register 1

                    Load C into Register 2

                    Add Register 2 to Register 1

                    Load 2 into Register 2

                    Divide Register 1 by Register 2

                    Store Register 1 into A

    Yes, you can quickly become proficient in reading and writing assembler, but wouldn’t the 1 line version be easier to write and understand than the 6 line version?

    Expressions were the core advance of high level languages. Expressions turned multi-line statement blocks that were tedious and error-prone into simple, common-sense things. Even better, the use of expressions didn’t just help calculations – it helped many things. While assembler language has the equivalent of conditional branches, many such conditionals also involve expressions, for example, the following easy-to-read IF statement with an expression

                    If ((B+C)/2 > D) THEN … ELSE …

    would turn into many lines of assembler – tedious to write, taking effort to read.
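    For example, in the same pseudo-assembler as above, that one IF line might become something like:

                    Load B into Register 1

                    Load C into Register 2

                    Add Register 2 to Register 1

                    Load 2 into Register 2

                    Divide Register 1 by Register 2

                    Load D into Register 2

                    Compare Register 1 to Register 2

                    Jump to the ELSE code if not greater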

    The second problem emerged as more kinds of computers were produced, each with its own machine and assembler language. A guy might spend months writing something in assembler and then want to share with a friend at another place using a different type of computer. But the friend’s computer was different – it had a different assembler/machine language! The whole program would have to be re-written. Total bummer!

    Wouldn’t it be nice to write a program in some language that was like the 1 line version above, and that could be translated to run on any machine??

    Easier to write by a good 5X, easy to read and can run on any machine, present or future. Hmmmm… There must be something wrong, this sounds too good to be true.

    The concept of high level languages was good and wasn’t too good to be true. Everyone agreed, and it wasn’t long before FORTRAN and similar compute-oriented languages arose. Even performance-obsessed data nerds wrote programs in FORTRAN rather than assembler because FORTRAN compiled into machine language was just as fast, and sometimes even faster because of optimizations those clever compiler guys started inventing! And the programs could run anywhere that had a FORTRAN compiler!

    The accounting and business people looked at what was happening over in heavy-compute land and grew more and more jealous. They tried FORTRAN but it just wasn’t suitable for handling records and transactions, not to mention having nice support for financial data. So business processing languages got invented, and COBOL emerged as the best of the bunch.

    If you look at a compute statement in FORTRAN and COBOL, they’re nearly identical. But in the early days, FORTRAN had no support for a money data type and no good way to represent the blocks of financial data that are core to anything involving accounting.
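    The money-type gap is easy to demonstrate even now. Binary floating point can’t represent most decimal fractions exactly, which is why business languages built in decimal arithmetic from the start. A quick Python illustration, with its decimal library standing in for COBOL’s decimal fields:

    ```python
    from decimal import Decimal

    # Binary floating point: ten 10-cent deposits don't quite make a dollar.
    total_float = sum([0.10] * 10)
    print(total_float == 1.0)              # False: off by a tiny residue

    # Decimal arithmetic, the kind COBOL built in for money: exact.
    total_dec = sum([Decimal("0.10")] * 10)
    print(total_dec == Decimal("1.00"))    # True
    ```

    An accountant whose books are off by a residue is an unhappy accountant, which is why COBOL treated exact decimal data as a first-class concern.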

    Once they got going, people just kept on inventing new languages for all sorts of reasons. The obvious deficiencies of FORTRAN and COBOL were fixed. Languages could handle intense calculations and business data processing about as well. But back around 1970, 50 years ago, there remained a gap. That gap was that an important category of programs could not be written in any of the high-level languages. Essential programs like operating systems, device drivers, compilers and other “systems” software continued to be written in the assembler language of the machine for which it was intended. There were attempts to fill this gap, but they failed. Then a couple super-nerds at Bell Labs, building on an earlier good shot at solving the problem, invented the language C.

    [Image: Lang 2]
    They rewrote the UNIX kernel in C, and the rest is history – C became the high-level language of choice for writing systems software and still holds that place.

    Conclusion

    Two giant advances were made in programming languages. Each of these happened early in computing history, in the 1950’s. The first giant advance was the biggest: with assembler language, machine language was reasonable to write, read and change. The second giant advance was the category of high level languages. Two minor variants on the concept, COBOL and FORTRAN, established the value of having a language that enabled expressions and could be compiled to run efficiently on any machine. In spite of the on-going tumult of language invention that has taken place since, the fact that huge bodies of working code written in those two languages continue to perform important jobs makes it clear that they embodied the giant advance that was high level languages. The only substantial omission in those languages – the ability to directly reference computer memory – was filled by the invention of the language C.

    Most of what’s happened since in programming languages is little but sound and fury. For a review of those non-advances see this.

    I do appreciate some of the subtle changes that have been made in language design. Some modern languages really are better – but mostly not because of the details of the language! I’ll treat this issue in a future post. Meanwhile, it’s important to understand history and appreciate the giant steps forward that high level languages as a category represent.

  • Software Programming Languages: 50 years of Progress

    I was inspired to examine 50 years of progress in computer languages by an article in the Harvard alumni magazine written by one of the leading figures in Harvard computer activity, Harry Lewis. Lewis graduated from Harvard College the year I entered, and he stayed on for a graduate degree while I was a student there. He says:

    Thirty veterans of Harvard’s Aiken Computation Lab reunited on January 19, 2020, some 50 years after each of us had a role in creating today’s networked, information-rich, artificially intelligent world. Rip van Winkles who had never fallen asleep, we gathered to make sense of what had evolved from our experience as undergraduates and Ph.D. candidates during the decade 1966-1975. One thing was clear: we hadn’t understood how the work we were doing would change the world.

    I interacted with many of the people and projects he describes in the period he focuses on. For this post I decided to focus just on one of his major topics, programming languages.

    50 years of progress

    You might think a book would be required to adequately describe the advances that have been made in programming languages in the last 50 years. In fact, many experts seem to think so. Programming languages continue to advance and improve, even with everything that’s been achieved in the last 50 years – and more!

    There is no doubt that many programming languages have been introduced in decades past. The pace of new language introduction does not seem to have slowed. But has progress been made? Yes – if you count dead-ends, bad ideas, going in circles and trivial variations as progress.

    While describing the true progress in programming languages is easy, explaining why all the widely-touted “advances” aren’t in fact progress would indeed take a book. Which someone other than me is welcome to write. It would be boring! What could you expect from a gruesomely detailed exposition that amounts to: Idea 1 is stupid because …; Idea 2 isn’t any better because …; Idea 3 is just another attempt, with slightly varied syntax, to go after the same fruitless goal as Idea 4 from decades before …

    I gave a talk that included this subject to a large group of programmers in India 25 years ago. Here is the slide illustrating the mighty progress of languages as illustrated by what COBOL calls a Compute statement:

    [Slide: Lang 1]

    I hope that gives you a basic idea of the massive advances that had been achieved by the mid 1990’s.

    The same progress has continued unabated until the present. Google is promoting the language Go, often called Golang. Go must be one terrific language if the geniuses at Google are behind it, right? Here’s that same simple assignment statement I illustrated decades ago, updated to show how Google has made it even better:

    A:=B*C+D

    This illustrates the exquisite judgment expressed in new language design: The assignment statement has the colon before the equals sign, like PL/1, but leaves off the ending semicolon. Genius!

    Uncommon common sense about programming languages

    The inventors and promoters of new programming languages nearly always talk about how the wonderful new language is "better" than all other languages. Frequently mentioned are that the new language helps programmers be more productive and commit fewer errors.

    In my experience and subjective judgment, all such claims are false and simply not proven out in practice. More important is the fact that the inventors/promoters of new languages have no real arguments beyond their own enthusiasm to back their claims: there is NO evidence-backed science to support ANY of the claims that a given language is superior on some measure to others. There isn’t even any substantial agreement on the appropriate measure of goodness, much less an evidence-based hierarchy of languages against the measure of goodness!

    That should be the end of it. Once you realize how devoid of evidence, experimentation and science the whole endeavor is, it becomes clear that anyone promoting the virtues of a new language is either an ignorant buffoon or self-promoting cult leader. Or some of each.

    Following are some observations to help put this into context.

    I do understand how the enthusiasm for a new language can catch hold. Being a programmer requires a powerful set of blinders and the ability to focus on an incredibly narrow field for long periods of time. It's essential to being a good programmer! But you have to step back and look at your enthusiasm objectively. Above all you have to have objective measures to compare your wonderful new language to the bad old ones. This is almost never done in software, though it's commonplace in other fields. Would Marie Curie have been a better scientist had she used German instead of French as her primary language at work? See this for more.

    There is a hierarchy of skills in software. Knowing a programming language is an important part of the hierarchy, but it's near the bottom of the stack! Think about learning a new human language as an adult, ignoring for the moment the huge advantage you have from already being proficient in one language. You start with basic vocabulary and grammar. How long and hard is it to get the accent a native speaker has? Now think about vocabulary, the basic subset you've learned vs. the huge set of the native speaker. Think about using the basics of the language to get things done compared to using the language as an excellent writer or speaker. Now think about idiomatic expressions, of which any native speaker has a huge set — without them, you're not fluent. Finally, think about a profession like law or medicine. Those proficient are using the base language, but with a vocabulary, knowledge and sophistication that take years to learn. These things are like the subroutine library that is a key part of any language though not strictly speaking part of the language itself, and the professional specialty is like networking or database skills. Yes, the core language is a necessary foundation, but most of the real skill is in the rest. See this for more depth on this important subject.

It turns out that the "goodness" of a software effort isn't really dependent on the language used! When you dig into what really matters, you find that factors outside the strict confines of language design are overwhelmingly important, though details that new-language enthusiasts tend to ignore can play a role. See this for details.

One more key factor in the explosion of languages that are touted as advances, but are really just variations on the same old thing, is the core human behavior of creating a language among people who work together on the same things for long periods of time. But no one has found that any of these languages is superior in some real way to the others — they're just the languages that happen to have evolved, each with some unique vocabulary reflecting unique concerns, like the words that people who live in the arctic have for different kinds of snow and ice. This is a well-known behavior, and why shouldn't it also be seen when people "speak" computer languages together? See this for examples and details.

    When you lay out the spectrum of languages, you can see some patterns. The most obvious of the patterns I've observed is the extent to which data is defined as part of the language itself or outside the language proper. Data can be fully accessed and manipulated in all cases — the difference is the extent to which the data definitions are made part of the language itself, or as part of a core library attached to the language. Language mavens are passionate about each of the variations, and convinced that theirs is best! See this for details.

What enables large groups of people to hail the latest "advance" in computer languages without being laughed out of town is, among other things, the shocking ignorance of and lack of interest in computer and software history that everyone involved exhibits. With the least bit of software history knowledge, no one would have the nerve to invent a new language, much less brag about it. See this for more.

    What does this mean?

    If you're a techie, you should get a grip, some perspective and knowledge of history and science. It will help you see, for example, that the fact that Marie Curie's native language was Polish and that she worked largely among French speakers was incidental to her great mind and achievements. It will help you see that Shakespeare wouldn't have been a much better playwright if only he had written in Latin (or whatever), or would have been much worse had he written in Norse (or whatever).

If you're a manager, you should respect the enthusiasm of your employees, but understand that their enthusiasm is baseless. I once saw a software re-write into a new language using the Neo4j database. Everyone was pumped. Disaster. I have seen a couple of cases in which huge enthusiasm was behind a less radical choice. The results were no better in any way than using a more standard language — the engineers worked harder than they would have because of their enthusiasm, but were less productive because they were less skilled in the new environment. And of course, new hires were hampered by the same problem.

    Conclusion

    Computer Science is a profoundly unscientific field. The fashion-driven nature of it is exhibited starkly in the decades-long march of revolutionary new languages that keep getting invented — and having no impact on anything that matters.

    If you want to be the best possible programmer, it's incumbent on you to use a sensible, mainstream language and spend your effort producing excellent results — including bringing the whole notion of excellence to new levels. The language you use will not be the limiting factor — what's important is the way you use the language, just as it was for Marie Curie.

  • Innovations that aren’t: data definitions inside or outside the program

    There are decades-long trends and cycles in software that explain a great deal of what goes on in software innovation – including repeated “innovations” that are actually more like “in-old-vations,” and repeated eruptions of re-cycled “innovations” that are actually retreating to bad old ways of doing things. Of the many examples illustrating these trends and cycles, I’ll focus on one of the most persistent and amusing: the cycle of the relationship between data storage definitions and program definitions. This particular cycle started in the 1950’s, and is still going strong in 2015!

    The relationship between data in programs and data in storage

    Data is found in two basic places in the world of computing: in “persistent storage” (in a database and/or on disk), where it’s mostly “at rest;” and in “memory” (generally in RAM, the same place where the program is, in labelled variables for use in program statements), where it’s “in use.” The world of programming has gone around and around about what is the best relationship between these two things.

    The data-program relationship spectrum

At one end of the spectrum, there is a single definition that serves both purposes. The programming language defines the data generically, has commands to manipulate the data, and has other commands to get it from storage and put it back.
At the other end of the spectrum, there is a set of software used to define and manipulate persistent storage (for example a DBMS with its DDL and DML), and a portion of the programming language used for defining “local” variables and collections of variables (called various things, including “objects” and “structures”). At this far end of the spectrum, there are a variety of means used to define the relationship between the two types of data (in-storage and in-program) and how data is moved from one place to the other.

    For convenience, I’m going to call the end of the spectrum in which persistent data and data-in-program are as far as possible from each other the “left” end of the spectrum. The “right” end of the spectrum is the one in which data for both purposes is defined in a unified way.

    People at the left end of the spectrum tend to be passionate about their choice and driven by a sense of ideological purity. People at the right end of the spectrum tend to be practical and results-oriented, focused on delivering end-user results.
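To make the two ends of the spectrum concrete, here is a small runnable sketch using Python's built-in sqlite3 module. The schema, class, and helper names are invented for illustration; real systems at each end (ORMs, COBOL data divisions, RAILS) are of course far richer.

```python
import sqlite3

# Left end: the in-program definition, the storage definition, and the
# mapping between them are three separate artifacts to keep in sync.
class Employee:
    def __init__(self, name, salary):
        self.name, self.salary = name, salary

left = sqlite3.connect(":memory:")
left.execute("CREATE TABLE employee (name TEXT, salary REAL)")  # storage side

def save(conn, e):  # hand-written mapping layer between the two definitions
    conn.execute("INSERT INTO employee VALUES (?, ?)", (e.name, e.salary))

save(left, Employee("Ada", 100.0))

# Right end: one definition drives both the schema and program access.
SCHEMA = {"employee": [("name", "TEXT"), ("salary", "REAL")]}

right = sqlite3.connect(":memory:")
for table, cols in SCHEMA.items():
    right.execute(f"CREATE TABLE {table} ("
                  + ", ".join(f"{c} {t}" for c, t in cols) + ")")
right.execute("INSERT INTO employee VALUES (?, ?)", ("Ada", 100.0))

print(left.execute("SELECT name FROM employee").fetchone()[0])   # Ada
print(right.execute("SELECT name FROM employee").fetchone()[0])  # Ada
```

Notice where a change lands at each end: at the left end, renaming a column touches the class, the DDL, and the mapping code; at the right end, it touches one dictionary entry.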

    Recent example: Ruby and RAILS

The object-oriented movement tends, for the most part, to be at the left end of the spectrum. Object-oriented people focus on classes and objects, which are entirely in-program representations of data. The problem of “persisting” those objects is an annoying detail that the imperfections of current computing environments make necessary. They solve this problem by various means; one of the most popular is using an ORM (object-relational mapper), which makes the work of moving objects between a DBMS and where they “should” be nearly automatic. It can also automate the process of creating effective but really ugly schemas for the data in the DBMS.

A nice new object-oriented language, Ruby, appeared in 1995. It was invented to bring a level of O-O purity to shell scripting that alternatives like Perl lacked. About ten years later, a programmer working in Ruby at Basecamp realized that the framework he'd created was really valuable: it enabled him to build and modify the things his company needed really quickly. He released it as open source, and it became known as the RAILS framework for Ruby, the combination being Ruby on RAILS. Ruby on RAILS was quickly adopted by results-oriented developers, and is used by more than half a million web sites. While "pure" Ruby resides firmly at the left end of the spectrum, the main characteristic of the RAILS framework is that you define variables that serve for both program access and storage, putting it near the right end of the spectrum.
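The core of the RAILS trick is that the model class doesn't repeat the data definition: the database schema is the single source of truth, and field names are discovered from it. Here is a minimal sketch of that idea in Python with sqlite3 (not RAILS' actual implementation; the `Model`/`Post` names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")

class Model:
    """Field names come from the table's columns; there is no second definition."""
    table = None

    @classmethod
    def columns(cls):
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return [row[1] for row in conn.execute(f"PRAGMA table_info({cls.table})")]

    @classmethod
    def create(cls, **fields):
        cols, marks = ", ".join(fields), ", ".join("?" for _ in fields)
        conn.execute(f"INSERT INTO {cls.table} ({cols}) VALUES ({marks})",
                     tuple(fields.values()))

    @classmethod
    def titles(cls):
        return [r[0] for r in conn.execute(f"SELECT title FROM {cls.table}")]

class Post(Model):
    table = "posts"   # the only thing the application declares

Post.create(title="Hello")
print(Post.columns())  # ['id', 'title']
print(Post.titles())   # ['Hello']
```

Add a column to the table and `Post` picks it up with no code change — exactly the right-end productivity win described above.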

    Rhetoric of the ends of the spectrum

    Left-end believers tend to think of people at the other end as rank amateurs. Left-enders tend to claim the cloak of computer science for their style of work, and claim that they're serious professionals. They stress the importance of having separation of concerns and layers in software.

    Right-end practitioners tend to think of people at the other end as purists, ivory-tower idealists who put theory ahead of practice, and who put abstract concerns ahead of practical results. They stress the importance of achieving business results with high productivity and quick turn-around.

    It goes in cycles!

There have always been examples of programming at both ends of the spectrum; neither end ever disappears. But the pendulum tends to swing from one end to the other, and so far the swinging shows no sign of stopping.

    In 1958, Algol was invented by a committee of computer scientists. Algol 60 was purely algorithmic — it only dealt with data in memory, and didn't concern itself with getting data or putting it back. Its backers liked its "syntactic and semantic purity." It was clearly left-end oriented. But then practical business people, in part building on earlier efforts and in part inspired by the need to deliver results, agreed on COBOL 60, which fully incorporated data definitions that were used both for interacting with storage and for performing operations. It was clearly right-end oriented. All the big computer manufacturers, anxious to sell machines to business users, built COBOL compilers. Meanwhile, Algol became the choice for expressing algorithms in academic journals and text books.

There was an explosion of interest in right-end languages during the heyday of minicomputers, with languages like PICK growing in use, and niche languages like MUMPS having their day. The rise of the DBMS presented new problems and opportunities. After a period of purism, in which programmers struggled to get jobs done with left-end languages that were supposed to be "easy," like BASIC, along came Powersoft's PowerBuilder, which dominated client-server computing because of the productivity win of integrating the DBMS schema with the language. The pendulum swung back with the rise of the internet and the emergence of "pure" Java, with its arms-length relationship to persistence, putting it firmly in the left-end camp. Since Java wasn't pure enough for some people, "pure" Ruby caught on and grew; then a Ruby user under pressure to deliver results invented RAILS, which spread to similarly-minded programmers. And "amateurs" who like Java but also like productivity invented and are spreading Grails, a RAILS-style framework for the Java world that combines advantages of each.

    So is RAILS an innovation? Java? I don't think so. They're just minor variants of a recurring pattern in programming language design, fueled by the intellectual and emotional habits of groups of programmers, some of whom push towards the left end of the spectrum (with disdain for the amateurs) and others of whom push towards the right end of the spectrum (with disdain for the purists).

    Conclusion

This spectrum of the relationship between data and program has persisted for more than 50 years, and shows no signs of weakening. Whenever a person or group finds themselves in an environment that emphasizes the end of the spectrum they find distasteful, they try to get to the end they like. Sometimes they write a bunch of code, and everyone thinks they're innovative. Except it would be more accurate to say they're "inno-old-ive." This is one of the many patterns to be found in software history, patterns that account for so much of the detail that is otherwise hard to explain.

  • Continents and Islands in the World of Computers

The vast majority of people appear to think that the world of computers and software is pretty uniform. While everyone recognizes that there are differences between the systems used by consumers and the ones used in business, most people assume that pretty much the same thing is going on inside.

    The reality is that vast cultural and practical differences separate the various clusters of computer and software applications. I look forward to the first anthropological studies that are devoted to this subject, illustrating and spelling out the untravelled oceans that separate the diverse lands of computer and software practice. Meanwhile, there are both obstacles and opportunities that arise from these facts.

    A Diversity of Tongues

The Bible gives a vivid explanation of how the various languages arose. In the post-flood world, there was said to be a single people with a single tongue. They built a city with a tower that reached to the sky, to make a name for themselves.
    As a single people with a single tongue, "the sky was the limit" for how far they could go. God didn't like this. He reached down and confounded their speech, and scattered them over the face of the earth.

    Whether it's the fault of God or humans, it's well understood that groups of people develop their own languages, which then evolve and splinter. In fact, by studying the relationship of various languages, you gain insight into how humans migrated over the earth. The Indo-European family of languages is an excellent example of this.

    A Diversity of Software Languages

Early computer programmers in the 1950's clearly saw the advantage of having a single language for software. They tried hard to create a universal software language, FORTRAN being the most well-known and successful such early language. While FORTRAN (short for "formula translator") was great for math people, people working with business records weren't impressed. Thus COBOL ("COmmon Business-Oriented Language") was invented. Things were still pretty simple in the mid-1950's.

    But of course it didn't stop there. People migrated to different "lands," confronted new issues, and created new languages suitable for the new problems. Here's a snapshot of some of the major developments in the mid-1970's.

[Figure: a snapshot of the major programming languages as of 1976]

By now, literally thousands of general-purpose languages have been created, and that's not counting "esoteric" languages and thousands more for narrow problem domains.

    Idioms and Layers

    As anyone who has learned a new human language as an adult is well aware, learning a language is one thing — but learning all the incidental aspects of the language, particularly the idioms, is a task that never seems to end. This is because there are lots of them — an estimated 25,000 of them in English, for example.

    There are equivalents of idioms in software languages, in multiple categories. I'll just give a basic example: the run-time library, whose documentation frequently exceeds that of the base language, and without which you can't write practical programs. A more complex example: a "framework," for example the RAILS and Sinatra frameworks for the Ruby language.

    Let's say you know English pretty well. Are you qualified to write or even to read and understand a legal brief? Unless you're a lawyer, probably not. Yet there's no denying that the brief is written in English, with a whole pile of idioms and other special things that are not in common use.

    It's the same with computer languages. You may know the language C pretty well, but when you first look at the C code in a driver or an operating system kernel, it's got a strange idiom — "straight" C would stand out as obviously as BBC English would in Brooklyn. The C that's in a compiler is even more rife with idioms and unfamiliar constructs, so much so that a person who is otherwise fluent in C would have trouble figuring out what was going on, just as a normal fluent English-speaker would have trouble following a discussion by two doctors of a difficult medical case.

    Different Continents, Different Cultures

    Everyone knows that human languages are part of overall human culture. Cultures differ at least as much as languages. It's not widely appreciated that this is also true in the world of computers. While of course there are commonalities, you actually think differently in different languages, whether human or software. Lots of incidental things tend to get wrapped up in the cultures as well. To take a simple example: in the US, when you walk into a store, prices tend to be marked on the goods. If you like that thing at that price, you buy it; otherwise you don't. It doesn't work that way in much of South Korea. Prices may not be marked at all; and negotiation is assumed and expected. There are a whole set of cultural norms that have to be understood in order to thrive.

    Differences in software cultures are just as strong as differences in human cultures. For example, one of the major forces in the world of hospital automation is Epic. Epic is written in a language originally called MUMPS. While few write programs from scratch in MUMPS anymore, you have to use the language to customize the Epic system, just as you have to use ABAP to customize the SAP manufacturing system. In either case, learning the peculiar language is the tip of the iceberg — your programs "live in" the Epic system, and so knowing that system inside out is the key to success, far more important than utilization of the language itself. This is perhaps comparable to the importance of knowing all the relevant prior case law in writing a legal brief in support of your position, vs. simply knowing the vocabulary and syntax of English.

    Contact Between Distant Cultures

We know that in human life, cultures developed in isolation from each other for many thousands of years. Migrating peoples would clash, and the cultures with superior weapons and warrior traditions tended to decimate the others. A culture that develops superior methods of war, usually with distinct technology, tends to expand. This was true, for example, of the Comanche in North America and the Mongols in Asia, each of which developed unique competence and technology with horses, and used that to rapidly expand their sphere of influence.

    This is an area of both similarity and difference with software cultures. Within a company, there are frequently members of different software "tribes," who usually neither understand nor like each other. They are constantly at war to establish primacy, the winning culture grudgingly conceding resource-poor reservations to which the losers are confined.

    It's different with software cultures that are separated by "oceans," usually different problem domains or industries. You might like to think that everyone who does software has all the information available, and therefore is at a similar cultural level, the equivalent of, say, the different countries in Europe. They may speak different languages and like different foods, but they all have cars, telephones, and electrical appliances. In software, this is not the case! In software, there are cultural differences that are the equivalent of cars being widely available in one place, while ox carts are the standard mode of transport in another. It's that extreme.

    What's even more shocking is that the denizens of these culturally isolated software continents are comfortable and secure in what outsiders see as their "backwardness" or "ignorance," and find ways to denigrate and disparage outsiders who dare to suggest there might be a better way of doing things.

How big are these culturally isolated continents? Just a few distant places, the equivalent of Australia? If you have the opportunity to see a wide swath of software culture, what you find is that the vast majority of software groups have cultures that are dramatically inferior to those of the best places. Moreover, the variance in just how primitive things are is huge. In any given place, it is likely that practices that are unknown there, but would dramatically enhance results, are already standard practice in other places. Judging software isn't like auditing the financial books of a company, in which you either pass or fail an audit. It's more like figuring out which part of which software continent the place occupies, how many years or decades behind the best known methods it is, and to what extent its near neighbors are slightly ahead or behind.

    Conclusion

    It's useful to compare human language and culture to software language and culture. Just as with humans, language is an important part of culture, but thriving involves a whole lot more than just grasping the basics of a language. Just like with humans, there are varying levels and there are conflicts. But what's most interesting are the differences, which are the equivalent of humans living in environments that are physically next to each other, but using tools and methods that are hundreds of years different in terms of evolution. This fact has huge implications on many levels.

  • How to Evaluate Programming Languages

    Programmers say language A is "better" than language B. Or, to avoid giving offense, they'll say they "like" language A. Sometimes they get passionate, and get their colleagues to program in their new favorite language.

    This doesn't happen often; inertia usually wins. When a change is made, it's usually passion and energy that win the day. If one person cares a WHOLE LOT, and everyone else is, "whatever," then the new language happens.

    Sometimes a know-nothing manager (sorry, I'm repeating myself) comes along and asks the justification for the change. The leading argument is normally "the new language is better." In response to the obvious "how is it better?" people try to get away with "It's just better!" If the manager hangs tough and demands rationality, the passionate programmer may lose his cool and insist that the new language is "more productive." This of course is dangerous, because the rational manager (or have I just defined the empty set?) should reply, "OK, I'll measure the results and we'll see." Mr. Passion has now gotten his way, but has screwed over everyone. But it usually doesn't matter — who measures "programmer productivity" anyway?

    But seriously, how should we measure degrees of goodness in programming languages? If there's a common set of yardsticks, I haven't encountered them yet.

    The Ability of the Programmer

    First, let's handle the most obvious issue: the skill of the programmer. Edgar Allan Poe had a really primitive writing tool: pen (not even ball-point!) and paper. But he still managed to write circles around millions of would-be writers equipped with the best word processing programs that technology has to offer. I've dealt with this issue in detail before, so let's accept the qualification "assuming the programmer is equally skilled and experienced in all cases."

    Dimensions of Goodness

    Programmers confined to one region of programming (i.e., most programmers) don't often encounter this, but there are multiple dimensions of goodness, and they apply quite differently to different programming demands.

    Suppose you're a big fan of object-orientation. Now it's time to write a device driver. Will you judge the goodness of the driver based on the extent to which it's object-oriented? Only if you're totally blindered and stupid. In a driver you want high performance and effective handling of exception conditions. Period. That is the most important dimension of goodness. Of all the programs that meet that condition, you could then rank them on other dimensions, for example readability of the code.

    Given that, what's the best language for writing the driver? Our fan of object orientation may love the fact that his favorite language can't do pointer arithmetic and is grudgingly willing to admit that, yes, garbage collection does happen, but with today's fast processors, who cares?

    Sorry, dude, you're importing application-centric thinking into the world of systems software. Doesn't work, terrible idea.

It isn't just drivers. There is a universe of programs for which the main dimension of goodness is the efficiency of resource utilization. For example, sort algorithms are valued on both performance and space utilization (less is better). There is a whole wonderful universe of work devoted to this subject, and Knuth, with The Art of Computer Programming, is near the center of that universe.

    Consistent with that thinking, Knuth made up his own assembler language, and wrote programs in it to illustrate his algorithms. Knuth clearly felt that minimum space utilization and maximum performance were the primary dimensions of goodness.

The Largest Dimension of Goodness

    While there are other important special cases, the dimension of goodness most frequently relevant is hard to state simply, but is simple common sense:

    The easier it is to create and modify programs, quickly and accurately, the more goodness there is.

    • Create. Doesn't happen much, but still important.
    • Modify. The most frequent and important act by far.
• Quickly. All other things being equal, less effort and fewer steps are better.
    • Accurately. Anything that tempts us to error is bad, anything that helps us to accuracy is good.

    That, I propose, is (in most cases) the most important measure of goodness we should use.

    Theoretical Answer

Someday I'll get around to publishing my book on Occamality, which asks and answers the question "how good is a computer program?" Until then, here is a super-short summary: among all equally accurate expressions of a given operational requirement, the best one has exactly one place where any given semantic entity is expressed, so that for any given thing you want to change, you need only go to one place to accomplish the change. In other words, the least redundancy. What Shannon's Law is for communications channels, Occamality is for programs. Given the same logic expressed in different languages, the language that enables the least redundancy is the best, the most Occamal.
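A toy illustration of the principle, with an invented sales-tax example (the functions and the 8.25% rate are hypothetical, chosen only to show the contrast):

```python
# Redundant: the 8.25% tax rate appears in two places. Changing it means
# hunting down every copy, and missing one causes a silent inconsistency.
def invoice_total_redundant(subtotal):
    return subtotal + subtotal * 0.0825

def receipt_line_redundant(subtotal):
    return f"Tax (8.25%): {subtotal * 0.0825:.2f}"

# Occamal: the rate is expressed in exactly one place; every use derives
# from that one place, so a change is a one-line edit.
SALES_TAX_RATE = 0.0825

def invoice_total(subtotal):
    return subtotal * (1 + SALES_TAX_RATE)

def receipt_line(subtotal):
    return f"Tax ({SALES_TAX_RATE:.2%}): {subtotal * SALES_TAX_RATE:.2f}"

print(f"{invoice_total(100.0):.2f}")  # 108.25
print(receipt_line(100.0))            # Tax (8.25%): 8.25
```

Both versions compute the same answers; the difference is purely in how many places a given semantic entity (the rate) lives.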

    Historical Answer

    By studying a wide variety of programs and programming languages in many fields over many years, it's possible to discern a trend. There is a slow but clear trend towards Occamality, which demonstrates what it is and how it's best expressed. The trend is the result of simple pressures of time and money.

You write a program. Somebody comes along and wants something different, so you change the program. Other people come along and want changes. You get tired of modifying the program, and you see that the changes they want aren't too different, so you create parameters that people can change to their heart's content. The parameters grow and multiply until you've got loads of them, but at least people feel like they're in control and aren't bugging you for changes. Parameters rule.

    Then someone wants some real action-like thing just their way. You throw up your hands, give them the source code, and tell them to have fun. Maybe they succeed, maybe not. But eventually, they want your enhancements in their special crappy version of your nice program, and it's a pain. It happens again. You get sick of it, analyze the places they wanted the changes, and make official "user exits." Now they can kill themselves customizing, and it all happens outside your code. Phew.

Things keep evolving, and the use of user exits explodes. Idiots keep writing them so that the whole darn system crashes, or screws up the data. At least with parameters, nothing really awful can happen. The light bulb comes on. What if I could have something like parameters (it's data, it can't crash) that could do anything anyone has wanted to do with a user exit? In other words, what if everything my application could do was expressed in really powerful, declarative parameters? Hmmm. The users would be out of my hair for-like-ever.
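The three stages just described can be sketched in miniature. The toy "report program" below is invented for illustration; the point is the shape of each stage, not the domain:

```python
orders = [("widget", 3), ("gadget", 5), ("widget", 2)]

# Stage 1: parameters. Users tweak data, not code; nothing can crash.
def report_v1(orders, only_product=None):
    return [(p, q) for p, q in orders if only_product in (None, p)]

# Stage 2: user exits. Users hook arbitrary code in at a blessed point --
# powerful, but a buggy exit can take down the whole program.
def report_v2(orders, row_filter=lambda p, q: True):
    return [(p, q) for p, q in orders if row_filter(p, q)]

# Stage 3: declarative parameters. The exit's power with the parameter's
# safety: users state *what* they want as data; the engine interprets it.
def report_v3(orders, where=None):
    field_index = {"product": 0, "qty": 1}
    def keep(row):
        if where is None:
            return True
        field, op, value = where
        x = row[field_index[field]]
        return {"=": x == value, ">": x > value}[op]
    return [row for row in orders if keep(row)]

print(report_v1(orders, only_product="widget"))  # [('widget', 3), ('widget', 2)]
print(report_v2(orders, lambda p, q: q > 2))     # [('widget', 3), ('gadget', 5)]
print(report_v3(orders, where=("qty", ">", 2)))  # [('widget', 3), ('gadget', 5)]
```

Stage 3's `where` tuple is, in embryo, the same move SQL made: a declarative, data-only request interpreted by an engine.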

    What I've just described is how a new programming "level" emerges historically. This is what led to operating systems, except that applications are one giant user exit. UNIX is chock full of things like this. This history is the history of SQL in a nutshell — a powerful, does-everything system at the level of declarative, user-exit-like parameters!

    The Answer

    In general, the more Occamal the language (and its use), the better it is. More specifically, given a set of languages, the best one has

    • semantics that are close to the problem domain
    • features that let you eliminate redundancy
    • a declarative approach (rather than an imperative one)

    Let's go through each of these.

    Problem Domain Semantics

A great example of such a language is the UNIX utility AWK. It's a language whose purpose is to parse and process strings. Period. You want an accounting system, don't use AWK. You want to generate cute-looking web pages, don't use AWK. But if you've got a stream of text that needs processing, AWK is your friend.
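What makes AWK's semantics so close to its problem domain is its shape: a program is a list of pattern/action rules applied to every line of a text stream, with the line pre-split into fields. Here is a minimal sketch of that model in Python (the `awkish` helper and the log format are invented for illustration):

```python
import re

def awkish(rules, text):
    """Apply AWK-style (pattern, action) rules to each line of a text stream."""
    out = []
    for line in text.splitlines():
        fields = line.split()  # AWK's $1, $2, ... (0-indexed here)
        for pattern, action in rules:
            if re.search(pattern, line):
                out.append(action(fields))
    return out

log = "GET /home 200\nGET /missing 404\nPOST /home 200"

# "Print the path of every line whose status is 404" -- one rule, no ceremony.
print(awkish([(r" 404$", lambda f: f[1])], log))  # ['/missing']
```

In AWK itself the same job is a one-liner, `awk '$3 == 404 { print $2 }'`, because the splitting, looping, and matching are all built into the language's semantics rather than written out by the programmer.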

    From the enterprise space, ABAP is an interesting example. While it's now the prime language for writing SAP applications, ABAP was originally Allgemeiner Berichts-Aufbereitungs-Prozessor, German for "general report creation processor." In other words, they saw the endless varieties of reporting and created a language for it; then having seen the vast utility of putting customization power in the hands of users, generalized it.

    Features that let you eliminate redundancy

    This is what subroutines, classes and inheritance are supposed to be all about. And they can help. But more often, creating another program "level" is the most compelling solution, i.e., writing everything that the program might want to do in common, efficient ways, and having the application-specific "language" just select and arrange the base capabilities. This is old news. Most primitively, it's a subroutine library, something that's been around since FORTRAN days. But there's an important, incredibly powerful trick here. In FORTRAN (and in pretty much all classically-organized subroutine libraries), the library sits around passively waiting to be called by the statements in the language. In a domain-specific language, it's the other way round!
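The inversion just described can be shown in miniature. In the classic arrangement the program calls down into a passive library; in the domain-specific arrangement an engine is in charge and the "application" shrinks to data that selects and arranges the engine's capabilities. The toy tax-report domain and all names below are invented for illustration:

```python
# Classic arrangement: application code drives, the library waits to be called.
def tax(amount):
    return amount * 0.25

def classic_program(orders):
    total = 0.0
    for amount in orders:
        total += amount + tax(amount)  # the program calls the library
    return total

# Inverted arrangement: the engine drives; the "application" is just data
# naming which of the engine's capabilities to apply, in what order.
STEPS = {
    "add":     lambda total, amt: total + amt,
    "add_tax": lambda total, amt: total + amt * 0.25,
}

def engine(program, orders):
    total = 0.0
    for amount in orders:
        for step in program:           # the engine interprets the application
            total = STEPS[step](total, amount)
    return total

orders = [100.0, 50.0]
print(classic_program(orders))             # 187.5
print(engine(["add", "add_tax"], orders))  # 187.5
```

Same result, but in the second version the application is pure data — it cannot crash the engine, and the engine owns the control flow, which is exactly the relationship a domain-specific language has with its runtime.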

A related approach, which has been implemented over and over, is based on noticing that languages are usually designed independently of databases, but frequently used together with them. The result is monster redundancy! Wouldn't it be a hoot if database access were somehow an integral part of the language! Well, that explains how and why ABAP evolved from a reporting language (necessarily intimate with the database) into "Advanced Business Application Programming." And it explains why Ruby, when combined with the database-leveraging RAILS framework, is so popular and highly productive.

    A declarative approach

In the world of programming, as in life, the world divides between the imperative (commands you, tells you how to do something, gives you directions for getting to point B) and the declarative (tells you what should be done, identifies point B as the goal). In short, "what" is declarative and "how" is imperative. A core reason for the incredible success of SQL is that it is declarative, as Chris Date has described in minute detail. Declarative code also tends to be less redundant and more concise. And it doesn't crash.
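The contrast is easy to see side by side. Below, the same question is answered imperatively (spell out how: fetch, loop, test, collect) and declaratively (state what, and let SQL's engine choose the how), using Python's built-in sqlite3 with an invented toy table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ada", 36), ("Grace", 85), ("Edsger", 72)])

# Imperative: the programmer dictates every step of the "how."
over_70 = []
for name, age in conn.execute("SELECT name, age FROM people"):
    if age > 70:
        over_70.append(name)

# Declarative: the programmer states the "what"; the engine picks the plan.
declared = [row[0] for row in
            conn.execute("SELECT name FROM people WHERE age > 70 ORDER BY name")]

print(sorted(over_70))  # ['Edsger', 'Grace']
print(declared)         # ['Edsger', 'Grace']
```

The declarative version is shorter, has no loop state to get wrong, and leaves the engine free to use an index or a better plan — none of which is available to the hand-written loop.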

    Conclusion

I'm sorry if you're disappointed I didn't name Erlang or whatever your favorite is as the "best programming language," but I insist that it's far more novel and useful to decide on what basis we are to judge "goodness" in programming languages, and in programs. In general and in most domains (with important exceptions, like inside operating systems), non-redundancy is the reigning virtue, and languages that enable it are superior to ones that don't. Non-redundancy is nearly always best achieved with a declarative, problem-domain-centric approach. And it further achieves the common-sense goal of fast, accurate creation and modification of programs.

Occamality is rarely explicitly valued by programmers, but the trend toward it is easy to see. There are widespread examples of building domain-specific languages, meta-data and other aspects of Occamality. Many programmers already act as though Occamality were the primary dimension of goodness — may they increase in numbers and influence!

  • HTML, Assembler Language, and Easy-To-Use Tools

    HTML is the assembler language of the web, and that’s a good thing. All sorts of tools claim they will make your life easier and save you from the awful, nerdy depths of HTML. My conclusion: the tools are trouble; HTML is simple and powerful; just get over it, learn and use HTML!

There’s a pattern here. I have suffered through decades of tools that are supposed to “insulate” you from the grungy details of actual programming, but that end up limiting what you can do, and not being so very easy to use after all.

    Here’s the background: My wife has a personal web site. I am her webmaster. A little while ago, she decided she wanted to change the site in a major way. The tool I’ve been using for site creation and editing is way past its expiration date. So I decided to start over and dive into the wonderful world of site editing tools. This has got to be easy, I thought. Even Microsoft Word says that it can create and edit HTML.

“Not so fast,” cries out the world of practical reality to me.

I tried a few “easy to use” site creation and editing tools. I didn’t want to get fancy. I just went to my hosting provider’s website, and did simple internet searches. I wasn’t happy with the results for various reasons: one couldn’t import my current site; another had templates, but they were all ugly and you couldn’t circumvent them; and so on.

Then I tried importing the home page of the site into Microsoft Word. It appeared to work. I edited the page, eliminating most of the content, and displayed the results. Wrong! The original had a big-font title that was colored and underlined. Nothing I edited should have changed the color or made the text and the underline different. What was going on here?!

    I did not want to dive into details. I really just wanted to get this done so I could move on! But to make a long story short, I ended up getting the HTML generated by the old site builder I used and the HTML that resulted from Word and studying them. Part of why I didn’t want to do this is that I’m not an expert HTML programmer. But I sucked it up and dove in. Among other things, I learned:

    HTML isn’t so hard

    I already knew the basics (it's just programming, like anything else, and diving in confirmed that view). Even without a deep background, you can learn it quickly. Here’s an example from a nice site (http://www.arachnoid.com/lutusp/html_tutor.html)

     

HTML Code: I want to <B>emphasize</B> this!

Browser Display: I want to emphasize this! (with “emphasize” shown in bold)

    Word was astoundingly bad

There was more crap in the file inserted by Word for its own nefarious purposes than there was original text. And it screwed up the original appearance!

    HTML is like assembler language, and that’s good

I’ve always been a back-to-basics kind of guy, and never felt particularly limited during my many years of programming in assembler language. In fact, because of macro pre-processors, assemblers can be amazingly productive: you can customize the environment to suit the application, something the “easy to use,” “higher level” tools make difficult.
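To show what that customization looks like, here is a toy, hypothetical macro expander (sketched in Python; the SWAP macro and MOV mnemonics are invented for illustration) in the spirit of assembler macro pre-processors: one domain-specific line expands into the repetitive low-level instructions it stands for.

```python
# A toy macro expander: each macro maps one shorthand line to the
# repetitive instruction sequence a programmer would otherwise retype
# (and risk mistyping) at every call site.
MACROS = {
    "SWAP": lambda a, b: [f"MOV TMP, {a}", f"MOV {a}, {b}", f"MOV {b}, TMP"],
}

def expand(source):
    """Expand macro invocations; pass ordinary lines through unchanged."""
    out = []
    for line in source:
        op, _, rest = line.partition(" ")
        if op in MACROS:
            out.extend(MACROS[op](*[arg.strip() for arg in rest.split(",")]))
        else:
            out.append(line)
    return out

program = ["LOAD R1", "SWAP R1, R2"]
expanded = expand(program)
print(expanded)
```

Once a shop built up a macro library tuned to its application, writing "assembler" was really writing in a little language of its own — exactly the environment-customization the paragraph above describes.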

    Line Up, all you so-called easy-to-use tools, and Die

    Or die a natural death. I don’t care how you get there. Just get it done!

    Conclusion

    I’ve been writing software for more than 40 years, and the more things change, the more they stay the same. Computers, I willingly grant you, are smaller, faster and cheaper; they hold more and communicate faster. But software just gets more bloated and heavy-weight, all the while promising greater ease of use and functional fitness. Going back to basics is refreshing and, sometimes, the only way to get the job done.

     
