Category: Fashion

  • Summary: Software Fashions

    Software is a precise discipline. A single bit out of place can crash a program. It’s like math. Fashion is the exact opposite – skirts are short or long, you wear a tie or you don’t, whatever. It’s all opinions and taste, nothing objective about it. How could there be fashions in bit-driven, nerdy software? This is a summary of my writing on software fashions.


    https://blackliszt.com/2018/11/what-are-software-fashions.html

    Here’s a survey of some current software fashions and a couple that have become standard practice.

    https://blackliszt.com/2019/05/recurring-software-fashion-nightmares.html

    Software people are, after all, people. People want to have status. One of the main ways to get status is to get identified with a fast-rising fashion trend and ride it.

    https://blackliszt.com/2018/10/the-hierarchy-of-software-status.html

    The status-driven attraction of fashion is so powerful that software people will often go with the fashionable approach and ignore software methods that would achieve dramatically better results.

    https://blackliszt.com/2019/10/software-professionals-would-rather-be-fashionable-than-achieve-10x-productivity-gains.html

    https://blackliszt.com/2019/09/laser-disks-and-workflow-illustrate-the-insane-fashion-driven-nature-of-software-evolution.html

    Other things that become fashions are OK ideas taken way beyond where they make sense. These fashions often appear repeatedly, usually with new names.

    https://blackliszt.com/2021/03/micro-services-the-forgotten-history-of-failures.html

Another pattern is wild swings from one end of a spectrum to the other — and back again. You'd better be on the right end of the spectrum or risk being "out of it!" Whether data is defined inside the program logic or outside of it is a decades-long example.

    https://blackliszt.com/2015/06/innovations-that-arent-data-definitions-inside-or-outside-the-program.html

Computers and software are all about innovation — so how could innovation itself be a fad? Well, for a few years, innovation was as fashion-forward as you could get, until it faded back to its usual place.

    https://blackliszt.com/2016/04/the-innovation-bubble.html

Sometimes technical fashions are picked up by nontechnical people, who wildly promote the fashion without having a clue about the underlying technology, and therefore get it wrong. DLT (blockchain, the technology underlying cryptocurrencies) is an example of something that is promoted as a solution for problems for which it's not relevant.

    https://blackliszt.com/2015/11/the-magic-of-block-chain.html

    And for problems which have existing solutions that are often thousands of times better.

    https://blackliszt.com/2017/11/blockchain-a-sailboat-without-sails.html

    It wasn’t so long ago that everyone was talking about Big Data:

    https://blackliszt.com/2015/03/big-data-the-driving-force-in-computing.html

    https://blackliszt.com/2013/02/the-big-data-technology-fashion.html

    https://blackliszt.com/2013/02/big-data-some-little-observations.html

    https://blackliszt.com/2012/04/im-tired-of-hearing-about-big-data.html

Here’s an example of how the fashion-driven embrace of Hadoop, a Big Data technology, led to bad results.

    https://blackliszt.com/2019/01/using-advanced-software-techniques-in-business.html

    Large bunches of data, shared computers (the cloud) and virtualization are things that have been around since the early days of computing, are still around and always will be. But for a period of time they were what fashion-forward people talked about.

    https://blackliszt.com/2012/05/the-cloud-and-virtualization.html

    https://blackliszt.com/2011/12/the-name-game-of-moving-to-the-cloud.html

    How is it possible that fashion can play such a large role in software decision-making? It’s because evidence is rarely part of software decision-making.

    https://blackliszt.com/2017/02/evidence-based-software.html

    https://blackliszt.com/2014/02/lessons-for-software-from-the-history-of-scurvy.html

    In a broader sense, fashion rules in software because, in spite of the name, Computer Science is not a science.

    https://blackliszt.com/2023/04/summary-computer-science.html

     

  • Micro-Services: the Forgotten History of Failures

    Micro-services are one of today's leading software fashions; see this for more on software fashions in general. I have treated in depth the false claims of micro-services to make applications "scalable," and separately treated in depth the false claim that micro-services enhance programmer productivity.

    Where the heck did such a crappy idea that has taken such a large mind-share in the software and management community come from?

I don't know. Who knows why skirts are short or long this year? Or whether shirts are tucked or not tucked? What I do know is that bogus fashion trends infect the software industry, some of them delivering decades of perniciousness. Looking at the spectrum of software fashions, I've noticed similarities among some of them. Some fashions well up, spread and peter out, only to recur again later with minor changes and a new name. The new name appears to be important, so that everyone is fooled into thinking the hot fashion is truly "new," not tarnished by the abject failures of its prior incarnations. I describe a clear example of an old idea with a new name here.

    The History of Micro-services

    Why do we have micro-services? How did this bad idea get to be so popular? Was it someone's recent "great idea?" There are probably people who think it's their idea.

    Do micro-services have a history? Of course they do! Nearly every major practice in software has a history of some kind. The number of software practices that are newly invented is truly tiny and going down rapidly. So why do we hear about software advances and the "new thing" so often? The whole field of software ignores its own history in truly radical ways. It even ignores what other software people in other fields and domains are doing.

    Nonetheless, anyone who knows a broad range of software practices over a period of decades can easily recognize similarities and patterns — patterns that surely constitute "history" whether or not the actors recognize the precedents of what they do.

    What are Services?

    Let's step back and understand what "micro-services" are. By the name, it's obvious that a micro-service is an itty-bitty "service." So what's a service? Once you peel back all the layers, a service is a plain old subroutine call with a bunch of fancy, time-consuming stuff stuck in between the code that calls and the code that is called. See this for my explanation of the fundamental concept of the subroutine.

Subroutines (or functions or methods or whatever) are an essential aspect of programming. Every programming language has its syntax, statements and other things that people get all wrapped up in. Every programming language also has an associated library — as in function/subroutine library. If you look at a manual for a language and one for the associated library, the library is nearly always larger. The richness of a library is a key factor in the utility of a language. In modern terms, an associated framework serves essentially the same purpose; an example is the Rails framework for Ruby, without which Ruby would just be one line on the ever-growing list of mostly-forgotten languages.

When you're writing a program, knowing and making great use of the associated subroutine library is essential. But what if you see that you're doing the same kind of thing over and over in your program? It makes sense to create a subroutine to perform that common function as part of the overall application you're building. As time goes on, most well-structured applications come to include a large collection of nested subroutines.

What if there's a program that can only run on another computer and you want to call it as a subroutine — give it some data to work on and get the results back? This problem was addressed and solved in many variations decades ago. It's most often called an RPC — Remote Procedure Call. To call a distant subroutine you implement a version of it locally that looks and acts the same, but does some networking to send the fact of the call and the parameters to a cooperating routine on the distant computer. That routine gets the network call, retrieves the data, and makes a local call to the subroutine. When the return happens, the data is sent back by the same mechanism. It's just like writing a memo and putting it in your out-box. The clerk who picks up the outgoing memos delivers the ones that are local, and for each one that isn't, puts it into an envelope, addresses it and puts the memo with return instructions into the mail. And so on. The calling code and the called code act like they normally do, and the RPC machinery makes it happen.
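The stub mechanics described above can be sketched in a few lines of Python. This is a toy illustration with made-up names, not any particular RPC system; the "transport" here is an ordinary function call standing in for the network and the cooperating routine on the far side.

```python
import json

# "Remote" side: the real subroutine, plus a dispatcher that retrieves
# the call and its parameters from the wire format, makes a plain local
# call, and packages the result for the trip back.
def add(a, b):
    return a + b

REGISTRY = {"add": add}

def server_handle(request_bytes):
    req = json.loads(request_bytes)
    result = REGISTRY[req["name"]](*req["args"])
    return json.dumps({"result": result}).encode()

# "Local" side: a stub that looks and acts like a normal subroutine,
# but actually ships the fact of the call and the parameters across
# the transport -- the memo dropped in the out-box.
def make_stub(name, transport):
    def stub(*args):
        request = json.dumps({"name": name, "args": args}).encode()
        reply = transport(request)
        return json.loads(reply)["result"]
    return stub

add_remotely = make_stub("add", transport=server_handle)
print(add_remotely(2, 3))  # prints 5; the caller can't tell it wasn't local
```

A real RPC replaces `transport` with networking code (sockets, framing, timeouts, retries, error handling), which is exactly where the huge overhead of a "service" comes from.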

Along comes the internet. People notice that http is supported all over the place. Why not leverage it to implement the mechanics of the RPC? Voila! Now we have SOAP — Simple Object Access Protocol, which uses http and XML to send and return the data. A subroutine with a SOAP interface was called a "service." This all got standardized roughly 20 years ago. Then along came a simpler-to-use version with even more overhead, REST, which is still in widespread use.

    If you need to call a subroutine that can't or shouldn't run on your computer but on some other computer, having some form of RPC is extremely useful. The time penalty can easily amount to a factor of thousands or even millions compared to a normal local subroutine call, but you'll gladly pay the price if there's no other way.

Now let's recall what all this amounts to. A "service" is a subroutine that has a HUGE amount of overhead. In the vast majority of cases, there is no reason to ever take on that overhead unless the service you want to call is on a different computer than the one doing the calling, a computer that can only be reached by some kind of networking.

    Prior Incarnations of Service Insanity

There have been a couple waves of service fashion mania. One of the waves took place roughly twenty years ago during the internet bubble. People were very concerned with being able to support growth in the use of their applications by unprecedented factors. Expert opinion rapidly converged on building a "distributed application" as the way to accomplish this. The idea was that the load on the computer would greatly exceed the capacity of even the largest, most capable computer to handle it. To anticipate this, the application should be architected to be able to run on multiple computers that could be changed at will. Instead of a "central" application, the application would be "distributed." This was further elaborated by replacing the simple RPC mechanism with queues called "service buses," as in a bus to route large numbers of service calls from one place to another. The bus itself had to be robust so that messages weren't dropped, so it evolved into the "enterprise service bus," something that still exists today. A huge, complex mechanism for accomplishing all this became part of Java: J2EE (Java 2 Enterprise Edition). See this for the complex, reincarnating history of the transaction monitor.

    It turned out that building a "distributed application" was vastly more expensive and time-consuming than just building a plain old application, the kind that today is sneeringly dismissed as being "monolithic." The computational and elapsed time running cost of such an application was also huge. Hardly any applications had loads so large that they needed to be distributed. Even worse, the few applications that ended up with huge loads that required many computers found vastly simpler, more effective ways to get the job done. It's worth noting that articles with titles like "Why were we all fooled into thinking building distributed applications was a good idea" never appeared. The whole subject simply faded away. Here's a description from another angle of the distributed computing mania.

    Another incarnation of this nutty idea took place largely inside large enterprises. Many of these places had highly diverse collections of applications running the business, accumulated by acquisition, multiple departments building applications for themselves, etc. Instead of doing the sensible thing of reducing the total number of applications by evolving the best versions to be able to handle more diverse tasks, the idea of a SOA, service-oriented architecture, became the focus. All these applications could be turned into services! Instead of building new applications, people would now build services. Tools were built to manage all these new services. There were directories. There was tracking and control. All sorts of new demands came latching onto the IT budget.

    SOA never got to the fever pitch of distributed applications, but it was big. It went the same way — fading as it turned out to be a lot of time and trouble with little benefit.

Now we come to the present. Remember that the original motivation of the RPC in any form is that a subroutine you want to call is ONLY running on some other computer. An RPC, whatever the form or cost, is better than nothing. Sensible then and sensible now. Then came ways of building programs so that their parts COULD BE run on different computers, if the computational requirements became huge. This also turned out to be rarely needed, and when it was needed, there were always better, faster, cheaper ways than a service approach. Now, with micro-services, we have reincarnated these proven-to-be-bad ideas, claiming that not only will micro-services effortlessly yield that wonderful virtue "scalability," but that they will even make programmers more productive. Wrong and wrong.

    Conclusion

    I like the phrase "oldie but goodie." Sometimes it's applicable. In computing when a hot new tech trend comes along that "everyone" decides is the smart way to go, it rarely is. It most often is an "oldie but baddie" with a new name and new image. You would think that software people were so facts-oriented that they wouldn't be subject to this kind of emotionally-driven, be-part-of-the-group, get-with-the-program-man kind of thinking. But most people want to belong and want increased status. If believing in micro-services is the ticket to membership in the ranks of the software elite, a shocking (to me) number of people are all in.

  • Software Professionals Would Rather Be Fashionable Than Achieve 10X Productivity Gains

I’ve been involved in a large number of pioneering software efforts, and I’ve lived through many tech fashion seasons. It’s very clear that the vast majority of experienced software professionals and managers would rather follow current technology fashion than actually make things better. Because of the pathetic state of software knowledge and analytics, not only do they get away with throwing away truly massive gains for their organization; no one in power has a clue what they’ve done! This happens over and over and over and over again. A few people may know there’s a vastly better approach available – one that’s proven and risk-free – but no one hears them. The siren song of the hot new thing wins most every time.

    I have been involved in many such farces in one way or another. Here’s the story of one of them from many years ago. See this and this for general background of the technologies and patterns. It's important to remember: I'm not saying that Sallie Mae was a disaster when everyone else was fine. Sallie Mae was a normal, mainstream project at the time, and therefore an excellent example of the overall insane pattern.

    The Workflow project at Sallie Mae

    In the early 1990’s, document imaging and workflow technology were hot; they were something people talked about the way they talk about AI/ML and innovation today.

    Sallie Mae, then as now, was the government-backed student loan organization. In the early 1990’s they decided to upgrade their computer-based processes to something that was modern in order to make things more efficient. At the time they had over 10 million paper documents per year arriving at their processing centers. They had six of them at the time, each employing thousands of people. Even a 10% productivity improvement would have been big at that scale! Sallie Mae managers had read about the new document imaging and workflow technology that was rapidly growing at the time, and set up a task force of professionals to study it, create an RFP and get it implemented.

The task force knew they weren’t experts at this technology, so they sought out people to help them. Among other things, they attended the AIIM industry association’s annual convention, at which I happened to be a featured speaker due in part to my book, which AIIM had just published. Here is the foreword to the book, written by AIIM:

[Image: the book’s foreword, written by AIIM]

    I had also led the technology at one of the industry’s tech vendors, had personally written their workflow software, been involved with major buyers including Amex, etc. So they approached me and asked me to help them out. I agreed. They also contracted with a small consulting firm to join in.

    I dove in to analyze the situation. Sallie Mae had loads of IT people who were concerned about the new technology. The existing technology was basically IBM mainframes with many thousands of 3270 terminals on people’s desks. Implementing the new technology would involve buying expensive workstations with large image displays for each worker. The IT people worried about the LAN that would be needed to connect the workers to the image storage system, so I created a spreadsheet to enable them to calculate the transmission speeds that would be needed. While the other consultants were busily drawing up diagrams of paper movement so that workflow could be implemented, I dove into the details of what actually happened at a couple representative workstations.
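That transmission-speed spreadsheet boils down to a few multiplications. Here is a sketch of the kind of calculation it held; every input (worker count, image size, retrieval rate, peak factor) is an invented illustrative number, not Sallie Mae's actual data.

```python
# Back-of-the-envelope LAN sizing for image retrieval.
# All inputs are illustrative assumptions.
workers = 1000          # workers at one processing center
images_per_hour = 60    # images each worker pulls up per hour
image_size_kb = 50      # a compressed scanned page, roughly
peak_factor = 2.0       # peak load relative to the hourly average

avg_kb_per_sec = workers * images_per_hour * image_size_kb / 3600
peak_mbit_per_sec = avg_kb_per_sec * peak_factor * 8 / 1000

print(f"average load: {avg_kb_per_sec:.0f} KB/s")
print(f"peak load: {peak_mbit_per_sec:.1f} Mbit/s")
```

With these made-up numbers the peak works out to roughly 13 Mbit/s, which makes clear why the wiring that had been serving character-mode terminals was a real concern.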

    I observed some people at one of the processing centers doing the work of processing the form. I timed the work and carefully observed the screens. I read the manual that was used to train the workers for the job. I arrived at a pretty radical (for the time) hypothesis: the vast majority of the work was logic-enhanced data entry. If the humans did no thinking but just entered the data, they could work 10X faster. The logic spelled out in the training manual could all be programmed on the data as entered, with a tiny number of exceptions kicked out for human handling.

    If this could really be achieved, it had consequences for Sallie Mae. Instead of a massive per-employee technology investment with the work remaining roughly unchanged, a small fraction of the employees could do the same work in a completely new way. I figured I better make sure I was right before I started rattling cages and maybe embarrassing myself.

    Logic-enhanced heads-down data entry at Sallie Mae

    I knew that everyone assumed that implementing workflow was the goal, with productivity improvements assumed to be 30% after massive investment and disruption. This was broadly accepted, and no one involved was interested in challenging it or testing it. If there was any chance of them taking a new approach, I knew I better have the numbers.

    The main things I discovered were:

    • Most people entered 1,000 to 1,500 KPH (keystrokes per hour).
    • Roughly 20% of those keystrokes were overhead, i.e., navigating to the next field on the mainframe screen, often navigating to a whole new screen. The mainframe screens followed how the data was organized inside the computer, not at all how it appeared on the paper forms.
• The workers weren't bad typists — they were obviously doing some thinking between fields, essentially applying the special rules they were taught in training to decide what to enter where.

I refreshed my knowledge of practical image-enabled heads-down data entry (HDDE). See this for an explanation and details. I laid out some plans.

    • Image-enabled HDDE could reasonably achieve 12,000 KPH, as it was doing at multiple high-volume operations. Moreover, the navigation overhead could be brought down to a couple percent.
    • The training manual was amazingly close to a natural language description of a program. If this field on the form has that, then enter X in Y mainframe field, and so on.
    • The current method interwove the entry of the form with the rules. In the new system, all the data would be entered exactly as it was on the paper form, and then a new set of software would apply the rules and enter the data into the mainframe system. When the rules changed, instead of everyone taking a training class, the software would just need to be changed.
    • Some forms in the existing method were sent to a customer service person for exception handling. Much of that would still happen, but the numbers were low.
• I knew that image character recognition could handle some fraction of the data entry work. But it was a new technology, and I figured a 10X productivity improvement and avoiding a massive technology investment for workflow were more than enough to make the case.
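The arithmetic behind the 10X claim can be checked directly from the figures above; the 1,250 KPH midpoint and the exact 2% target overhead are my own illustrative choices.

```python
# Useful (data-entering) keystrokes per hour, before and after.
old_kph = 1250          # midpoint of the observed 1,000-1,500 KPH
old_overhead = 0.20     # share of keystrokes spent navigating mainframe screens
new_kph = 12000         # image-enabled HDDE rate seen at high-volume operations
new_overhead = 0.02     # navigation overhead after the redesign

old_useful = old_kph * (1 - old_overhead)
new_useful = new_kph * (1 - new_overhead)
gain = new_useful / old_useful

print(f"old: {old_useful:.0f} useful KPH")
print(f"new: {new_useful:.0f} useful KPH")
print(f"gain: {gain:.1f}X")  # comfortably over 10X
```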

    I wrote up my results. Since “re-engineering” was a popular concept at the time, and since “zero-based budgeting” was talked about a fair amount, I wrote a paper and called my approach “zero-based re-engineering” to explain the approach and link it to fashionable things.

    My involvement with Sallie Mae ended, quickly and quietly. No one objected to my proposal. No one questioned my analysis or numbers. There were some vague comments thanking me for my creative, detailed work. But I was invited to no more meetings, got no more assignments, and that was that.

    What happened was that the workflow-at-Sallie-Mae train had long since left the station. My work was nothing but a distraction, with the potential of becoming an embarrassment if things went badly. I had demonstrated that I wasn’t a “team player,” and in most organizations, there’s little worse than that.

    Conclusion

Like most history, the workflow project at Sallie Mae in the 1990’s doesn’t matter at all. It’s history! Its value is that it’s a typical example of a recurring pattern in software implementation and evolution. Here is a more recent example. Sallie Mae was not unusual at the time — they were absolutely typical! Yes, they had an "insider" recommend a simple, radically better way of doing things and rejected it — but that was the overwhelming mainstream pattern. By recognizing the pattern for what it is, the few who care can learn from it, recognize it when it’s taking place, and maybe even do things better.

    Most organizations will almost always follow the tech fashion rather than achieve dramatic gains using proven but unfashionable technologies. Many entrepreneurs are part of the problem, since they need to sell into buyers who value fashion over real, objective, provable value. The best entrepreneurs find a way to fit in with buyers’ hunger for fashion, while also delivering real value. Sometimes they cloak the value in super-attractive, flimsy outerwear. That's OK. They’re entrepreneurs – they find a way!

  • Laser Disks and Workflow Illustrate the Insane, Fashion-Driven Nature of Software Evolution

    Computers are so exact and numbers-based that we tend to think of the people in charge of them as white-jacketed scientists with esoteric knowledge driving towards calculated optimal results, much the same way we imagine that the scientists and engineers calculated the path of the Apollo capsule to the Moon. If the Apollo program were run the way most computer projects are, it would have been a miracle if the capsule made it off the launch pad, much less traced an incredibly exact path from Cape Canaveral to the chosen landing spot on the Moon, a journey of over a quarter million miles.

    The reality is that the application of computers and software to automation is painfully slow, usually failing to achieve optimal results by large margins, and often lagging behind what is achievable by decades. The reasons appear to be a semi-random mixture of commercial interests, technology fashion trends, ignorant and risk-averse managers, and technical specialists rooted in the past and wearing blinders. That’s all!

    Here’s the story of a typical example.

    Microfilm, document imaging and laser disks

    Long before computers, microfilm was the preferred medium of librarians and archivists to preserve paper documents for future use. Microfilm was more compact than paper, and lasted much longer. What happened when computers came along is a curious and educational story, a prime example of how computer use and software evolve.

    Disclosure: I lived and worked as a participant in this history during the late 1980’s and early 1990’s. The story I will tell is not systematic or comprehensive, but like a traveler’s tale of what he did and saw.

    A side show in the long evolution of computer storage is the laser disk. A laser disk has data recorded onto it either as a whole, during manufacturing, or incrementally, as part of a computer system. The laser disk had quite a run in the consumer market in the form of CD’s, and then DVD’s. They were an improvement on existing methods for distributing digital content of all kinds.

    In the computer industry, they took the form of WORM (write-once-read-many) drives, and were used to record archival data that should not be updated, but could be read as often as needed. The technology, as often happens, went searching for problems it could solve. In an effort to reduce human labor more than microfilm could, jukeboxes were invented. Each jukebox had a disk reader and many shelves for disks, with a mechanism to load a selected disk into the reader. FileNet, now part of IBM, built one of the first of these systems in the mid-1980’s.

At around the same time that WORM drives became practical, paper document scanners evolved from the systems that captured images of paper onto microfilm. In each case, light was shone onto the paper and focused onto a target; a scanner replaced the film with an optical sensing device.

    How could we get people to buy WORM drives and jukeboxes, the creative marketing people wondered? The departments that put documents onto microfilm weren’t going for it – the new systems were much more expensive, microfilm wasn’t accessed often enough to make it worth connecting it to computers, and the records retention people were understandably skeptical that any computer disk would be readable 100 years from now, as microfilm will be.

    Adding workflow to document imaging

I have no idea who made the audacious leap, but some creative person or group came up with the idea of doing computer document scanning at the start of the process, when paper arrived at a location for processing, instead of microfilming at the end for archival purposes. But WHY??? The creative types scratched their heads until they were losing hair rapidly. Why would anyone make responding to loan requests or whatever take even longer than it already did by introducing the step of scanning?

    GENIUS TIME: LET’S INVENT “WORKFLOW”!!!

    Everyone has an image for what “workflow” is. For documents, it’s the path of processing steps from arrival to filing. In larger organizations, there are often multiple departments involved in processing a document, so there are in-boxes and out-boxes and clerks transporting documents from one to the other.

    Rubbing their hands and cackling, the scammers worked out their strategy: We’ll scan the documents and convert them to images so that we can move images from place to place electronically instead of piles of paper! We’ll call it “document imaging workflow.” It will deliver massive benefits to the whole organization, to everyone who touches the paper, not just to the tiny group who files and archives at the end – and storing the scanned documents onto WORM drives will do their job too! Hooray! We will leap-frog the old-fashioned, paper-flooded back office into the modern electronic age. What’s not to like?

    It was audacious and brilliant. It was a scam that would wither if confronted with just a little common sense or cost accounting. Which, true to the pattern of such fashion-driven trends, it never had to confront!

    Implementing workflow

    A typical target environment for implementing workflow was large offices with large numbers of documents, often forms of some kind, arriving every day. The forms would go from the mail room to the relevant people’s desks for processing. The person handling the form would often have an IBM 3270 terminal for interacting with a mainframe computer. When the person was done with the form, they would place it in one or more outboxes for further processing at other desks, or for filing. Sometimes documents would go into file cabinets for a period of time, but all would end up being archived, and sometimes microfilmed for permanent storage.

Implementing workflow required, first of all, scanning the documents as they arrived and storing them immediately on normal computer disks, and ultimately on WORM drives, thought to be the equivalent of microfilm. Once scanned, the documents would be managed by workflow software, which would route them to the appropriate desk in the appropriate department for processing. Every desk needed to have the old computer terminal replaced or augmented with a large, high-resolution image terminal, capable of handling at least the display of the document, and sometimes also the computer screen. The old, slow connections between terminals and mainframe needed to be replaced with a fast local area network, and there needed to be substantial servers for the local image handling and workflow routing.

    Of course, lots of analysis and programming was involved in addition to the equipment. Workflows had to be analyzed and programmed, and all the management, reporting and exception handling taken care of.

    The benefits of workflow

When a movement like document imaging workflow has a head of steam up, no one seems to ask whether it’s a good idea, and if so, how good an idea it is – quantitatively. When I was involved, everyone threw around the claim that there would be at least a 30% productivity improvement due to workflow, and that this would make all the expense and disruption worth it. I never encountered a single study that measured this.

    Think about it for a minute. Would looking at an image of a document make you work faster than you would looking at the original paper? Would picking the next paper from the in-box be slower than getting the system to show you the next image? What about the labor of the clerks moving stacks of paper from one person’s outbox to the next one’s inbox? It would certainly be saved, but chances are a single clerk could handle the paper movement for many dozens of people. And remember, there’s the scanning and indexing that’s been added and the massive computer and software infrastructure that has to be justified.

    It’s obvious why no one EVER did a cost-benefit analysis of workflow – the back-of-the-envelope version easily shows that it’s a couple zeros away from making sense.
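Here is one version of that back-of-the-envelope calculation, to show what "a couple zeros away" looks like. Every number is an invented, roughly period-flavored assumption, not data from any real project.

```python
# Cost side: per-desk image terminals plus shared infrastructure.
workers = 500
workstation_cost = 10_000        # high-resolution image terminal per desk
infrastructure_cost = 2_000_000  # scanners, servers, LAN, WORM jukeboxes, software
capital_cost = workers * workstation_cost + infrastructure_cost

# Benefit side: the clearly saved labor is the paper-moving clerks,
# and one clerk can move paper for many dozens of desks.
clerks_replaced = 5
clerk_yearly_cost = 25_000
yearly_benefit = clerks_replaced * clerk_yearly_cost

print(f"capital cost: ${capital_cost:,}")
print(f"yearly benefit: ${yearly_benefit:,}")
print(f"simple payback: {capital_cost / yearly_benefit:.0f} years")
```

With these assumptions the simple payback comes out to more than half a century, before even counting the added scanning and indexing labor.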

I remember a couple of out-of-it, tech-fashion-disaster know-nothings mildly thinking out loud about how workflow could possibly be worth it financially. The immediate response from workflow fashion leaders was upping the ante – don’t you know, the wonderful workflow tool from (for example) FileNet lets you program all sorts of efficiencies for each stage of work; it’s something we really need to dive into once we get the basic thing installed. End of subject!

    So who fell for this stuff? Just a who’s-who list of major organizations. Each one that fell for it increased the pressure for the rest to go for it. None of them did the numbers – they just knew that they couldn’t fall too far behind their peers.

    Conclusion

    It’s hard to think of a field that’s more precise and numbers-oriented than computers. Who has a clue what actually goes on inside those darn things? And the software? There are millions of lines of code; if a single character in any of those lines is wrong, the whole thing can break. The impression nearly everyone has, understandably, is of a field that is impossibly precise and unforgiving of error. When the software experts in an organization say that some new technology should be adopted, sensible people just agree, particularly when there’s lots of noise and other major organizations are doing it. It can be even more gratifying to get known as pioneering, as one of the first organizations to jump on an emerging tech trend!

    Somehow, the smoke and noise from the software battlefield is so intense that the reality is rarely seen or understood. The reality, as I’ve illustrated here, should result in everyone involved being sent to the blackboard by a stern teacher, and being made to write, over and over: “I pledge to try harder to rise above the level of rank stupidity next time.”

  • Recurring Software Fashion Nightmares

    Computer software is plagued by nightmares. The nightmares vary.

    Sometimes they are fundamentally sound ideas that are pursued in the wrong way, in the wrong circumstances or at the wrong time. Therefore they fail – but usually come back, sometimes with a different name or emphasis.

    Sometimes they’re just plain bad ideas that sound good, are cleverly promoted, and seem relevant to solving widely acknowledged problems. Except they just don’t work. Sometimes these fundamentally bad ideas resurface, sometimes with a new name.

    Sometimes they’re a good idea for an extremely narrow problem that is wildly applied beyond its area of applicability.

    The worst nightmares are the ones that slowly evolve, take hold, and become widely accepted as part of what “good programmers” believe and do. Some of these even become part of what is ludicrously called “computer science.” Foundation stones such as these, accepted and taught without proof, evidence or even serious analysis, make it clear to any objective observer that computer “science” is anything but scientific, and that computer “engineering” is a joke. If the bridges and buildings designed by civil engineers collapsed with anything close to the frequency that software systems malfunction, the bums and frauds in charge would have long since been thrown out. But since bad software that breaks is so widespread, and since after all it “merely” causes endless delays and trouble, but doesn’t dump cars and trucks into the river, people just accept it as how things are. Sad.

    Following is a brief review of sample nightmares in each of these categories. Some of what I describe as nightmares are fervently believed by many people. Some have staked their professional lives on them. I doubt any of the true adherents will be moved by my descriptions – why should they be? It was never about evidence or proof for the faithful to start with, so why should things change now?

    Sound Ideas – but not here and now

    • Machine learning, AI, OR methods, most analytics.
      • These things are wonderful. I love them. When properly applied in the right way by competent people against the right problem at the right time, amazing things can be accomplished. But those simple-sounding conditions are rarely met, and as a result, most efforts to apply these amazing techniques fail. See this.

    Bad ideas cleverly promoted

    • Microservices.
      • Microservices are currently what modern, transformative CTOs shove down the throats of their innocent organizations, promising great things, including wonderful productivity and an end to that horror of horrors, the “monolithic code base.” While rarely admitted, this is little but a re-incarnation of the joke of Extreme Programming from a couple decades ago, with slightly modified rhetoric. A wide variety of virtues are supposed to come from microservices, above all scalability and productivity, but it’s all little but arm-waving. If software valued evidence even a little, microservices would be widely recognized as the bad joke they are.
    • Big Data
      • There's nothing wrong in principle with collecting data and analyzing it. But in practice, the whole big data effort is little but an attempt to apply inapplicable methods to random collections of data, hoping to achieve generic but unspecified benefits. See here.

    Great ideas for a narrow problem

    • Blockchain.
      • My favorite example of an excellent solution to a problem that has been applied way beyond its area of legitimate application is blockchain. Bitcoin was a brilliant solution to the problem of having a currency that works with no one in charge. How do you get people to “guard the vault” and not turn into crooks? The way the mining is designed with its incentives and distributed consensus scheme brilliantly solves the problem. However, the second you start having loads of crypto-currencies, things get weaker. Once you introduce “smart contracts,” there’s a fly in the ointment. Take away the currency and keep just the blockchain and you’ve got a real disaster. Then “improve” it and make it private and you’ve got yourself a full-scale contradiction in terms that is spectacularly useless. See here.

    Bad ideas that are part of the Computer Science and Engineering canon

    • Object-orientation.
      • Languages can have varying ways of expressing data structures and relating to them. A particular variant on this issue called “object-orientation” emerged decades ago. After enjoying a heyday of evangelism during the first internet bubble, O-O languages are taught nearly universally in academia to the exclusion of all others, and adherence to the O-O faith is widely taken to be a sign of how serious a CS person you are. For some strange reason, no one seems to mention the abject failure of O-O databases. And no one talks about serious opposing views, such as that of Linus Torvalds, the creator/leader of the Linux open source movement, which powers 90% of the world's web servers. Programmers who value results have long since moved on, but academia and industry as a whole remain committed to the false god of O-O-ism.
    • Most project management methods.
      • I’ve written a great deal about this, including a book. Project management methods, ripped from other disciplines and applied mindlessly to software programming, have been an unmitigated disaster, whether it’s the modern fashion of Agile with its various add-ons such as SCRUM or any of its predecessors. The response? It ranges from “you’re not doing it right,” to “you need to use cool new method X.” Yet, project management is a profession, with certifications and courses on it taught in otherwise respectable schools. A true nightmare. See these.
    • Most software QA methods.
      • All software professionals accept what they’ve been taught, which is that some kind of automated QA is an essential part of building software that works and keeping it working. Except the methods are horrible, wasteful, full of holes and rarely make things much better. See this.

    Conclusion

    I make no claim that the list of items I’ve discussed here is comprehensive; there are things I’ve left out. But this is a start. Furthermore, if just half the items I’ve listed were tossed and replaced with things that actually, you know, worked, much better software would be produced more quickly than it is today. I don’t call for perfection – but some progress would sure be nice!

  • Using Advanced Software Techniques in Business: Fashion or Real Value?

    Using advanced software techniques can make a dramatic positive impact on business. It’s important for everyone to ensure that their software efforts aren’t stuck in out-moded, last-generation tools and techniques.

    Nearly everyone, including me, agrees with this simple statement. Nearly everyone also agrees on at least the top members of a list of reasonable candidates of “advanced software techniques” that are not “out-moded” or “last-generation.” That last statement is where the best technical people part ways with the crowd.

    No, I’m not talking about hard-to-understand weirdos babbling about esoterica in some corner. The best technical people understand and support using the best and most appropriate techniques for solving a given problem, regardless of the recency of the technique or its prevalence. Sadly, it is often the case that the most-talked-about hot trends in software should not be used in any business that actually wants to spend money wisely and get stuff done, quickly and well.

    This is a BIG subject. It’s important. It’s deep. And it’s extensive. So let’s start with a big, fat, juicy example, one that was hot, hot, HOT but is now fading away, so it’s possible to talk about it somewhat more rationally. Maybe. I hope.

    Big Data and Hadoop

    There is little doubt that Big Data is a huge trend in software, though at least talking about it with that name appears to be undergoing a typical slow fade. Here is a review I wrote more than 5 years ago of the Big Data fashion trend as it existed at that time. It was everywhere you looked! Magazine covers! Ads! Conferences! Books! If you weren't somehow doing Big Data you were nothing and nobody.

    I've been working with data my entire professional life. The data has been small, medium, large, big, huge, totally awesomely huge and even gi-normous. Since I've always been faced with space and time constraints, I have long since settled on a fundamental concept of computing, simple but rarely practiced: Count the data! It sounds ridiculous, but it's almost a secret weapon. Here is some analysis of data sizes in the context of the big data trend. Here is a more detailed example of a big data set that Harvard bragged about. Hint: the data isn't very big.

    I shouldn't have to say this, but here it is: For anything that people say is "big data," the very first step should be to … count the data. Sounds simple, but apparently it's not. It's also not common to dig into the data a bit. I guess that's because, in the rare cases where someone actually counts data that starts out looking big, they usually find that most of the data is just not needed or not relevant. Which makes it not big anymore. Which means you don't need Hadoop!
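    Counting the data really can be a few lines of ordinary code. A minimal sketch (the file name and the relevance filter below are hypothetical, just to show the shape of the step):

```python
import os

def count_the_data(path, keep=lambda line: True):
    """First step for any 'big data' claim: measure it, then see how
    much is actually relevant after a trivial filter."""
    total_bytes = os.path.getsize(path)
    total_lines = 0
    kept_lines = 0
    with open(path, "r", errors="replace") as f:
        for line in f:
            total_lines += 1
            if keep(line):
                kept_lines += 1
    return total_bytes, total_lines, kept_lines

# Hypothetical usage: only completed-purchase events matter for the
# analysis, so everything else can be discarded before any tooling
# decision is made.
# size, total, kept = count_the_data("events.log",
#                                    keep=lambda l: '"purchase"' in l)
# print(f"{size/1e9:.1f} GB, {total} records, {kept} relevant")
```

    If the "kept" number fits comfortably on one machine, the tooling question answers itself.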

    Hadoop

    I know I'm being silly here. Hey, we're talking Big Data! Surely we've got some somewhere. We've got to get in an expert and crunch away so we can get those virtuous, business-enhancing juices flowing through our company.

    In this situation, at least until recently, what that meant was that you dove into the next level of detail and found out that the go-to tool was Hadoop. Dig in some more, and it sounds great. It's scalable without limit. You build your Hadoop cluster, script up some calculations, tap into the ocean of data you've got somewhere, and hear about how Hadoop spins those computers up and down, and crunches all the data, using the computers that are available, and even working around ones that fail without anyone having to respond to some old-style beeper or something. No wonder Hadoop is the go-to tool for Big Data!

    Yahoo 1

    In the vast majority of situations, it's "decision made" time at this point. You get in your experts, they build their Hadoop cluster and away you go, climbing the Hadoop stairway to Big Data Heaven, with a glow of virtue surrounding everyone involved.

    Very few people seem to dive in and understand what Hadoop and its main programming paradigm MapReduce are all about. The Hadoop "experts" don't seem to know what the reasonable alternatives are, and when they might be applicable.

    Here's an example. In 2011, one of our large web companies had a huge problem caused by Google's move to a new search algorithm. The CTO grabbed a massive web log file, wrote some code to boil the terabytes (Big Data for sure!) of data down to the key data elements, and then loaded them into the 512GB of DRAM of his powerful computer and ran some advanced machine learning against it. You can see the CTO doing the work here. A few days later he had figured out Google's algorithm, reflected it in the company's website family, and traffic increased back to nearly the pre-change norm. If he had taken the Hadoop path, he would have worked for months, spent huge amounts of money, and found that the cluster and Hadoop thing would have basically been irrelevant to the problem.

    Here are a couple things to consider:

    • Hadoop, by definition, spreads its computing over the many machines available to it in the cluster, using HDFS (the Hadoop file system) for reading and writing data.
      • It is literally thousands of times faster to get data from local memory than it is to get it from a disk-based file system. The fewer file reads and writes needed to perform a computation, the faster it will be. Hadoop doesn't care.
      • Using more computers in a cluster means that there will be more I/O than using fewer. Many important calculations can be performed on a single properly-configured machine!
    • MapReduce, the key processing engine of Hadoop, is one of those cool-sounding ideas whose job can be done perfectly well with normal code, which can do way more, vastly more efficiently.
    • Why would anyone consider such an insanely wasteful approach? Once you know the origins, it makes sense.
      • If you're a big search engine company, you have to have loads of servers, enough to hold all the data and handle all the search queries at peak traffic times.
      • As is typical in situations like this, loads of servers will be under-used a large fraction of the day. Why not write some code that sucks up these "free" cycles and puts them to work? Why not build a framework so you can just specify what you want done, without worrying about which resources on which machines are used? Who cares if it's inefficient? It gets stuff done with the computers I already have. Brilliant!
      • Now it makes sense that Hadoop started and grew at Yahoo, copying some ideas about a narrowly-applicable (MapReduce) system and framework built at Google.
      • Except that at Yahoo, they somehow decided to make the Hadoop machines dedicated! Last I heard, they were up to, get ready … 40,000 servers. Wow.
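    The point that MapReduce's job can often be done by normal code is easy to illustrate with MapReduce's own canonical demo, word count, written as a few lines of ordinary single-machine Python (a sketch to show the idea, not a benchmark):

```python
from collections import Counter

def word_count(lines):
    """The canonical MapReduce demo, as ordinary code: no cluster,
    no HDFS round-trips, everything stays in local memory."""
    counts = Counter()
    for line in lines:               # the "map" phase: emit words
        counts.update(line.split())  # the "reduce" phase: sum per key
    return counts

# word_count(["big data big hype"])
#   -> Counter({'big': 2, 'data': 1, 'hype': 1})
```

    For data that fits on one properly configured machine, which is the common case once you count the data, this kind of code does the whole job.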

    Yahoo 2

    • With such an investment in getting value out of Big Data, Yahoo must be booming, just sky-rocketing with all the juice that has come out of the investment. Not. Why would anyone want to use an expensive, strange tool that generated no value for its originator? One word: fashion.
    • Yes, there are some narrow situations in which Hadoop might be applicable. But in the vast majority of cases, you'll spend too much time getting way too many computers to do too little processing on not all that much data, and taking way too much time to get it done.

    Conclusion

    There is no doubt — none! — that you should use advanced software techniques in your business, because it will give you a competitive edge over everyone else.

    The trouble is telling the difference between (1) value-adding advanced software techniques and (2) hype and software fashion. Even most software people have trouble telling the difference! In fact, software people who insist that there is a difference between value-adding software techniques and the latest thing that everyone is talking about run the serious risk of being marginalized, and categorized as being old farts who are unable or unwilling to do the work to learn the new methods.

    In sharp contrast to the general thinking, software is a pre-scientific, fashion-driven field that resists holding new ideas to reasonable standards of proof and evidence. This makes it tough for business executives to know what to do. There is only one approach that works: roll up your sleeves, put ego and pride to the side, and figure it out using evidence and common sense.

  • What are Software Fashions?

    “Fashion” is a word we associate with clothes. Software is hard, it’s objective, it’s taught in schools as “computer science.” Software can’t have anything to do with “fashion” if it’s a “science,” can it?

    Sadly, software is infected by fashion trends and styles at least as much as clothes. Fashion has a huge impact on how software is built. Understanding this, along with other key concepts like those involved in Wartime Software, can contribute greatly to building great software that powers a business to great success.

    Fashion

    We all know what fashion is, exemplified by fashion shows with impossibly thin female models strutting down catwalks wearing clothes that no one is likely to wear in real life.

    Not as often, but there are male models too.

    Far more important than models wearing extreme clothes is everyday fashion. I grew up looking at men dressed in suits and ties.

    It's what men wore to church and to aptly-named "white-collar" businesses all the time. But there's nothing special about a suit and tie; portraits of Dutch burghers from the early 1600's show ruffs and severe black coats that were every bit as obligatory in their day.

    Fashion is arbitrary! It's just what people wear. Everyone judges you by what you do or don't wear, according to prevailing fashions.

    Of course, "fashion" goes way beyond how people dress. It's how you act, how you speak, the accent you use, the interests you express, just about everything. If you don't think it matters, just try wearing NY Yankees regalia into a South Boston sports bar and start spouting trash about the Red Sox. Lots of people who think they're the kind of people who are "above" fashion are driven by it nonetheless — just look how they respond to people walking into a room, and if there's any doubt, seconds after the newcomer opens his mouth.

    Fashion is about people. It's about belonging, status, fitting in or "making a statement." We live in a fashion-dominated world, like it or not.

    Fashion in Software

    I was one of those people who was convinced I was "above" fashion. Being above it made me superior, in my mind, to those who were slaves to it. During and beyond college, I bought the few clothes I wore from a local used clothing store and wore hiking boots most of the time. Once when I was in my first post-college programming job, I was called out of the cubicle where I spent most of my time, heads-down, programming away, and asked to come into a meeting in the front of the building. I walked into a meeting populated entirely by men in suits. One of the men I didn't know glanced at me and immediately exclaimed "finally! We get to talk with someone who knows things!"

    I had been called into a sales meeting, and one of the visitors had software questions no one knew the answer to, so "the suits" had called in the guy who knew the answers. How I dressed and acted in fact made a statement to the visitors — I dressed the way a programmer dressed, the kind of programmer who wanted to program, not one who aspired to management. So, like it or not, I was making a "fashion statement," while fooling myself into thinking that I was "above" fashion. The hard fact is, no one is "above" fashion. The way we dress and act and talk, the choices we make, say loads about us. Those unavoidable choices clearly establish our place in various groups, social and status hierarchies.

    Software Fashions

    Given how fashion-driven our lives are, it would be shocking if programmers weren't fashion-driven in their shared activity of software. In fact, they are! Sadly, the vast majority don't think their choices are fashion-driven. They believe they're modern, with-it software professionals who are using the proven, advanced methods for doing software. The trouble is, few of them take the trouble to cast a knowledgeable, cold, hard eye on the arguments, experience and facts concerning their chosen methods and tools. They've made their choices so they can be with whatever software social group they identify with and/or aspire to. It's all about relationships and status. If you're ambitious, you may want to be with the "cool kids," members of an elect social group, yes, in software. And it works! If it didn't "work" (elevate their software group status), they wouldn't do it.

    We like to think of people who wear fashion-forward clothes as being empty-headed, shallow people. Surely, programmers aren't that! But to the extent that they adopt fashion-forward software, that's exactly what they're doing — only worse! They're lying to themselves, deceiving others, and making believe they're pushing some trendy software thing because it's advanced technology, yielding results superior to the obsolete stuff that used to be the standard.

    Just like with clothes, software fashions evolve. Fads start and may become hot. The fad may evolve into a fashion as it spreads, with people who haven't adopted it taking notice. The fashion may further evolve into standard practice, with eyebrows being raised for anyone who dares to question it. More often, the fashion simply fades away.

    It's rare for any fad or fashion to be explicitly repudiated — oh, that was a terrible idea, people are turning away from it for good reason and here's why. No one says that! In the "advanced clothing" area, everyone knows that fashion is "just fashion." In software, fashions aren't considered "fashions;" they are considered "advances," emerging modern techniques that are objectively better than what came before, like a new drug or operation that has emerged from clinical trials and now saves lives that used to be lost! Saying that a widely adopted software fashion was never proven, was always a bad idea, but got widely used and promoted anyway would expose the game. So when software fashions die, they fade slowly away and simply stop getting used and talked about.

    After a fashion fades away, it's generally forgotten by nearly everyone, usually except for a band of true believers. Some of the more intellectually heavy-weight fashions retreat to academia, where they live on, always with "exciting futures."

    Some of these flourished-but-died fashions rise to live renewed lives. In one pattern, the fashion was so broadly accepted but such a failure (though rarely discussed as such) that when it becomes fashionable again, it has a new name. No one ever refers to last time, why the older fashion didn't work out, and why this slightly altered version of the same thing will. In another pattern, the fashion baldly re-emerges with exactly the same name, and nearly the same blazing-bright future as the last time. Sometimes there are even some successes. But it remains a fashion and therefore has disappointing results to anyone who cares to look, which is essentially no one — such is the social power of fashion!

    Not all fashions die. Some fashions have such powerful support that they become locked in as part of modern mainstream practice, sometimes even becoming part of so-called Computer Science, or at least IT Management. I don't fully understand how and why this happens, but I know that in part, the fashions that become standards address some widely felt need in the people involved in software. When this happens, there is often a series of waves of renewal or reform — while the reformers refuse to acknowledge fundamental problems with the fashion-enshrined-as-best-practice, they latch onto some minor tweak or addition and promote it, usually with a new name, as the best way to get results with standard-practice X.

    What are these software fashions exactly?

    I have already talked about a few important toxic software fashions. I have gone into huge detail for a couple of them, with multiple blog posts and even books. I'm gradually starting to understand this bizarre phenomenon in terms of powerful social fashions with bad results, masquerading as "advances." I'm seeing the resistance to seeing the Emperor's New Clothes for what they are because of the self-delusional conception of software as a science/math-based STEM field, rather than as the pre-scientific collective group-think that it largely is.

    I have already challenged a couple of the modern hot fashions, for example my series of posts on AI/ML — a classic example of a once-hot fashion that has died away and been re-born multiple times. This particular fashion is distinguished from some of the others because there are some truly excellent algorithms at the heart of it that can be applied to great benefit — and this has been true for decades! But because of widespread fashion-itis, the money and effort spent on them is mostly wasted.

    In future posts and at least one future book (in process), I will continue to dive into and expose specific software fashions for what they are. I do this in part to strengthen the resolve of those special people and groups (some of them ones we've invested in) with the understanding that the "wrong" or "uncool" things they're doing give them a fundamental business/technical advantage, and they should stick to their guns and ride their truly effective methods to success, to the benefit of all concerned.

    Further reading:

    Resistance to treating scurvy compared to software disease treatments.

    The modern AI/ML fashion.

    The story of how I discovered the fashion vs. what-really-works issue.

    Deconstructing project management.

    The story of fashions-becoming-standard-practice.

    The Cloud fashion.

    Big Data fashion. Big Data bubble.

    Evidence-based software methods don't exist.

    The recurring fashion of data definition location.

     

  • The Hierarchy of Software Status

    You might think that the hierarchy of status in software closely tracks the hierarchy of software skills, which I explained here. Hah! It is true that there is a small but important subset of people whose internal sense of status tracks the skills hierarchy reasonably well. But not for most programmers. The reason is simple: non-programmers' assessment of software status is unrelated to skill! It's a whole different hierarchy that has nothing to do with the ability to conceive and write good code that works!

    Here is an earlier attempt to explain this issue.

    This is NOT just an "academic" subject, BTW. It is absolutely crucial to getting good software done. See this for another angle at addressing the same subject.

    Status

    Status is part of human existence. There has always been a status hierarchy. Not long ago, we had lords and peasants, and the status differences were overt: a lord and a peasant could be told apart at a glance.

    Status itself has barely changed since those days, though the clothing and other means of expressing it has changed a great deal.

    Status is expressed in clothing, language, money, power relations, human interactions and in much of what we do. It would be shocking if generic human status relations did not apply to software. What's interesting is the translations that are made, which track medieval lord-and-peasant relationships quite well.

    Status in Software

    The easiest way to explain how status works in software is to look at how "close" you are (along multiple dimensions) to real people using code today. The closer you are to actual code and/or people using code, the lower your status; the farther away, the higher your status.

    • One dimension is time. The highest status of all is enjoyed by people who think great thoughts about how some software might be built by some undefined group at some undefined future.
    • One dimension is management layer. In any group, the "lowest" person in terms of management levels has the lowest status, and each level up confers more: a worker's manager outranks the worker, and that manager's manager outranks them both.
    • One dimension is closeness to a real user. Anyone working in any way on code that isn't yet used by anyone has greater status than anyone working on code that is in actual use.
    • One dimension is tied to the flow of interactions with users. If you work in any way on the flow of code or changes to code that is on its way to users (i.e., to production), you have higher status than anyone working in any way on the flow of information and communications that flow from users to the supplier or deliverer of the software. In other words, building software is higher status than customer service.
    • One dimension is trendiness. It's really important to identify yourself with tech trends that are exploding in fashionable talk.
      • If you're too early, you lose status. When someone hears you babble on about something, if they check it out on conferences, publications and the web and it's obscure, you seriously lose status.
      • If you're too late, you lose status — you're pegged as someone who's part of the crowd. Better late than never, of course. You also lose status if you keep bringing up something that's starting to fade away.
      • Maximum status is early-peak attention.
      • You can't actually know anything or do anything in the fashionable area and have any chance of maintaining status. You've got to direct, hire, establish, prioritize or strategize. Those all keep your hands safely clean and enhance status.
    • A dimension that becomes crucially important during the hiring process, and which often persists long after the hiring date, is the visibility and perceived success of the person's prior company. The more visible and successful, the higher the status.
      • No one knows whether the person had positive impact on the company's success or the opposite. The person from Google has a halo, regardless of what they did.
      • This is a subject I treat in considerable detail in the Software People book, along with other important hiring considerations.

    Naturally, there are nuances.

    • For the people who deal with the flow of data back from users:
      • There are people who respond to customer complaints and problems. Super-low status.
        • Some of those customer issues are real bugs in the software! The very lowest status goes to the people whose job is to patiently educate the stupid users whose inability to do the simplest things with such a wonderful application is obvious to everyone except themselves. The people who get called in to see if there really might be a bug have higher status.
        • The people who fix such bugs have even higher status, but still remarkably low.
      • There are people who analyze conversion rates and details of customer use of the application.
        • All these have higher status than anyone involved with customer complaints or bugs.
        • If they deal with things like conversion rates, that's related to high-status strategy and marketing and therefore higher status than anyone whose job is to tweak the application to make it better for users.

    There's another important aspect of status. I suppose I could jam this into the "dimensions" list, but I think it's simpler to say this other important thing about status this way: the more code you write, the lower your status. Writing no code is, with some important exceptions and subject to the general dimensions above, higher status than writing any amount of code.

    Yes, writing code requires highly specialized knowledge. But so do lots of other low-status things, like repairing garage door openers or installing lawn sprinkler systems. The people who manage the repair people are, of course, higher status than the people who get dirty or use their muscles doing physical work. Similarly, the people who metaphorically dirty their minds while doing demanding, hard mental work are lower status than those who direct them in any way.

    Another way of thinking about this is asking: what's your main focus? To the extent that it's a body of code, your status is lower. To the extent that it's people, your status is higher, with some exceptions (see customer service above).

    Like anything else, there are exceptions. I have met highly productive coders who are older and well-regarded. But they're rare, and usually they're tucked away in a corner if they're tolerated at all.

    Conclusion

    The way status is perceived in software is perverse, and accounts for a great deal of the dysfunction we see in the software world. It is sad. Even sadder is that, with all the fashion trends that sweep through the tech world, none of them addresses this issue. People with high software skills often perceive this and shake their heads, knowing there is nothing they can do about it.

    There are, of course, tiny groups of people where status is conferred as it should be: to the people who can conceive and then build their way to victory, rapidly and effectively. It's from those warriors of the abstract world that I learned the principles I discuss in my various books related to wartime software. There is more information and context on this and related subjects in my book on Software People.

  • Evidence-based Software

    Have you heard of "evidence-based medicine?" It's a relatively new trend in medicine based on the idea that what doctors do should be based on the evidence of what works and what doesn't. What's scary as a patient is the thought that this is a new idea. What is it replacing? Voodoo-based medicine?

    At least the field of medicine has accepted that evidence matters. So much better than not!

    Let's turn to software. Have you ever heard of evidence-based software? Of course not! There is no such thing! How software is built is based on loads of things, but sadly, evidence is not among them. Among other things, this explains why software projects fail, and/or result in expensive, rigid bloat-ware that is riddled with errors and security holes.

    The Golden Globes 2016

    One of the reasons to watch the Golden Globes awards ceremony is for the fashion. Everyone knows it — which is why there's a multi-hour Red Carpet pre-show, and even a pre-show to the red carpet show.

    You watch the show if you want to see what the new fashions are. You wouldn't want to look silly, would you? If you watched this year's show, you could see Amanda Peet looking really nice:

    11 Peet

    And you could see Sarah Jessica Parker looking like something else altogether:

    11 Parker

    I heard the expert on one of the shows talking about the new colors and lines in the dresses, something we'd see more of in the upcoming year.

    What's the "best" fashion? The one leading people seem to like. What will be the best fashion next year? About all you can be certain about is that it will be something different from what was most-liked this year.

    Software development fashions

    The methods used in software development are selected with just about the same criteria as the leading fashions in dresses. Who's wearing what? What do leading people think? What did I use (wear) last time that got admiring looks?

    Fashions come into software development. They get promoted. They get used and mis-used, adapted and morphed. Programmers take them with varying degrees of seriousness. Wherever you're programming, you have to more or less go along with the prevailing fashion. If everyone else crosses themselves, you'd better too. If there's a daily stand-up, you'd better stand up when everyone else does, and not look too abstracted or bored.

    Effectiveness, Productivity and Quality

    In fashion, you want the admiration of other people who look at what you're wearing. In software, since you spend most of your time building software, it's nice to have the admiration of people who look at you building software.

    But unfortunately, other points of view sometime intrude. Managers want to know about budgets and productivity and deadlines. After the software is put into use, there are ignorant and annoying users to contend with. What you've worked so hard to build is never enough. They complain about it! Crashes, performance, quality issues? Sometimes people get upset. And security? Rule number one is keep it quiet! The last thing we need is this getting into the papers!

    Then you find out that most outsiders couldn't care less what goes on in the sausage factory. Whether it's organized or chaotic, ugly or pretty, in the end all they seem to care about is how the sausage tastes. These simple-minded people can only keep one thing in their heads at a time, and that one thing is most often: the results!

    Wouldn't it be nice if we had a way of picking through the dozens of software methods in widespread use and, based on the evidence, settling on the couple that actually … produced the best results!? Or maybe that's just too radical a thought.

    That's why we need something like evidence-based software — or at least acknowledgement that it could help things out.

    Coda: EBSE: Evidence-Based Software Engineering

    I started writing this blog post based on the comparison to evidence-based medicine as a way to frame the fashion-based chaos that surprisingly rules the day in this highly exacting field of work. I certainly had never heard the phrase "evidence-based software." But as a check before clicking "publish," I thought I'd better do a quick search. Imagine my surprise when I found that there is, indeed, something called EBSE, evidence-based software engineering, explicitly inspired by the analogy in medicine!

    I've interacted with a large number of software engineering groups over the last twenty-plus years, and been inside a few for many years prior to that. The groups have been highly varied and diverse, to put it mildly. I've seen loads of trends, languages, methodologies and tools. And never — not once! — have I heard of the existence of EBSE. It should be just what we need, right?

    So I dove in. It's sad. Or pathetic. Both, actually.

    There's a moribund website on the subject:

    11 EBSE

    • It doesn't have its own domain name; it's just hosted at an obscure university in the UK Midlands.
    • The last "news" is from 2011. Not much happenin'…
    • All the "evidence" appears to come from published academic papers — you know, those things that practicing software people absolutely depend on.
    • "The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR)…" The SLR is basically a meta-analysis of lots of published academic papers. Whoopee!
    • The whole thing is organized "using the knowledge areas defined by the SWEBOK."
    • I couldn't find a single useful thing in the whole pile of words.

    The "SWEBOK"??? Another thing I've never heard of. It turns out it's an official IEEE guide to the Software Engineering Body of Knowledge. This essential guide tells us everything that leading academics are convinced is must-know material, "generally accepted knowledge about software engineering." If only I had known! Think how much trouble I could have saved myself and others over the years! Best of all, it's practically up-to-date — just over 12 years old!

    EBSE and SWEBOK are great demonstrations of just how bad things are in the software field: even when you start with a great metaphor, you still make no progress if you continue to accept as gospel the broken assumptions that the field's academics take to be eternal TRUTH. The sad fact is, math and computer science are at fundamental odds with effective software development. As I've shown. Sad, but true.

    Having something like evidence-based medicine for software instead of the ugly, ineffective chaos we have today would be nice. EBSE is a nice name, but as a reality, a non-starter.

  • Organizing for Successful Innovation: Recent History

    One of today's hottest trends is fostering innovation. It's real important! There are books, conferences, certified experts and all sorts of things. Let's do two things: (1) look at the origins of today's acknowledged tech leaders; and (2) see how those tech leaders innovate today.

    The origins of today's tech leaders

    What would we find if we looked at the origins of some important organizations that took the market by storm, grew rapidly and became part of the modern landscape? Did they come from people following the popular methods for fostering innovation? Let's look at some big, successful tech companies, and find out how they got started. There are two possibilities:

    1. It came out of some large organization that followed modern innovation methods. Or its founders were avid readers of books on innovation, certified innovation trainers, attendees of innovation conferences, or otherwise showed that they were nurtured by innovation thinking.
    2. It was started by one or two people who set out on a mission without any of #1 and kept marching forward until they got it done, perhaps with the help of VCs (venture capitalists).

    I've already discussed the cases of Microsoft, Facebook and Oracle here. Their founders not only lacked training in "innovation," they were all college drop-outs!  How could they have possibly founded three of the largest, most successful and valuable tech organizations? Must have been luck, I guess.

    Maybe they're the exceptions! Maybe most of the rest fit comfortably into the mold of #1! Let's look at a few:

    • Apple. Jobs and Wozniak. College drop-outs.
    • Amazon. Bezos. Princeton grad, worked in finance, hedge funds. Jumped on a vision.
    • Dell. Michael Dell. Started in his dorm room, dropped out.
    • Google. Stanford grad students, dropped out. VC backing.

    Why all the college drop-outs? In each case, a founder who was already obsessively good at something saw a related opportunity (sometimes with a buddy), dove in to make sure he didn't miss the opportunity, and made it more important than anything else. Including "education."

    A pattern seems to be emerging here. It's not looking good for theory #1.

    How today's tech leaders foster innovation

    Now that these companies are big and successful, surely they are great at innovation, right? They must have certified innovation experts just crawling the halls, and a good fraction of the staff out attending innovation conferences, right? They must be just cranking out innovations left and right!

    They try, in various ways. But it turns out that these great companies aren't any better than any other large organization at innovating — and you can see it by all the acquisitions they do! They have HUGE resources — how could some scrappy bunch of nobodies possibly come up with something they couldn't invent themselves?

    Well it happens. It's happening right now, in AI. Look at who's doing the acquiring:

    AI acquisitions

    Facebook alone has made more than 50 acquisitions, most of them since 2010.

    How about Google, who are supposed to be the smartest and most innovative of all? They've acquired more than 180 companies. Someone figured out the ten most expensive buys:

    Google 10 most

    And the grand total is …

    Google total

    Now, that's innovation for you. Just ask your boss for a multi-billion dollar budget, and you'll be able to innovate like Google!

    Until then, remember, there really is a better way to approach making your company better, even if it is unlikely to win any awards for "innovation."

  • The Innovation Bubble

    We're in the middle of an innovation bubble. "Innovation" is hot. Innovation "experts" are coming out of the woodwork. Everyone wants some innovation. If you're not "fostering innovation" you're hopelessly outdated.

    Some people say you need teams for innovation to take place. As a responsible manager, it's your job to foster innovation in your teams. Here's one of the books you can read about how to do it:

    Innovation in teams

    "Leadership" is a timeless favorite. Naturally, if you're an excellent leader, part of what you do is make innovation happen. Here's one of the books you can read about it:

    Leadership innovation book

    Maybe you're not a big reader, but want to make sure innovation happens anyway. You can go to conferences, world-wide ones even:

    World open innovation

    What if you really want to get serious about innovation; how do you do it, after all? The good news is, it's not at all mysterious! You can go to school and learn all about innovation:

    Garwood students

    If you're an executive, you can take some time off and learn about it too.

    Maybe you're having trouble breaking through. It could be that the missing ingredient is certification. Hmm, let's Google it, and see if any places can help out:

    Google innovation training

    After all, you can say you're a great innovator, but without an official certification, why should anyone believe you?

    GIM institute

    Lots of people are doing it, even people with MBA's:

    GIM learn

    If you're really good, maybe you can become a CIO — no, not a chief Information officer, a chief INNOVATION officer. And then, if you're really good, maybe you can attend a round table:

    CIO roundtable

    What if you want help fostering innovation inside your organization? There are prestigious places just waiting to help you out — it's their business, and you may not be doing it the right way:

    Accenture innovate

    It's probably too late, but it would be a shame not to at least try to catch up with this innovation thing. An awful lot of people seem to think it's the best thing ever.

    Of course, there are a few lonely voices saying this innovation thing is just a fad. They seem to say that if you want real innovation, there is a simple way to get it. They're probably just whiners who ignore the experts, and don't want to take the trouble to read a book, attend a conference or get certified. Too bad for them.

     

  • Innovations that aren’t: data definitions inside or outside the program

    There are decades-long trends and cycles in software that explain a great deal of what goes on in software innovation – including repeated “innovations” that are actually more like “in-old-vations,” and repeated eruptions of re-cycled “innovations” that are actually retreating to bad old ways of doing things. Of the many examples illustrating these trends and cycles, I’ll focus on one of the most persistent and amusing: the cycle of the relationship between data storage definitions and program definitions. This particular cycle started in the 1950’s, and is still going strong in 2015!

    The relationship between data in programs and data in storage

    Data is found in two basic places in the world of computing: in “persistent storage” (in a database and/or on disk), where it’s mostly “at rest;” and in “memory” (generally in RAM, the same place where the program is, in labelled variables for use in program statements), where it’s “in use.” The world of programming has gone around and around about what is the best relationship between these two things.

    The data-program relationship spectrum

    At one end of the spectrum, there is a single definition that serves both purposes. The programming language defines the data generically, has commands to manipulate the data, and has other commands to get it from storage and put it back.
    At the other end of the spectrum, there is a set of software used to define and manipulate persistent storage (for example, a DBMS with its DDL and DML), and a portion of the programming language used for defining "local" variables and collections of variables (called various things, including "objects" and "structures"). At this far end of the spectrum, there are a variety of means used to define the relationship between the two types of data (in-storage and in-program) and how data is moved from one place to the other.

    For convenience, I’m going to call the end of the spectrum in which persistent data and data-in-program are as far as possible from each other the “left” end of the spectrum. The “right” end of the spectrum is the one in which data for both purposes is defined in a unified way.

    People at the left end of the spectrum tend to be passionate about their choice and driven by a sense of ideological purity. People at the right end of the spectrum tend to be practical and results-oriented, focused on delivering end-user results.
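    The two ends of the spectrum can be sketched in a few lines of Python (hypothetical table and field names, with sqlite3 standing in for the DBMS). At the left end, the storage schema and the in-program class are defined separately and glued together by hand-written mapping code; at the right end, a single definition drives both, so the two cannot drift apart.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # --- Left end: storage schema and in-program class defined separately,
    #     with hand-written mapping code shuttling data between them.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

    class Customer:
        def __init__(self, id, name, balance):
            self.id, self.name, self.balance = id, name, balance

    def save_customer(conn, c):
        conn.execute("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)",
                     (c.id, c.name, c.balance))

    def load_customer(conn, cid):
        row = conn.execute("SELECT id, name, balance FROM customers WHERE id = ?",
                           (cid,)).fetchone()
        return Customer(*row)  # mapping by hand: column order must match __init__

    # --- Right end: one definition serves both purposes; the storage schema
    #     (and, in a real framework, the accessors too) is generated from it.
    FIELDS = [("id", "INTEGER PRIMARY KEY"), ("name", "TEXT"), ("balance", "REAL")]

    def make_table(conn, table, fields):
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in fields)
        conn.execute(f"CREATE TABLE {table} ({cols})")

    make_table(conn, "accounts", FIELDS)  # schema derived from the single definition
    ```

    Frameworks at the right end industrialize the second pattern: the single definition generates the schema, the accessors, and the mapping, which is where the productivity win comes from.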

    Recent example: Ruby and Rails

    The object-oriented movement tends, for the most part, to be at the left end of the spectrum. Object-oriented people focus on classes and objects, which are entirely in-program representations of data. The problem of "persisting" those objects is an annoying detail that the imperfections of current computing environments make necessary. They solve this problem by various means; one of the most popular is an ORM (object-relational mapper), which makes the work of moving objects between a DBMS and where they "should" be nearly automatic. It can also automate the process of creating effective but really ugly schemas for the data in the DBMS.

    A nice new object-oriented language, Ruby, appeared in 1995. It was invented to bring a level of O-O purity to scripting that alternatives like Perl lacked. About ten years later, a developer working in Ruby on Basecamp realized that the framework he'd created was really valuable: it enabled him to build and modify things his company needed really quickly. He released it as open source, and it became known as the Rails framework for Ruby — hence Ruby on Rails. Ruby on Rails was quickly adopted by results-oriented developers and is used by more than half a million web sites. While "pure" Ruby resides firmly at the left end of the spectrum, the main characteristic of the Rails framework is that you define variables that serve for both program access and storage, putting it near the right end of the spectrum.

    Rhetoric of the ends of the spectrum

    Left-end believers tend to think of people at the other end as rank amateurs. Left-enders claim the cloak of computer science for their style of work and present themselves as serious professionals. They stress the importance of separation of concerns and layers in software.

    Right-end practitioners tend to think of people at the other end as purists, ivory-tower idealists who put theory ahead of practice, and who put abstract concerns ahead of practical results. They stress the importance of achieving business results with high productivity and quick turn-around.

    It goes in cycles!

    There have always been examples of programming at both ends of the spectrum; neither end disappears. But the pendulum tends to swing from one end to the other, and so far it has never stopped swinging.

    In 1958, Algol was invented by a committee of computer scientists. Algol 60 was purely algorithmic — it only dealt with data in memory, and didn't concern itself with getting data or putting it back. Its backers liked its "syntactic and semantic purity." It was clearly left-end oriented. But then practical business people, in part building on earlier efforts and in part inspired by the need to deliver results, agreed on COBOL 60, which fully incorporated data definitions that were used both for interacting with storage and for performing operations. It was clearly right-end oriented. All the big computer manufacturers, anxious to sell machines to business users, built COBOL compilers. Meanwhile, Algol became the choice for expressing algorithms in academic journals and text books.

    There was an explosion of interest in right-end languages during the heyday of minicomputers, with languages like PICK growing in use and niche languages like MUMPS having their day. The rise of the DBMS presented new problems and opportunities. After a period of purism, in which programmers struggled to get jobs done with supposedly "easy" left-end languages like BASIC, along came Powersoft's PowerBuilder, which dominated client-server computing because of the productivity win of integrating the DBMS schema with the language. The pendulum swung back with the rise of the internet and the emergence of "pure" Java, with its arms-length relationship to persistence, putting it firmly in the left-end camp. Since Java wasn't pure enough for some people, "pure" Ruby appeared and grew; then a Ruby user under pressure to deliver results invented Rails, which spread to similarly-minded programmers. Meanwhile, "amateurs" who liked Java but also liked productivity invented and spread Grails, a Rails-style framework on the Java platform that combines advantages of each.

    So is Rails an innovation? Java? I don't think so. They're just minor variants of a recurring pattern in programming language design, fueled by the intellectual and emotional habits of groups of programmers, some of whom push towards the left end of the spectrum (with disdain for the amateurs) and others of whom push towards the right end (with disdain for the purists).

    Conclusion

    This spectrum of the relationship between data and program has persisted for more than 50 years, and shows no signs of weakening. Every person or group that finds itself in an environment emphasizing the end of the spectrum it finds distasteful tries to get to the end it likes. Sometimes they write a bunch of code, and everyone thinks they're innovative. Except it would be more accurate to say they're "inno-old-ive." This is one of the many patterns to be found in software history, patterns that account for so much of the detail that is otherwise hard to explain.

  • Storage Vendors in the Cloud

    When computer vendors encounter a major technology disruption, they respond the same way: with fervent claims that their products are really well suited to the new environment, when of course they are not. The response of storage vendors to the new ground rules of the Cloud provides a timely illustration of this near-universal phenomenon.

    Our Product is Definitely in Fashion

    Computers are complicated. Many people have trouble just keeping the buzzwords in mind, much less understanding what, if anything, is behind them — much less actually understanding things. It's particularly tough when a wave of fashion sweeps the industry, as it so often seems to. Then everyone but everyone immediately claims to be at the forefront of whatever that fashion is.

    This was true years ago when the good thing to be in databases was "relational," and suddenly every database vendor revealed that their precious products were, in fact, "relational." At first I laughed. What idiots these marketing people were — why, anyone could tell that C's product wasn't relational when it was built, isn't now, and probably never will be. What a joke!

    It turns out the joke was on me. Whatever the buzz-fashion-word of the moment, industry-standard practice is to claim it. And for most people to accept the claim!

    This is a big deal for the established vendors. There is a lot of money riding on maintaining market share as the new trend takes hold. When "relational" becomes the hot thing, and your marketing people are any good at all, then by golly, our database is relational — because I say it is!

    The Cloud — the Buzz-Fashion-Word of the Moment

    Now the Cloud is hot. Surprise, surprise — everyone's product claims to be "cloud-ready," "Cloud-optimized" or whatever it is they think you want to hear.

    Everyone's product is just great for the Cloud. The major vendors:
    EMC
    Netapp

    and everyone else.

    Inside the Marketing Department

    Something like the following dialog probably happens inside each major vendor.

    Bright New Kid: "I'm having real trouble producing that marketing piece about our products for the Cloud. I've read a lot about Cloud, and we just don't fit. I don't know what to do!"

    Seasoned Veteran: "You're making it too hard. We make storage, right? Our storage is great, right? Cloud needs storage, just like everything else, right? So our storage is ideal for the Cloud. That's it!"

    Bright New Kid: "I'm not so sure –"

    Seasoned Veteran: "You're over-thinking it, kid. Our storage is great, so it's great for Cloud. Just get over yourself and write it."

    What's Different about the Cloud?

    There is no cloud industry association to certify the criteria for being cloud-appropriate. That suits the vendors just fine, because in their telling the cloud is just another name for something we already do: run data centers.

    But the reality is that things are different in the cloud.


    The bottom line is simple — it's the bottom line! Literally! Meaning, the cloud is all about making things faster to implement and change; better performing and more responsive; and less expensive. I make no secret of my preference here, but the point and my analysis would be the same even if I had no horse in the race. It's not about feature X or service Y, all of which are irrelevant or migrating up the stack in Cloud applications. It's about the bottom line: not just purchase price, but TCO (total cost of ownership).

    The vast majority of data centers have been run essentially without competition. The people who pay the bills haven't been able to choose. It's the in-house data center or nothing.

    With the Cloud, suddenly there's competition. Buyers compare on price and quality — and can even switch if the promises prove to be hollow ones! So things are different in the Cloud. The arm-waving is replaced by the simple measures of capacity, performance, energy and space utilization, management costs, and maintenance.

     

  • The Big Data Technology Fashion

    Where there are people, there are fashions. Why should technology be immune? The current fashion of "big data" is a classic exemplar of the species.

    The Books

    Books are a good place to observe the common themes of technology fashions. You'll see patterns that resemble the ones I previously pointed out for project management.

    I think it's fair to say it's not a legitimate technology trend if it's not covered in an "X for Dummies" book.


    BD Dummies

    Similarly, it's got to be big. Be Revolutionary. Transform lots of stuff.


    BD revolution

    It's got to be a big, scary thing that needs taming.


    BD Taming

    For any fashion trend, it's important to make sure that other things are hitched to its wagon.


    BD Analytics

    Let's not forget that, if it's worth paying attention to, there's got to be a way to make money from it.


    BD Money

    It's never too soon to start adding layers of process and paranoia to it, to ensure that costs skyrocket and that hardly anything ever gets done; in other words, governance.


    BD Governance

    Finally, anything but anything has to have a human side.


    BD Human

    I swear, I sometimes think there's a central planning committee for technology fashions. They plan when the next new label on something old and not all that interesting is going to come out, grab their standard set of titles, and pass them out to people to write the books.

    But then, I guess it can't really be that organized, because there are usually so very many books, each of them covering the same small set of themes over and over and over, with slightly different language. The themes always seem to include:

    • X is revolutionary; it will change lots of important stuff.
    • X is big and scary, and you need help to tame it or bring it under control.
    • There are lots of ways to screw up doing X, so you need to pay lots of money for Y to get it right.
    • You're a Dummy, but I'll help you understand what you need to know about X anyway.
    • X has a human side.

    The Conferences

    Things aren't that different with conferences. They take the themes established in the books and embellish them a bit.

    There are conferences for people who work in particular sectors.

    BD Public

     

    You can't pass up an opportunity to learn from the very best.

    BD learn from best

    Who can resist going to a conference which cuts through all the crap and helps you do stuff?

    BD how to

    Anyway, you get the idea — there are lots of conferences. The themes are predictable, even without the aid of big data or predictive analytics, because they apply to any technology fashion trend.

    Conclusion

    Technology fashions — they are forever in fashion!

  • “Big Data:” Some Little Observations

    "Big Data" is everywhere. If only because of this, it is important, like the way Paris Hilton


    220px-Paris_Hilton_2009
    is famous for being famous.

    What's included in "Big Data?"

    If your concern is storing, serving or transmitting it, you don't care what kind of data it is — data is data, a pile of bits.

    But not all data is created equal. The easiest way to understand this is to break all the bits into relevant buckets. By far, the largest bucket is for image data (including both still and moving pictures, videos). While the ratios vary, it's not unusual for there to be 100 bits of image data for each bit of other data.

    While there's not a commonly accepted terminology, all the rest of the data can be understood as "coded" data. This again falls into two categories. The larger portion is "unstructured" data, things like documents, blogs, e-mails and most web pages (except for the images and videos on them). The smaller portion is "structured" data, which includes all databases, forms and anything else that can show up in a report.

    When people talk about "big data," they could be talking about any of the above, but mostly people talk about it because they want to extract actionable information from it, and the source of most actionable information is structured data. So in the vast majority of cases, when people talk about "big data," they're talking about structured data.

    Did Data used to be Small and Now it's Big?

    Think about a bank statement. There's a little information about you at the top, but most of the statement is probably taken up by the transactions — money moving into and out of the account. In general terms, this is the action log, the transaction history. This pattern of having an account master and detail records is a common one.

    Now think about a web site. The site itself is like the bank statement, and the record of people visiting and interacting with it is like the transaction history, generally known as a web log.
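    A minimal sketch of that master/detail shape, with hypothetical field names and amounts:

    ```python
    # Master/detail layout: one small "master" record per account,
    # many transaction ("detail") records that dominate the data volume.
    account = {"id": 1001, "owner": "A. Customer", "opened": "2001-03-15"}

    transactions = [  # the transaction history / action log
        {"account": 1001, "date": "2013-06-01", "amount": -42.50},
        {"account": 1001, "date": "2013-06-02", "amount": 1500.00},
        # ...in practice, hundreds of these per statement period
    ]

    balance = sum(t["amount"] for t in transactions)
    ```

    A web log has exactly the same shape: the site plays the role of the master, and each page view appends one more detail record.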

    People generate far more transaction records when interacting with the web than in other activities; for example, you probably click on hundreds of pages for each bank transaction you make. So the amount of data can be pretty big.

    The simple answer is: before the web, transaction data wasn't very big, and with the web, there's a lot more of it than there was before. Of course big data isn't just about the web; but the web has certainly gotten people to pay attention.

    So where did "Big Data" come from?

    It would be interesting to do a cultural history, but I suspect that the current interest in "big data" stems from the following factors:

    • Companies that pay attention to web logs get information about visitor behavior that can be used to make more money.
    • Internet advertising companies have done exactly this for years, and are getting really good at it.
    • Shockingly, most people don't analyze their data to improve their behaviors.
    • A closed loop system in which the results of your actions are used to enhance future actions is the clear winning strategy.
    • This requires (gulp) collecting and analyzing the relevant data, which is far larger than most people are used to dealing with.

    Thus the term "big data," which currently applies to just about any body of transaction data.

    What's "Big" about "Big Data?"

    Let's start by applying one of the fundamental concepts of computing to the question: counting. One of the first disk drives I got to use was a twelve inch removable pack developed by IBM:


    [Image: IBM 2315 disk cartridge]

    Its capacity was about 1MB. While that may sound small by today's standards, let's put it in perspective. Each byte is the equivalent of a character that you can type. Using a generous measure of 30 wpm and 5 cpw, that's 9,000 characters in an hour of continuous typing with no breaks, so the disk above has a capacity of more than 100 hours of continuous typing. That's one reason I thought the disk's capacity was huge — it easily held the source code for the FORTRAN compiler I wrote at the time, which was about a year's worth of work!
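The typing arithmetic above is easy to check in a few lines; the 30 wpm and 5 cpw figures are the generous measures from the text:

```python
# Back-of-envelope: how many hours of nonstop typing fill a 1 MB disk pack,
# assuming 30 words per minute at 5 characters per word (from the text).
WPM = 30
CHARS_PER_WORD = 5
chars_per_hour = WPM * CHARS_PER_WORD * 60    # 9,000 characters per hour

disk_bytes = 1_000_000                        # ~1 MB pack capacity
hours_to_fill = disk_bytes / chars_per_hour   # just over 111 hours

print(f"{chars_per_hour} chars/hour; {hours_to_fill:.0f} hours to fill 1 MB")
```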

    Now let's get modern. Drives have gotten smaller while holding more and more. Here's a good visualization of the progression:


    [Image: six hard drive form factors]

    We're now at the point where truly small drives (1 to 2.5 inches) hold massive amounts of data; 1TB or more is common.

    How much is that? Remember, it would take 100 hours of continuous typing to fill up the large disk pictured earlier. How much space would 1TB take up on those older disks? That's about 1 million of them; if you packed them tightly, they would fill a room about 100 feet long, 100 feet wide and 10 feet high. And I would have to type continuously for 100 million hours to fill them up. Now, that's big data.
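The room-sized comparison checks out numerically; the per-pack volume of 0.1 cubic feet is my assumption, chosen to match the room dimensions the text describes:

```python
# Sanity check: a terabyte stored on 1 MB disk packs, and the room it fills.
TB = 1_000_000_000_000
pack_bytes = 1_000_000
packs_needed = TB // pack_bytes           # 1,000,000 packs

pack_volume_ft3 = 0.1                     # assumed volume per 12-inch pack
room_ft3 = 100 * 100 * 10                 # 100 ft x 100 ft x 10 ft room

typing_hours = packs_needed * 100         # 100 hours of typing per pack
print(packs_needed, typing_hours, packs_needed * pack_volume_ft3 <= room_ft3)
```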

    Now that we've got a sense of how big a TB is, let's get real.

    On a good day, this blog might have 100 page views, each generating a server log record. Such records vary in length, but let's say they average 100 bytes each, or 10K bytes a day. Not much.

    Let's say I caught up to the Washington Post, a site which is in the top 100 in the US. It gets about 1 million page views a day. That would be a mighty 100 million bytes a day of raw server log data. 10 days would add up to a GB of data, which means that ten thousand days, about 30 years' worth of data, would fit on one of those physically little drives pictured above that holds just 1TB of data.
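The Washington Post calculation, step by step (the 100-byte average record is the text's assumption):

```python
# Server log growth for a top-100 site, per the figures in the text.
page_views_per_day = 1_000_000
bytes_per_record = 100                    # assumed average log line size

bytes_per_day = page_views_per_day * bytes_per_record   # 100 MB a day
days_per_tb = 1_000_000_000_000 // bytes_per_day        # 10,000 days
years = days_per_tb / 365                               # roughly 27 years

print(f"{bytes_per_day:,} bytes/day; 1 TB lasts {days_per_tb:,} days (~{years:.0f} years)")
```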

    The Washington Post is a major site; top 100. Their web transaction logs are the biggest data for analysis they've got. And here's what 30 years' worth of their data will fit on:

    [Image: Hitachi Travelstar Z5K500, an ultra-thin laptop hard drive]

    That's what they call "big data." This is why I instinctively drop into cynical mode when the subject of "big data" comes up. It just isn't usually very big!

    How much data do you need?

    It depends on context. If you're a website like Facebook offering a free service holding users' data, the answer is simple: you keep as much of the users' data as you feel like. You can (and if you're Facebook, regularly do) throw out data any time you feel like it, or just drop it on the floor and lose it because your programmers weren't up to dealing with it.

    If you're a money-making business that depends on data, you could probably run your business better if you

    1. Kept all the data
    2. Analyzed it
    3. Came up with useful observations, and
    4. Changed your behaviors accordingly.

    But most businesses don't do this very well, if at all. And they are feeling increasingly guilty about it. Thus the marketing drum-beat for selling everything that can possibly be labelled "big data."

    Sarcasm aside, the fact is that most businesses don't need much data in order to perform wonderfully useful analyses. The reasons are simple:

    • The things that matter the most are things you're not doing yet. The data you've got is historic. If you're a comedian and the audience doesn't laugh much, no amount of big-data analysis of audience reaction will help you come up with better laughs.
    • The impact of big potential changes will be seen in lots of your data. Go back to Statistics 101. How much data do you need to see that the coin you're flipping isn't a fair one? Only enough to prove that the 2 out of 3 times it comes up heads isn't a fluke.
    • In the end, how many changes can you realistically make? Hundreds? How about rank ordering them, finding the most important ones first, then moving on from there? You'll quickly get to diminishing returns.

    Finally, more important than anything else, is getting into an experimental, data-driven, closed-loop system. This is always the key to success. It's how organizations become successful, get more successful, recover from trouble, and stay on a winning path.

    Conclusion

    For better or worse, "big data" is likely to be with us for a while, at least as a technology fashion trend. Like all such fashion trends, it's a useful occasion for checking whether we're putting our transaction data to its best use in keeping us on the track we're on and getting us onto improved ones.

     

  • The Cloud and Virtualization

    I've long been impressed by the cancerous power technical terms have when they become fashionable. "The Cloud" and "Virtualization" appear to be setting new records for meaningless ubiquity in a crowded field.

    Setting new Records

    When terms like Cloud and Virtualization appear in Dilbert, it's like getting a "Best of Reality TV" award: they've made it to the rarefied elite of complete vacuousness.

    [Dilbert comic strip, May 25, 2012]

    Technical Fashion

    People throw around buzzwords all the time when talking about technology. It's necessary and unavoidable.

    By some process that I understand about as well as I understand clothing styles, some buzzwords descend below the level of providing a convenient shorthand for some well-understood thing and enter the hell of fashion. Starting life as useful buzzwords, they morph into buzz-fashion-words, lose any meaning they had, and become obsessions for everyone who feels they need to be in touch with the latest stuff.

    How the Devil Does his Work

    There is usually a marketing side to buzz-fashion-words. Perfectly useful terms don't turn to the dark side without lots of help. I would love to be able to channel C.S. Lewis and write a Screwtape Letters for buzz-fashion-words, exposing the process of commercial interest and personal technical insecurity that leads innocent, useful words like "virtualization" to become the "it" thing.

    The Cloud

    I've already talked about "The Cloud," the hottest thing in corporate data centers with the possible exceptions, of course, of "virtualization" and "Big Data."

    The Cloud is an interesting case of a buzz-fashion-word that is like a magician's misdirection technique. The consumer internet has long since adopted a set of flexible, cost-effective techniques for delivering services to their customers. The people who work in corporations are also consumers, and are exposed to the ease and convenience of these techniques. They notice that consumer internet services follow different rules than the ones imposed by corporate IT, for example about the release process. They naturally wonder why they can't have the same. Over time, the wonder moves to requests which finally become demands. Corporate IT, years late, feels they have to respond. The result: "we are migrating our corporate data center to The Cloud." The Cloud, in this case, means "our name for what the consumer internet has done for years, but by giving a catchy, trendy name, we hope we'll trick you into thinking we're ahead of the curve instead of pathetically clumsy and slow." A classic case of misdirection by people who are anything but magicians.

    Virtualization

    Everyone knows what virtualization is. When I'm standing in front of you, I see the physical you. When I look at a picture of you, I see the virtual you. It may look real, but it's not. The latest in 3D movie technology makes such virtual reality more effective.

    Computers are all about virtualization; computers are layers and layers of virtualization. The machine language of a computer is itself a phantom creation; the languages we write in all use a "virtual computer." Even the inexpensive part of computer storage, the physical devices, has layers of virtualization on the inside, and presents an interface that the upper layers like to think of as "physical." In computing, virtualization is everywhere, and always has been.

    In that context, what is this hot new trend, "Virtualization," that often appears together with "The Cloud"? I've gotten bored with torturing the people who use it by asking what they mean. Sometimes they mean a software product (for example, VMware) or a Storage Area Network (virtualized storage) but they're trying to be fancier. Sometimes they mean "you know, that thing that everyone agrees is a good thing." You get the idea.

    Technical Fashion Words

    I've got to grow up. I've got to exercise discipline. Launching into lectures when buzz-fashion-words are slipped into conversation by innocent, otherwise intelligent people does no good. There is nothing to be gained by snarkiness.

    I've decided to focus on the word "nice," a completely vacuous word signifying a vague, positive attitude.

    From now on, when I hear a term like "the Cloud," I will translate it mentally to "nice," so that I too can smile and think positive thoughts. "We're migrating our data center to the Cloud" equals "We're making our data center nice." "We are experimenting with virtualization" equals "We're trying to be nicer and it's working well."

    What's not to like? Isn't that nice?

  • I’m Tired of Hearing about Big Data

    I've already complained about the so-called Cloud, and I thought that would get it out of my system, but I've had it — I'm up to here — with blankety-blank "big data."

    "Big Data" at the Harvard Library

    The Harvard Library system is REALLY BIG. Here's Widener Library:

    [Image: Widener Library]

    Widener has 57 miles of bookshelves with over 3 million volumes. But that's just the start. The Harvard University Library is the largest university library system in the world. There are more than 70 libraries in the system beyond Widener, holding a total of over 12 million books, maps, manuscripts, etc.

    Here's the news: Harvard is putting it on-line! Well, not the actual books; there are little details like copyrights. But the metadata, about 100 attributes per object. According to David Weinberger, co-director of Harvard's Library Lab: "This is Big Data for books." A blog post described a day-long test run during which 15 hackers worked on a subset consisting of 600,000 items and produced various results.

    Pretty amazing, huh? That's at least a couple miles of books!

    "Big Data?" Give me a Break!

    Apparently, no one does simple arithmetic anymore. Maybe it's the combined impact of reality TV shows, smartphones, global warming and Twitter. Who knows?

    How much data does Harvard have? 12 million objects each with 100 attributes is 1.2 billion attributes. When I first started thinking about this, I gave generous estimates of the size of each metadata attribute. Then I got skeptical, dug a little, and found the actual data set. As of this month, there are 12,316,822 rows in the data set, with a compressed data size of 3,399,017,905 bytes. In case that appears to be a big number to you, it's less than 4GB.

    4GB. Your smart phone probably has more than that, and your iPad certainly does. The laptop computer I'm using right now has more than that. Yes, yes, the data is compressed. How much will it be uncompressed? I could find out, but I'm lazy, and the answer in the very worst case is going to be about the same: a very small amount of data.
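The "simple arithmetic" the post asks for really is simple; here it is, using the figures quoted above:

```python
# The Harvard Library metadata arithmetic from the text.
rows = 12_316_822
attributes_per_row = 100
total_attributes = rows * attributes_per_row    # ~1.2 billion attributes

compressed_bytes = 3_399_017_905
gb = compressed_bytes / 1_000_000_000           # under 4 GB compressed

print(f"{total_attributes:,} attributes; {gb:.1f} GB compressed")
```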

    "Big Data" usually isn't

    The reason I'm really tired of hearing about Big Data is that, in the vast majority of cases, it isn't big. Not only isn't it big, it's usually kinda small. So the people who talk about it as "Big Data" are either stupid or they're liars. Either of which make me irritated.

    My acid test for whether a data set can't possibly be described as "Big Data" without embarrassment or shame is whether it fits into the memory (as in the RAM-type memory, forget about disk) of a machine you can buy off the web for less than $15,000. These days, that's a machine from (for example) Dell that has about 256GB.

    This machine from Dell will fit more than 50 copies of the Harvard Library data set. Who cares how big it is when uncompressed? Loads of copies of it will still fit entirely in memory.
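The acid test reduces to one division; the numbers are the ones from the text:

```python
# The "acid test" as arithmetic: how many copies of the data set fit
# entirely in the RAM of a commodity server?
ram_bytes = 256 * 1_000_000_000     # ~256 GB server
dataset_bytes = 3_399_017_905       # compressed Harvard data set

copies = ram_bytes // dataset_bytes
print(copies)   # well over 50 copies fit in memory at once
```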

    Do not say "Hadoop" or "clusters." If you do, I swear I'll slap you.

    Conclusion

    The conclusion is obvious. Don't even start thinking the phrase "Big Data" unless you've first applied some common sense and performed some simple arithmetic. Definitely don't utter those words around me without first having applied the sobering tonic of arithmetic. And wake up and smell reality: "Big Data" and "Cloud" are little more than words that vendors use to trick unsuspecting victims into spending money on cool new things.

    Then, of course, maybe you really do have big data — like objectively, unusually BIG data. That's cool. It's easier to work with and get value out of large data sets than ever before, and I hope you can use the latest, most productive methods for doing so. Some of the Oak companies are doing just that, and it's exciting.

  • The Name Game of “Moving to the Cloud”

    The most recent in a long string of technology fashion trends, "the cloud" is hot. Like its hot technology fashion predecessors, it mostly consists of old ideas with a little spicy sauce on top and fresh packaging. If you mindlessly follow the fashion and just "go to the cloud," you are likely to end up in the same unhappy place where most mindless followers of fashion trends end up. What is "the cloud?" Simple. Clouds are belated enterprise IT implementations of the consumer internet.

    What is the Cloud?
    Is the "cloud" a new development? Well, it is a new name…

    Ancient Clouds
    I first encountered the cloud more than 40 years ago. Before that fateful meeting, my only experience with computers had been up close and personal. You had to get in the room, push buttons, flick switches, feed card decks or punched paper tape, listen to whirring sounds and watch blinking lights. Like with this computer:
    [Image: IBM 360 Model 50]

    But The Cloud! Ahhh, the Cloud…I remember it vividly.

    Of course, things were a bit different then. We only had "fat clients," as in "takes two guys to lug it" fat. The principle was identical, however: I was in one place, with a "fat client" like this:
    [Image: Teletype with paper tape punch and reader]

    …and in another place was a computer like this:
    [Image: men working at a DEC PDP-10]

    …and everything worked.
    Remember the importance of labelling, however: what we did 40 years ago wasn't "cloud computing," which hadn't been "invented" yet — it was merely "telecommunications."

    Creating the Modern Cloud
    Between then and now, lots of iterations of Moore's Law have come and gone. All the hardware has gotten smaller, cheaper and faster, while the software has gotten larger, more expensive and slower — but, fortunately for all of us, the rate of hardware evolution is greater than the rate of software devolution, giving the impression of net progress.

    Where this leaves us, 40 years later, is faster remote computers talking with lighter remote clients over incredibly faster networks, all at lower cost. Sprinkle with a little extra software and drizzle with some marketing hoo-haw, and — kazaam! — you've got today's hottest technology fashion trend, Cloud Computing.

    Clouds aren't always friendly
    When we think of clouds, we're likely to think of this kind of cloud:
    [Image: cumulus clouds in fair weather]

    friendly, fluffy shapes floating in an otherwise sunny sky. When people think about cloud computing, this is the kind of cloud they seem to have in mind. But as we know, there are other kinds of clouds. There are dark, oppressive clouds that make everyone depressed. And there are really mean clouds that wreck things horribly, creating the situations for which "disaster recovery plans" are made. Is this just a metaphor? Of course not. But just like financial fraud, the big, juicy examples are usually hushed up in order to protect the guilty.

    OK, so What is "the Cloud"?

    There is a little-discussed trend that is deeply embarrassing to IT professionals: there is a wide and growing gap between the use of computing technology in the consumer world and in the corporate, data center world. When the average user of corporate data systems is home, he works in a very advanced computing environment. His local machines and devices are amazingly capable and pretty easy to use. When connected to the internet, he can access a nearly limitless world of cloud computing resources — which are themselves largely run out of data centers that are remote from the people who set up and administer the software in them, and which contain an ever-evolving mix of dedicated and shared resources and services. The consumer internet has been based on a cloud computing model for a long time.

    The corporate world is a whole different thing. The corporate world has been consumed with consolidating its diverse data centers. They are finally beginning to confront the extreme flexibility and ease of use that consumers enjoy every day, and are finding it increasingly difficult to explain why the computing they run at such high capital and operating cost is so cumbersome, error-prone and inflexible.

    In this context, there is no way that anyone associated with corporate computing is ever going to plainly admit that what they are basically doing is trying to catch up with the consumer internet. So they must be doing something else. Oh, yeah — they're evolving to the latest, smartest trend in corporate computing, adopting the latest technologies and being really leading-edge: they're "moving to the cloud," but of course in a "smart" way, with large doses of "private cloud" technology along the way.

    Summary
    What's a "private cloud?" A corporate data center with a fancier name.

    What's a corporation "moving to the cloud?" A corporate IT group trying to play catch-up with the consumer internet, and desperately trying to make it look like something else.

    What's new about "cloud computing?" Very little; mostly naming and marketing fluff.

    Is anything real happening when a corporation "moves to the cloud?" Sometimes yes! Sometimes, they really are copying a couple proven techniques of the consumer internet, slowly and at great cost and trouble, but nonetheless creeping towards a 21st century computing model.

  • The Xiotech ISE and Technology Fashion

    We all know what fashion is. Think of Vogue Magazine, or impossibly tall, thin women walking in that special way down the elevated runway, wearing something no normal person would be able to wear, or would want to wear if they could.

    [Image: SAIC fashion show, 2008]

    But fashion extends way beyond women's clothing. Let's start with men's clothing: how many guys wear suits and ties to the office today? Then cars — how many modern cars have those giant fins that were popular in the '60s? The kind of popular music you like dates you at least as much as wrinkles on your skin. The more you think about it, the more you realize how pervasive fashion is.

    [Image: close-up of tail fins on a classic car]

    "Technology is a counter-example," perhaps you say. "It's bits and bytes and silicon, no fashion there!" Well, that's true, except that it's people who buy the technology, and people are fashion-driven creatures. Let's face it: the cool kids who once drove sporty cars now pull out their iPhones at the slightest excuse. Waiting on line to see the Beatles; waiting on line to get the latest iPhone — what's the difference? They're both fashion-driven fads.

    [Image: people lining up for the iPhone 3G]

    "I concede that consumer technology is fashion-driven," perhaps you admit. "But hard-core computing technology, where nerds are building things for nerds; how can that possibly be driven by fashion?" I fully concur that no nerd techie would ever admit that his choices, selections and designs are driven by fashion, not even to himself. But all too often, that's exactly what's happening. The techie nerd who comes up with a design approach for solving a problem almost always prides himself on originality and foresight, without any awareness of how fashion-determined his most important decisions are. These decisions are often not made consciously; they are assumptions. "It's not worth discussing, of course we'll take approach X," the techie would respond in the unlikely event that the assumption is questioned — by some "ignorant" (which is tech-talk for "unfashionable") person. Just to be clear: we're not talking about how nerds dress; we're talking about how nerds think.


    There are examples in every field of computing technology. The Java/J2EE fad during and after the internet bubble is an obvious example, and before it client/server computing was a huge fad.

    There is a clear example in storage technology today. The fashion is as clear and obvious as short skirts, and moreover is explicitly stated by its adherents: the fashion is that storage functionality should be provided as a body of software, independent of any hardware embodiment, and without regard to any particular storage hardware. Companies that previously sold storage hardware no longer have real hardware design functions — all they do is bundle their software with hardware provided by others and sell the combination. The most popular form of this approach is to buy drive bays from an OEM and connect them to controllers consisting of specially configured off-the-shelf processors; 3Par and many others do this, for example. IBM's XIV implements a variation on this theme, using all IBM commodity server hardware. While there are still loads of dollars being spent on old-style, hardware-centric storage systems (think EMC), engineers building new storage systems are uniformly following the software-centric fashion.

    In this sense, the Xiotech ISE is decidedly unfashionable. The ISE was invented at Seagate, in response to the CEO, Steve Luczo, wanting to create a storage product of higher value than spinning magnetic disk drives. The idea was simple: build a fixed-format super-disk with many Seagate drives, intelligence, etc. It would be bigger than a disk, but smaller than a SAN. It would emphasize basic storage functions (write, protect, read) and leave the "high level" storage functions to the SAN vendors.

    What is interesting is that Steve Sicola and a group of other storage industry veterans ended up working side-by-side with Seagate engineers, something they never would have done at a SAN vendor. Sicola and his team knew the evolving fashions in storage quite well: ignore the details of the drives, that's "just storage." Build fancy high-level functions.

    But since they were stuck with the drive engineers, they did something unusual: they actually listened to them! They learned about the amazing functions the engineers embedded in the drives that all the SAN vendors ignore. They learned how annoyed the Seagate engineers were at all the drives marked "bad" by SANs, the vast majority of which are actually good; they learned about error codes and performance details that all the other storage engineers in the industry were studiously ignoring.

    Before long, they got absorbed in what you could really do once you really knew the hardware. And, being good nerds, they invented a bunch of stuff, like how to virtualize over a fixed number of heads so that top performance is maintained even as the disks fill up. They also invented a bunch of stuff that provides major, persisting advantages as new drives with higher capacities come out.

    Since I know a fair amount about Xiotech's ISE, I want to go on and on about it. But I won't, because the point of this post is technology fashion. The purpose of bringing up the ISE is that it's a great illustration of the power of fads and fashion in technology. Any normal group of self-respecting storage nerds would have built a completely hardware-independent storage system. As such, it may have had nice features, but it would be pretty much like all the others in terms of its basic functions of reading and writing disks. But because these storage engineers were sequestered with hardware types and had a unique mission imposed from above, they did something very rare: they built a leading-edge storage system that is decidedly unfashionable. Because the engineers actually paid attention to the hardware, the ISE does things (performance, reliability, density, scalability, energy use, etc.) that no other storage system on the market today does, even though it uses the same disks available to others.

    Fashion is, of course, a relative term. Fashion is one thing at diplomatic receptions, and quite another hiking in the wilderness or in a war zone. What is appropriate for one doesn't work for the other. Shoes that are appropriate for a salon can cripple you in the woods.

    Well, it turns out that the modern storage fashion of ignoring the storage hardware may be acceptable in salon-type environments (where appearance and style is important but there's no heavy lifting to be done), but is as crippling as high heels in the I/O-intensive environments that are increasingly found in virtualized, cloud data centers. The ISE is like storage fashion for war zones of data, for data-intensive applications like virtualized servers, where the applications are concentrated in a small number of servers, all fighting to get their data. Most storage systems know how to hold their tea cups and conduct refined discussions and other things that matter when getting your data sometime today would be nice, thanks.

    [Image: a genteel tea party]

    But when you've got a crowd of rowdy, tense applications all of whom are demanding their data NOW, perhaps more of a war-time style is appropriate; that's what the "unfashionable" nerds at Xiotech created in the ISE.

    [Image: an angry crowd]
