• Software Quality Horror Tales: Electronic Diversity Visas

    The US State Department has inflicted unimaginable pain and suffering on tens of thousands of people throughout the world through its electronic Diversity Immigrant Visa program. It's a highly visible, public example of what's wrong with software development in general, and software quality in particular. Sadly, it's no different in principle from countless other projects, doomed from inception by inappropriate standards and techniques.

    The Facts

    The Diversity Lottery Program is just that: a lottery. More than ten million non-US citizens worldwide apply for tens of thousands of slots that can lead to US citizenship. This time, though, after notifying the "winners," who had started spending money most of them didn't have in order to comply with the requirements and complete the process, the State Department cancelled the lottery and invalidated the results. Why? A bug in the computer program that chose the winners.

    The Human Consequences

    From the Wall Street Journal:

    Ever since he traveled from his home near Yaounde, Cameroon, on a scholarship to Michigan State University in 2009, Dieudonné Kuate dreamed of immigrating to the United States.

    As a visiting graduate student in epidemiology, he marveled at the sophistication of the chemistry labs and the excellence of the teaching. There was no comparison to his university in Yaounde, where he shared a cramped 27-square-foot room with three other students.

    One of eight children, Mr. Kuate grew up on a poor farm in the western plateau town of Banjoun. His parents couldn't read or write. Mr. Kuate is the only child in his family to complete university. "My dreams have been to be a top researcher in my field of specialty. The only place I see these goals being realized is the United States," says the 31-year-old Mr. Kuate, who returned to Yaounde last year and finished his Ph.D. in chemistry.

    For the past six years, Mr. Kuate has applied for the State Department's annual green-card lottery, and, like 15 million other people, he applied again this year. The 20-year-old program offers about 50,000 people a year a chance to win permanent residence in the U.S.—and a ticket to the American Dream.

    Denied six times, Mr. Kuate finally saw his number come up on May 1.

    "There is no English word to express my happiness when I discovered that I was selected," said Mr. Kuate, whose first name means "God given."

    [Photo: Emmanuel Tumanjong for The Wall Street Journal]

    Within days, his older brother sold family land in Banjoun for around $4,400 for Mr. Kuate to use for application fees, medical examinations and to start a new life in the U.S., he said. His mother believed God had intervened: "According to her, I was going to travel to the white man's country and see how to help other family members who have not gone far in book work," he said.

    But on May 13, those hopes were abruptly dashed. After logging on to the State Department website, Mr. Kuate said, "I saw a message saying the lottery had been canceled."

    Mr. Kuate was among 22,000 people around the world mistakenly informed last month that they had won the lottery. There had been a technical glitch and the lottery would have to be held again, the State Department said, explaining that a computer had selected 90% of the winners from the first two days of the application window instead of the full 30-day registration period.

    The software

    It's pretty clear that this is one of the more trivial programming jobs on the planet. I shudder to think how much it cost to build, how long it took, and what kind of environment was created that made it (I'm sorry to say) likely that a horrific bug like this would be inflicted on so many innocent people.

    Since I have no access to the code or project documents, I will comment on a couple of things that are publicly available.

    Take a look at the Department's page that announces the status of the 2012 lottery. Play around with it a bit, as I did.

    Did you want to find more information? Did you take advantage of the kind offer to provide more information:

    More information is available on our website:

    http://dvlottery.state.gov

    Perhaps, then, you noticed that the link leads you to the same page you're already reading!! Kafka couldn't have done it better. No doubt this was the careful work of the Division of Self-referential self-reference of the Department of Redundancy Department.

    Did you take note of the fact that all entries were submitted electronically between October 5, 2010 and November 3, 2010? Which implies that starting on November 4, 2010, they had all their input data? All they had to do was run the lottery program a couple times on the input, run some checks to make sure it was working properly on the new data set, then run it "for real" and publish the results. To be generous, this should have taken about a day. OK, it's the government, we'll give them a week. Really? Geeez…alright, a month! No. NO WAY. $%&$%^& SIX MONTHS!!!!??? ^&(^^&* MORE THAN SIX MONTHS???!!!

    With that much time, this should have been the most proven-to-be-perfect program in history. PhD students should have been able to break new ground in proving the certainty of correctness of this program. It should have been possible to run it a number of times that compares favorably to the number of grains of sand on all the beaches on planet earth.
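    Since I have no access to the code, here's a minimal sketch in Python of what a fair draw plus the kind of sanity check described above could look like. The entry records, field names and check threshold are all invented for illustration; the real system's data model is unknown.

    ```python
    import random

    def draw_winners(entries, n_winners, seed=None):
        # Uniformly sample winners from ALL entries, regardless of when in
        # the 30-day registration window each entry arrived.
        rng = random.Random(seed)
        return rng.sample(entries, n_winners)

    def spread_check(winners, window_days=30):
        # Sanity check: winners' submission days should span the whole
        # window. The actual bug drew 90% of winners from the first two
        # days, which a check like this would have caught instantly.
        days = {w["day"] for w in winners}
        return len(days) > window_days // 2  # crude threshold, for illustration

    # Synthetic data: 10,000 entries spread evenly over a 30-day window.
    entries = [{"id": i, "day": i % 30} for i in range(10_000)]
    winners = draw_winners(entries, 500, seed=42)
    assert len(winners) == 500
    assert spread_check(winners)
    ```

    Run the draw against the frozen input a few times, run the checks, then run it "for real." That's a day's work, not six months'.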

    I love the fact that there's a transcript of a statement on the subject by "David Donahue, the Deputy Assistant Secretary of State for Visa Services." The statement location and date are unspecified. The date of posting is not given. The fact that he made a statement verbally rather than just talking with the public via the web site kind of implies that he's incapable of writing or typing. His "internet department" (or whatever) must be responsible for the web site. And it implies that he still has a job! For some reason, I find that one really annoying! I guess you can screw over incredible numbers of people on behalf of the US Government and suffer no personal consequences. It must be OK to do that!

    There's a lot more to be said about this fiasco, but I'm tired.

    Conclusion

    Software quality. We need a revolution! Stop the Horror! End the Terror!

     

  • Software Quality: Horror, Failure, Tragedy and Ineptitude

    Most software stinks. Too much software is a real horror show. Loads of software is discovered to be rotten to the core before it is widely inflicted on customers, leading to blown budgets, missed deadlines and crippled businesses. An amazing amount of software, already bad and late, is still inflicted on the world, sometimes creating true tragedies. The only things in software more riddled with stupidity and incompetence than the methods used for designing and building it are the methods used for assuring its quality.

    We should be running around in panic, trying to strengthen the levees against the rising flood of software problems; instead, we have smugly promoted standards, careers and processes. In the face of repeated, costly failures, the standard prescription is to do more of the same, except spend more money and take more time.

    "Software quality" should be added to the cynical pantheon of "southern efficiency" and "northern charm," words that we know just don't belong together.

    I have spent more than 40 years in the software industry, and have made my share of mistakes. I'm no more perfect than anyone else. But through my personal experience of writing a great deal of code over decades, and now from my vantage point of getting an inside look at the work of many software groups, some unmistakable patterns have emerged.

    The key observation is that software development, as it is taught and practiced nearly everywhere, is not a collection of best practices that are constantly being refined and improved over time. It is a disorganized mish-mash of ideas badly adapted from other fields whose practices are simply inapplicable to software, and then patched and elaborated to make them sound good. This observation applies even more strongly to software quality.

    I accuse no one of bad intentions. It does, however, make me angry and sad that our field is so burdened with pre-scientific, faith-based dogma that doesn't appear to be getting better. We don't need evolution here. We need revolution.

    I hope to illustrate these claims in a series of posts. They deserve a cover like this one:

    [Cover image: Tales from the Crypt, issue 24]

    Are better methods available? Yes. They. Are.

  • Software: Move Quickly While Breaking Nothing

    In places that want to succeed (why would you be anywhere else?), the pressure is on the software team. Change more stuff. Move more quickly. Where are the results?

    OK, great, you say. I'm already working as hard as I can. The only other thing I can do is relax the process, the safeguards that assure good, high-quality results. I'll give it a try.

    Maybe you (and your company) get lucky for a bit. Then the inevitable happens. The release that just got pushed had bugs and/or unintended consequences. There are panicked calls, late nights, frayed nerves and angry managers. The managers are extra angry at the programmers, who did what the managers said rather than what the managers thought they had implied (i.e., don't break anything). The software people fume about management that acts like the ancien régime, making insufferable, impossible demands of the oppressed workers. No one is happy.

    "Push More Releases" is NOT the Answer (by itself)

    Sometimes when I talk with programmers, I'll give some version of my "releases are stupid" talk or my "dates are evil" talk and some brave soul comes out with the classic "I tried that and got screwed" response, as summarized at the beginning of this post. And I always agree with the brave soul: if all you do is push more releases, you're not just inviting disaster, you're sending a chauffeured limo for disaster and welcoming it with a red carpet and cheering crowds.

    The classic, project-management-centric methodology with its infrequent releases is broken. Broken!! So who in his right mind would think that keeping the methodology while stripping out some of its essential, integral parts is a good idea?

    Let me see if I've got this right: you don't like your methodology, so … you're going to cripple it and hope things will improve??!!

    By the way, if you're sitting there with a smug expression thinking "we don't have that problem, we use agile," think again. Agile does not change the situation.

    The Answer is a New Methodology, Optimized for Speed

    The details of the new methodology are important; I've written about them at great length in my private-distribution papers. It's a seismic shift from the normal way of doing things. Like with any big change, it's useful to start with baby steps. So here's a start:

    Grow the Baby

    The mainstream process of software development resembles a car factory. There are lots of inputs. The inputs are assembled into sub-assemblies and then assemblies. There is a final integration phase, after which the car rolls off the assembly line. When it rolls off, people hope it works; there is no expectation that it will work prior to the end of assembly; until then, it's "work-in-process." This is typically the way things work whether you release once a month or once a week.

    The speed-oriented method resembles growing a baby. There is one and only one overriding rule: Don't Kill the Baby! The baby is grown by tiny incremental additions; each addition takes a while to get right, but none of them is fatal. The baby takes a while to learn to walk, for example, and spends some time walking poorly. But while this is going on, it doesn't lose its ability to crawl, eat, burp or roll over.

    Don't Kill the Baby!

    When the subject is babies, there is near-universal agreement that killing them is something to be avoided. But when software is developed with the usual methods, it's alive only some of the time — mostly it's dead! The cornerstones of the speed-oriented method are:

    • Small, frequent changes. You should make progress towards your goal every day. Do the best you can and make the most progress you can every day.
    • The new stuff doesn't need to work at the beginning.
    • The old stuff can't be allowed to break. This is usually achieved by some kind of continuous integration and live parallel testing. How you do it isn't important. That you do it is. Your software must not break. Clear?
    • Iterate. Don't lay out months' worth of work. Set overall goals, then spend some time looking at what you did yesterday to help decide what to do today.
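    The "old stuff can't break" rule is often enforced with some form of shadowing: run the new code in parallel with the old, serve the old result, and log every disagreement. Here's a minimal sketch in Python; the pricing functions are hypothetical, purely for illustration.

    ```python
    def shadow(old_fn, new_fn, log):
        # Run the new code in parallel with the old, serve the old result,
        # and record any disagreement. The new stuff is allowed to be
        # wrong; the old stuff is never allowed to break.
        def wrapped(*args, **kwargs):
            result = old_fn(*args, **kwargs)   # this is what the caller gets
            try:
                candidate = new_fn(*args, **kwargs)
                if candidate != result:
                    log.append((args, result, candidate))
            except Exception as exc:           # new code may not work yet
                log.append((args, result, exc))
            return result
        return wrapped

    # Hypothetical example: a rewritten pricing function grown in the
    # shadow of the proven one.
    def old_price(qty):  return qty * 100
    def new_price(qty):  return qty * 100 if qty < 10 else qty * 90  # not done

    mismatches = []
    price = shadow(old_price, new_price, mismatches)
    assert price(5) == 500        # callers always see the old behavior
    price(20)                     # disagreement is logged, nothing breaks
    assert len(mismatches) == 1
    ```

    The new code can be wrong all day long without killing the baby; the mismatch log tells you exactly how far it is from being ready to take over.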

    Baby Yourself!

    One of the Oak companies was having trouble meeting all their commitments. The pressure was on to deliver more, and deliver it quickly. But the company was also getting beat up because of serious flaws in recent releases. Talk about being caught between a rock and a hard place!

    It took quite a bit of work, but they shifted to the kind of speed-oriented, always-alive method described here. They had some fun with it along the way. Part of the fun was the mascot they adopted, which I got to see during a recent visit.

    [Photo: the team's mascot]

    Grow the baby!

  • Why Computer Software is So Bad

    There are aspects of the theory and practice of computer software that drive me nuts.

    I look at a widely accepted theory or practice, and am aghast that the cancer spread and became part of the mainstream. I look at a genuinely good practice and am shocked at the way everyone lies and slithers to make it sound to the unsuspecting that whatever lousy thing they do is a shining example of a good thing. I look at a nearly-universal approach to problem solving that may have made sense many Moore's-Law generations ago, but has long since been rendered irrelevant by advances in hardware. And I look at proven, understandable techniques that improve productivity by whole-number factors that are spurned and/or ignored by most people in the field.

    How significant is this stuff I'm complaining about? A minimum of a factor of 10. Sometimes a great deal more. So it matters! This may sound theoretical, but it has real-world, practical implications. It's the only way, for example, that small companies can beat large, established ones whose software staffs are 10 to 1,000 times larger.

    I think I understand some of the causes of this deplorable state of affairs. But that doesn't make me feel better. And it definitely doesn't cure the sick or empower the unproductive.

    A number of my private-distribution papers go into these subjects in considerable depth. But I thought it might be interesting to summarize a couple of the main observations here.

    Among the causes of bad software are:

    The blind and deaf leading the blind. In the majority of cases, the people in charge have little to no personal experience of creating software, and no interest in how it's done. This is about as sensible as having someone in charge of a baseball team who not only can't play baseball, but can't see onto the field where it's being played; all he can do is see the score at the end and hear reports from the players about how things are going. So everything turns into politics — convincing the blind, clueless boss that you're the great contributor, everyone else is a chump. The larger the organization, the more likely it is to be led by such a genius.

    Process over substance. The larger and older the organization, the more likely it is to elevate the importance of process, and the more elaborate and all-consuming that process is likely to be. The more process there is, the less code gets written, and the productivity of the innocent few who actually want to work gets ground down to the abysmal norm.

    Lawyers. Shoot the lawyers! Shoot them! They and their legal ways are a plague. When lawyers see a problem, they want to write a law or create a regulation that makes the problem go away. Except that instead of saying this in simple, results-oriented terms (e.g., "programmers should not be allowed to die unnecessarily while writing code"), they say it in terms of incredibly elaborate, micro-managing, this-is-how-thou-shalt-do-it regulations (e.g., "programmers should be offered a nutritious meal no later than 12 noon local time each day consisting of no less than…" and 538 similar instructions, constantly growing). If regulations of this kind actually led to good results it would be one thing. All I should have to say at this point is "credit card theft" and PCI.

    Fashion over function. Programmers are supposed to be nerds, aren't they? How can programmers let their decisions be made by "fashion," whatever that's supposed to be in the world of software? I refer you to the first point, about software teams being led by people who are clueless. These people want to seem smart in the eyes of their bosses and peers, who know even less than they do. So they make decisions that are guided by C-office fashion trends, which are usually laughably out of step with true optimal decisions.

    A preference for bad new ideas over good older ones. How do bad ideas catch on? Some company promotes them. Books are written. Appealing rhetoric is created and repeated. Suddenly a fashion trend is born. Someone can appear to be smart and modern just by advocating the cool new stuff and sounding smart, without actually knowing anything or doing anything. Everyone involved thinks they're helping advance the state of software, when in fact they're digging the hole of despair deeper. It's remarkably like when the dumbest guy in Scotland moves to England, thereby increasing the average intelligence of both places.

    A preference for bad old ideas over good newer ones. There aren't many good new ideas; mostly there are old good ideas that are still good. But because conditions change (like processors getting fast and memory getting cheaper), ideas emerge to respond intelligently to the new state of affairs (like this one), instead of blindly and stupidly continuing on as though nothing had changed. Like most people do.

    Wrong scope; usually too narrow. This is classic optimization over too narrow a range. Much bad software results from compartmentalism, or simply from failing to look at things through the eyes of the ultimate consumer. This is an incredibly common mistake. The trouble gets really bad when the practices that result from the little "silos of excellence" get elevated into industry "best practices." (Whenever I hear the phrase "best practice," it's really hard for me to avoid getting a serious look and pronouncing in deep tones "you folks had best practice a bit longer before you inflict yourselves on the world of paid, professional programming.")

    I'm sure there are many more causes of bad software than the ones I've listed here — we haven't even gotten into bad programmers in their innumerable incarnations! But every item on the list above thoroughly deserves inclusion, even more because most of them are not listed by most people as a cause of software malaise.

  • Fusion IO and Xiotech Hybrid ISE

    Congratulations to Fusion IO for pulling off the most visible, most large-scale event in the storage industry in recent months: their highly successful IPO. And congratulations to Xiotech for pulling off the storage industry's most important event in recent months: the GA release of Hybrid ISE storage blades.

    Fusion IO

    Fusion IO went public and traded up in an over-subscribed offering. The excitement about the company and its prospects is justified: the storage industry has been basically ignoring its large and growing performance problem for years now. The industry has toyed with a variety of ineffective strategies involving the obvious alternative technology, SSD, but done nothing to move the performance needle.

    The brilliance of Fusion IO is to come out with a new category of product; they call it "memory storage." It's a board that you typically put into a server.

    [Image: Fusion IO ioDrive]

    This is the brilliant part! The people who run applications have a huge performance problem, and the storage "experts" refuse to solve it for them, so Fusion IO gives them something in "their world" — a server board — that they can use, along with application changes, to solve their problem. It's a classic strategy: sell to the guy who actually has the problem.

    Here's the other thing I like about Fusion IO: their success clearly and unambiguously demonstrates that applications are experiencing storage performance problems, and that these problems are house-is-burning serious.

    Xiotech Hybrid ISE Storage Blade

    Xiotech's release for GA (general availability) of the Hybrid ISE last week is the most important recent event in the storage industry.

    The success of Fusion IO clearly demonstrates that increasing numbers of applications are experiencing house-is-burning performance issues. Most of the traditional SAN storage vendors have responded by simply putting SSD drives into their existing products. In all cases, the result has been modest upticks in performance coupled with drastic reductions in capacity and dramatic increases in cost. Not a good combination, and not popular with customers.

    The Fusion IO approach works well for the small number of companies who have just a couple of hugely important applications that are completely under their control, and who therefore don't mind violating the normal principles of storage management and re-writing their applications to take advantage of server-based storage. In fact, I've just described exactly the situations of Facebook and Apple, who between them account for more than two thirds of Fusion IO's recent revenues!

    What about the vast majority of companies who have pressing performance problems and don't want to, or simply can't, break the bank and madly rewrite their applications? What they need is a new kind of storage they can just plug in to make their problems disappear: storage that is reasonably priced and many times faster than what they have. In other words, storage that is, well, storage, that looks and acts like normal storage in every way … except that it is hugely faster, like five to ten times faster.

    Hybrid ISE, that's your cue…

    [Image: Xiotech Hybrid ISE]

    The Xiotech Hybrid ISE. The right combination of HDD, SSD and RAM cache in a 3U rack-mount package to provide the excellent performance and capacity your applications need. Just by plugging it in. Check it out!

  • Business Benefits vs. Speeds and Feeds

    As a programmer building computer-based products, I have been subjected to decades of relentless, cruel beating by oh-so-superior marketing types on the question of how to describe a product or service. Do you use technical terms and talk about how much and how fast? (That's mere speeds and feeds. Said with a sneer.) Or do you talk about "business benefits," meaning I suppose lower costs or superior customer service or something else suitably vague and unmeasurable? (Obviously the "right" answer.)

    After decades of watching good material being subjected to "marketing polish," which means all the substance is removed, and seeing the ultimate outcome (typically unsatisfying), I finally know the answer. The right answer truly depends on what you're selling and to whom.

    Option 1: You are a big, tired company selling essentially the same old stuff, maybe with the latest buzzword pasted onto it. You're trying to sell it to a bunch of people marking time, who mostly want to avoid getting fired.

    If you're marketing for the seller, your job is to avoid technical detail and any specific measurements of performance like the plague. You'll just confuse and annoy the buyer, who only wants to be assured that everything is going to be just fine. "Business benefits" or anything else that makes you sound safe and conducive to good naps is the way to go. This case describes the vast majority of seller/buyer combinations, and so of course represents mainstream marketing.

    Option 1a: You are a big, tired company selling essentially the same old stuff, maybe with the latest buzzword pasted onto it. You're trying to sell it to a bunch of people who have the kind of problem you solve, people who know stuff and who really care about getting things right.

    If you're a normal marketer, you will not like these buyers. They are obnoxious and pushy and care about all sorts of nerdy things. It's pretty clear they're not going anywhere. Talking with them is a sure sign that we're selling "too low" in the organization. We've got to get to the people who care about "business issues," i.e., the people who are ignorant about what they're buying and whether it's any good.

    Option 2: You're a hot new company with a brash, ground-breaking product that's going to change the industry. You're trying to sell it to the same firing-avoidance people as in option 1.

    You are wasting your time. Talking about "business benefits" isn't going to help. These people just don't care. Your passion is the final nail in the coffin: you have "I'm risky" tattooed on your face. Just leave. Now.

    Option 2a. You're a hot new company with a brash, ground-breaking product that's going to change the industry. You're trying to sell it to a bunch of people who have the kind of problem you solve, people who know stuff and who really care about getting things right.

    These are people with a problem. They know because they've measured it. They know how fast, how much, how often they're getting now, and they have a real good idea of how fast etc. a product has got to be to make the problem go away. Can you deliver or not? If not (or if you insist on bringing the conversation back to "margin" or some other stupid supposedly "business" virtue), please just leave. If you say you can, give me the numbers, please. The benchmarks. The speeds and, yes, the feeds. Numerically, if you please! If the numbers are good, how soon can I have it?!

    Conclusion

    If you've got a hot product, don't waste your time talking with anyone who doesn't have a hot problem. When you do, lead with the facts. If they've got the problem and you've got the solution, the air starts smelling like "deal."

    Postscript: Cars and Trucks

    Now that I've figured it out, I'm realizing just how out of it these clueless marketing types really are. It's all about whether you're selling to buyers who actually do things, know stuff, and have to suffer the consequences of their own choices. Think about … pickup trucks, for example. Like this one:

    [Image: 2007 Chevrolet Silverado 2500 HD LTZ extended cab]

    Who's more about mainstream marketing than GM? How do they market this truck, which will largely be bought by people who actually need to drive a truck, and want it to pull and haul stuff, and will be in big trouble if it can't? Well, go and look at GM's own marketing for this vehicle. There you will learn all sorts of things like this:

    More maximum hp and torque – 397 horses and 765 lb.-ft. of torque. (Our standard workhorse, the Vortec 6.0L V8 gas engine, generates 360 HP and 380 lb.-ft. of torque.)

    Duramax 6.6L Turbo-Diesel

    The reliable available Duramax Diesel has improved torque, horsepower and towing numbers, as well as highway fuel economy that was improved by more than 11% over the previous generation.

    • With a 36-gallon tank (approx.) the Duramax offers up to 680 highway miles on a single tank
    • To better manage larger capacities, the steering knuckle is taller and 66% heavier than before
    • The rear suspension includes 20% wider leaf springs with an asymmetric design that reduces wheel hop on acceleration and better handles increased payloads

    Speeds and Feeds! And of course, nice pictures and some "business benefits" like how comfortable you'll be sitting in the driver's seat.

     

  • How Great Software Teams Can Go Wrong

    Every once in a while, a miracle happens: a group of smart, hard-working software people latch onto a group of customers with unmet needs, and wonderful things take place.

    Once things get up a head of steam and it becomes obvious that good things are happening, the vultures start circling. "That's an inappropriate metaphor!" you might exclaim. "Vultures feed on dead animals, not ones that are alive and well." OK, I concede the point. "Vampires" would be a better metaphor, because everyone knows that vampires attack the young and healthy and turn them into unfeeling parasites like themselves. Let's compromise: when there's great software happening, the vampires start circling.

    The vampire/vultures tend to be older than the programmers actually doing the work. Regardless of the facts, they like to complain about things like the lack of order, the lack of predictability, and the lack of control. They ALWAYS find lots of evidence that these supposedly indispensable things are missing. Things are moving quickly and the business is growing quickly, so of course the vampire/vultures find all sorts of evidence that the business, as currently "organized," isn't "scalable." Things may be going OK now, but we have just been lucky! There's too much at stake to leave our future to chance. We need some "experienced, senior" managers to assure the future of our important business.

    What happens next is simply awful. I've seen it too many times. I know all the script variations. But let's skip past the mayhem to the final stage, which can reasonably be called "the inmates are in charge."

    The Inmates are in Charge

    There are sure signs that the inmates are in charge. When this happens, you may as well put this sign over the entry door of your office: "Abandon All Hope Ye Who Enter Here."

    Following is a sample list of the reasonable-sounding things that people say in this horrible place, each followed by the reality.

    “After you work a certain number of hours, your efficiency simply goes down. You can’t concentrate as well."

    Everyone slips out starting at 4 p.m. Which would be OK if they hadn't wandered in around 10 a.m.

    “This pushing to make unreasonable deadlines just exhausts everyone. They fight with each other, don’t get any better results, and then, after, everyone is so beat that nothing gets done for weeks.”

    Deadlines get established so far in the future that earthquakes, floods and fires can take place, and they can still be met. The meaning of the word “deadline” evolves from “I’ll make it or die trying” to “a dead person could meet the target.”

    “People who do nothing but work are unpleasant, narrow, one-dimensional people. They don’t work well in teams, and before long they just crack up and become unproductive anyway.”

    HR people start roaming the halls at 4 p.m. suggesting that people go home and spend more time with their families. Anyone who seems stressed is invited to take a week’s vacation.

    “Our products are sophisticated, complex pieces of technology. It takes a long time to understand everything. It’s hard to find enough experienced people to do the work.”

    The “go along to get along” people are promoted and salaries of senior people advance rapidly.

    “There is a huge amount of value in the intellectual property we have built up here over the years and embodied in our people and processes.”

    Suggestions for change are rejected without serious consideration.

    “Rodney isn’t really a team player. He makes other people uncomfortable, and sometimes says things that make other people feel bad. He’s bad for morale.”

    Rodney is smart, hard-working and wants to build a great product. But he doesn’t fit in well, and is asked to look elsewhere for employment.

    “Our customers have come to depend on our reliability and commitment to quality. We have to continue to pay attention to what we do, and continue to do it extremely well.”

    Don’t bother suggesting doing more, working in parallel, or getting things out more quickly.

“Our customers don’t have the money to chase after every cool-sounding new idea that comes along, most of which fail anyway. They depend on us to incorporate new technology for them once it actually becomes useful and proven.”

    We will continue to defend our approach, using words like “practical” and “proven,” even after it has become dead-and-buried obsolete.

    AARRRRGGGHHH! Get me out of here!!!

    Conclusion

    Thankfully, not all great software efforts follow this tragic script. But here's something I've learned: when things are going well in software terms, there are always vultures and/or vampires eager to attack and do their awful worst. Constant diligence is required to keep them frustrated and hungry.

  • Single Point of Failure: Logical vs. Physical

    People who want to build a highly available computer system tend to focus on eliminating single points of failure. This is a good thing. But we tend to focus only on the physical layer. We don't even notice the single points of failure at the logical layer. Logical single points of failure are just as likely to result in catastrophe as physical ones, and it's high time we started paying attention to them!

    Why? Keeping your system up and running is the most important job of an organization's techies.

    Physical redundancy

    We eliminate single points of failure by having more than one of every component, and a structure that enables the system to keep running with a failed component, while allowing it to be repaired or replaced. For example, here is a typical redundant system diagram (credit Wikipedia).

And here are instructions for replacing a hot-swap drive in a redundant design (credit IBM).


    Logical vs. Physical

We are familiar with the concept of logical and physical in computing. All of computing is built on layers and layers of logical structures. Frequently, we call something a "physical" layer which is actually just the next layer down the stack of logical layers; we call it "physical" only because it is "closer" to the actual physical layer than the layer we call "logical." A good example is in databases, where it is common to have a "physical" database design and a logical (higher-level) one; of course, calling it a "physical" design is a joke; there's nothing physical about it.

    Keep eliminating physical single points of failure

    I am not arguing that physical redundancy isn't important or that we should stop eliminating physical single points of failure. When I see people running important computer systems that have single points of failure, I tend to wonder how often the people in charge were dropped on their heads onto concrete sidewalks as children, and how they manage to feed themselves.

    The principle is simple: if you have just one of a thing, and that thing breaks, you're screwed. For example, if you have your database on one physical machine and that machine breaks, no more database.

    What is a logical single point of failure?

    I think people don't pay attention to logical single points of failure because it just isn't something anyone talks about. It's not part of the discourse. Let's change that!

    A prime example of a logical single point of failure is a program. You create physical redundancy by running the program on several machines. Great. That covers machine (physical) failures. What about program (logical) failures? After all, program failures (i.e., bugs) hurt us far more often than machine failures. But somehow, we don't think of a program as a logical single point of failure. We think that the program has bugs, that our QA and testing weren't good enough, and we should re-double our QA efforts. And somehow, miraculously, for the first time in history, create a 100% bug free program. Ha, ha, ha, ha, ha, ha, ha.

    Suppose you have version 3.2 of your program running in your data center. If that program is running on more than one machine, you have eliminated the physical single point of failure. If version 3.2 is the only version of the program that's running in the data center, then version 3.2 is a logical single point of failure. The only way to eliminate it is to have another version of the program also running in the data center!

    Eliminate logical single points of failure, too!

    Smart people already have a data center discipline that eliminates logical single points of failure.

    Suppose there is a new version of linux you think should be deployed. Do you stop the data center, upgrade all the machines, and start things up again? While that may be the most "efficient" thing to do, from a redundancy point of view, it's completely insane. Smart people just don't do it. They put the new version of linux on just one of their machines, and see how it goes. If it runs well, they will deploy it to another machine, and eventually all the machines will have it. It's called a "rolling upgrade."
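As a sketch of that discipline (the names `upgrade()` and `healthy()` are stand-ins for illustration, not any real deployment API), a rolling upgrade is just a loop that refuses to continue past a sick machine:

```python
import time

MACHINES = ["node1", "node2", "node3", "node4"]

def upgrade(machine, version):
    # Stand-in for whatever actually installs the new version.
    print(f"upgrading {machine} to {version}")

def healthy(machine):
    # Stand-in for a real health check: pings, error rates, and so on.
    return True

def rolling_upgrade(machines, version, soak_seconds=0):
    """Upgrade one machine at a time, stopping at the first sign of trouble."""
    upgraded = []
    for machine in machines:
        upgrade(machine, version)
        time.sleep(soak_seconds)  # let the new version soak before moving on
        if not healthy(machine):
            # The rest of the fleet still runs the old, known-good version.
            return upgraded
        upgraded.append(machine)
    return upgraded
```

The point of the loop isn't the code, which is trivial; it's that at every moment, at least one machine is running a version you already trust.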

    Same thing with the application. Great web sites change their applications frequently, but using the rolling upgrade discipline, and if they're really smart, with live parallel testing as well.

    Getting into the details of application design, the best people go beyond this method to create another logical layer, so that the things that change most often are stored as "data," in a way that changes are highly unlikely to bring down the site. A simplistic example is content management systems, which are nothing more than a way of segregating the parts of a site that change often from those that don't, and keeping the frequently changed parts (the content) in a non-dangerous format.
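A minimal sketch of that content-as-data idea (the structure and field names here are invented for illustration): the rarely-changed part of the site is code, while the frequently-changed part is inert data, so a bad entry degrades instead of crashing.

```python
CONTENT = [
    {"title": "Welcome", "body": "Hello, world."},
    {"title": "News"},  # missing body: a content mistake, not a site outage
]

def render(entries):
    # A malformed entry falls back to a blank field; the site stays up.
    html = []
    for entry in entries:
        title = entry.get("title", "(untitled)")
        body = entry.get("body", "")
        html.append(f"<h2>{title}</h2><p>{body}</p>")
    return "\n".join(html)
```

Editors can change `CONTENT` all day long without ever touching, or breaking, the rendering code.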

    Conclusion

There is little that is more important than keeping your system available to your customers. Eliminating single points of failure is a cornerstone activity in this effort. Many of us are well aware of physical single points of failure, and eliminate them nicely. It's time for more of us to include logical single points of failure in our purview, and to eliminate them with the same vigor and thoroughness that we do the physical kind.

  • Franz Liszt: 200

    2011 is the 200th anniversary of the birth of Franz Liszt.

    Liszt's music is recognizable by anyone who (like me) watched early cartoons, like this one:

[Oops — removed from YouTube. Search for Hungarian Rhapsody #2 cartoon.]


    Beyond that, he is significant to me as is obvious from "BlackLiszt," as I briefly explained in the first post of the blog.

Liszt is one of the greatest and most under-rated composers of music. This article says it well:

    BBC Radio 3 is beginning Franz Liszt's bicentenary year with…wall-to-wall Mozart. Nothing could make clearer something that has bugged me for years: the critical, snobbish, misinformed and persistent denigration of a musician who was the very embodiment of Romanticism.

    It is possible to admire Liszt simply because you like listening to his music. Here are some of the additional reasons I admire Liszt, with obvious parallels to computing:

    • He had complete technical mastery of his instrument, which he achieved by hard work and relentless practice.
    • He started with nothing and earned his way to fame and fortune. He was one of the most accomplished and generous teachers, helping countless talented younger people to make themselves better.
    • He deeply appreciated the accomplishments of others. He showed the seriousness of his appreciation by promoting the people and works he admired. He showed the depth of his understanding by making transcriptions that are themselves works of art.
    • He took the current standards of his day seriously and mastered them; and then smashed the conventions to make things better.
    • He felt deeply and acted on his feelings. He felt and expressed the deep connections of music to the human spirit.
    • In spite of his position and accomplishments, he spent his life learning and exploring.

    Personally, I never need much of an excuse to bring Liszt into a conversation; but I'm glad it's his anniversary year, because it makes performances easier to find and I don't need to work quite as hard to bring the conversation around to one of the greatest performers, teachers, creators, appreciators and pioneers who ever existed, an inspiration to us all.

  • Chase’s Exemplary Handling of Data Theft

    I think if Chase had really tried, they could have done a worse job telling customers about the recent security breach.

    Background

Apparently being incapable of performing the requisite fairly simple processing and analysis on their own, Chase and other giant financial institutions give their customers' data to Epsilon (among others!) for marketing-related processing. Despite (I assume) conforming to all the odious rules and regulations for keeping the data secure, Epsilon somehow suffered a major data breach; in order to protect the guilty, the details have not been released.

    Chase's e-mail

    Naturally, Chase and others rushed to assure their customers that everything was really OK, while providing them with helpful hints about avoiding getting scammed by all the crooks who now have the data. Here's the one I received.

    Chase

    Why Chase deserves an award for Badness

    Chase provides a wealth of examples not to follow if you want to treat your customers with respect. Here are a few of the highlights.

    • Timing. The breach reportedly took place on March 30. It was made public the following day. I received Chase's e-mail on the evening of April 4. Boy, Chase sure fell over themselves getting the word out to their customers, didn't they?
    • What was stolen. Epsilon's own press release admits that not only customer e-mail addresses, but also names were stolen. If you read Chase's tardy missive word for word, you notice that they carefully omit to tell their customers that their names were also stolen, while repeating that no "customer account or financial information" was stolen. Surely a customer's name is part of that customer's account! If not, exactly what is it? Why couldn't they just be honest, and tell me that my name was stolen too?
    • What was stolen. Epsilon's second press release emphasizes how they have absolutely, definitely, no-kidding determined that nothing but names and e-mails have been stolen. I'm sorry, but this can't possibly be true. Chase isn't using Epsilon just to do e-mail blasts. They are using them for their analysis based on detailed customer information. According to Epsilon itself, this data includes "Comprehensive income, credit, debt and asset data." It is simply not credible to claim that this data could not be deduced by the thieves from what they took. Neither Chase nor Epsilon bothers to mention all the customer-specific information they've got, which also includes "age, marital status, occupation, ethnicity and changes such as a new child, a move, changes in household income or a new driver."
    • Disastrous advice. Look at the list of recommendations in the e-mail. Do they once, even once, describe, mention or warn against phishing, which is the real danger of having this information out there? They do not! What do they warn against? Repeatedly, they tell you not to put sensitive information into an e-mail, or to respond to a spam e-mail. When the real danger is phishing!
    • Unwanted spam. I can't help pointing out that Chase gives me the incredibly insightful advice to "be on the lookout for unwanted spam." As opposed to the spam I want? After I've identified my spam and put it into "wanted" and "unwanted" piles, exactly what should I do? Since I was told to be "on the lookout" for it, I guess I should spend some time looking at it.
    • Follow up. Chase promises to tell me "everything we know as we know it, and will keep you informed…" Simply put, there has been no follow-up. If you're not going to do it, don't say that you will.

    Summary

    It is clear that Chase

    • notified its customers tardily,
    • demonstrably lied about what was stolen,
    • gave terrible and/or laughable advice about what the customer should do,
    • and finally made promises they failed to keep.

    Could they have done worse? Probably. Meanwhile, let's use this as an anti-role-model for how to handle situations of this kind.

  • Bad Customer Service: Whose Fault is it?

    When a customer gets bad customer service, who's at fault? The person delivering the bad service, right? In some cases, yes, of course. But in a variety of important cases, the person delivering the bad service is not at fault, but gets blamed (or blasted!) for it anyway, while the guilty party gets promoted.

    Sounds strange, right? But it's all too common. I experienced it personally last weekend, and my own experience will illustrate the point quite nicely.

    The Cablevision Store

After several years of fine service, my cable modem started showing signs of dementia, wandering off into dysfunction with increasing frequency, the only cure for which was a hard reboot. After several weeks of wishful thinking, I finally accepted reality and took my old cable modem to the cable company's store last Saturday to trade it in for a newer model.

    Shortly after starting my transaction, another customer came into the nearly-empty store with a DVR. He went up to the service person next to mine, handed over the DVR, and explained his issue. He said that he had been given the wrong DVR the day before — he needed one with an HDMI connection. Could he please trade for the right box?

    I'm sorry, says the pleasant CSR (customer service representative), we're out of those boxes. You'll have to come back next week. That's when the fireworks started.


    "I spend $250 a month for your service…how can you be out of DVR's?? … I demand to speak to a manager … I can't come back on Monday, I work in the City and Saturday is my only day for getting stuff like this done …"

    It got worse. Various impolite words started flowing freely…


    …inspiring the large fellow helping me to try to calm the guy down, telling him there's no need to use language like that, etc. In the end, he stormed out with his non-functional DVR without a resolution.

    Meanwhile, I got a new cable modem with no instructions. I was told to just plug it in and it would work.

    Who is Responsible for the Customer's Bad Experience?

The CSR's in the little store-front operation? Hardly. They were pleasant enough. But they didn't have what the guy needed — and what he had every right to expect! This is a guy who's headed right to FIOS, in spite of Verizon's well-deserved reputation for equally horrific customer service.

    OK, so who is at fault here? It's hard to know for sure, but it is likely to be some combination of simple incompetence in the people and/or systems that manage inventory at the stores…

    …which wouldn't be a bit surprising… or it's some "clever" person in management or finance trying to make the corporation some extra profit (and themselves a promotion) by shaving down the inventory levels even further…

[Image: accountant cartoon (credit: www.bybee.com)]

    More Where that Came From

I started thinking about all this as I drove home and started installing my new cable modem. I swapped it in exactly where the old one had been, and it didn't work! OMG!! It's a new cable modem! What could be wrong? I checked the cables. I rebooted a few times. I replaced all the cables. Could my wireless router have somehow gone bad at the same time? Out to the store, bring home new wireless router, install, no improvement. Probably the new cable modem is simply dead. It's too late to return to the Cablevision store, so I get on chat. The guy asks me to run a test, and by the time I'm back (all of 3 minutes later), my session has been timed out for inactivity! So I log in again (requiring another download of the chat program), and the fellow can't "see" my modem from the network side. No surprise. He asks me to connect it to the place where the cable service enters my house, and he still can't see it. So he agrees that it's bad, and actually gives me an appointment for a service call on Monday.

    All I can say is, thank goodness for neighbors who install WiFi without security.

    The service guy shows up on Monday, looks at my new cable modem, and immediately says, "they gave you one of those?" It turns out the service guy knows that these devices are obsolete and notoriously unreliable. As I have confirmed yet again. He hooks up a new cable modem he had brought with him and everything works fine. Problem solved.

    Whose fault was the bad customer service I received? The people in the store? The people on chat? The guy who came to my house? Of course, the answer is none of the above.

    If you tried to track down the person actually responsible for the bad service I (and many, many like me) received, you'd run into something like this:

[Image: bureaucracy cartoon (credit: Animation Guild)]

    Again, the root cause is likely to be some combination of hard-to-fathom incompetence, simple clueless-ness and actual, anti-customer changes made by someone who thinks they can advance in their career by making things worse for customers. Sadly, this is all too often how organizations, large and small, actually work … unless you take constructive steps like this to make it otherwise.

  • Software Project Management: “Releases” are Stupid and Out-moded

    Tell me, do you use Google by any chance? Yes? OK, now tell me which release of Google you use now, which ones you've used recently, and which one you like best; and, by the way, how was the upgrade process? Oh, you don't know? There's no way to find out, you say?

    What the heck is WRONG with those people at Google? Don't they know that modern software development processes demand a disciplined release and planning methodology? There is NO WAY customers will put up with a product when the vendor just shoves new releases at them any time they feel like it, without even proper notification! And what customer is going to sign up with a vendor who won't commit to a future roadmap, with at least a year's worth of features laid out, tied to hard-commit release dates? That Google, they violate every rule of the game, there's no way they're going to make it as a software/service company!

    What, how can it be! NOOOOoooooo…. Google is the world's most valuable software company!!?? When they don't even have the most basic element of proper software methodology, releases?? I must be sleeping! This must be a nightmare!!


    Get over it!

    Here are the facts:

    • Classic project management has been a disaster for software development.
• All the heavy-weight process things that people do to make things better … invariably make them worse!
    • The classic big-bang release is the cornerstone of the temple of evil.
    • The classic little-bang release (a.k.a. "agile") is a brick in the temple of evil.
    • "Releases" are … stupid! … and not only that, they're … outmoded!

     There is life after "releases." It is a better life. The software is better. The users are happier. Go there. Enjoy it.
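For the curious: one standard mechanism behind release-free software is the feature flag, in which code ships dark and is turned on gradually, per user, with no "release" event at all. A minimal sketch, with every name and number invented for illustration (real systems keep flags in a config service, not a dict):

```python
import hashlib

FLAGS = {
    "new_checkout": {"enabled": True, "percent": 10},  # dark launch to 10% of users
}

def bucket(name, user_id):
    # Stable 0-99 bucket per (flag, user), so a user's experience doesn't flip
    # from one page load to the next.
    digest = hashlib.md5(f"{name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def flag_on(name, user_id):
    flag = FLAGS.get(name)
    if not flag or not flag["enabled"]:
        return False
    return bucket(name, user_id) < flag["percent"]

def checkout(user_id):
    if flag_on("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Rolling the feature out to everyone is a one-line data change (`percent: 100`), and rolling it back is just as cheap; nothing resembling a "release" ever happens.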

  • HDD Capacity Improves While Slowing Down. Help!

Here's a typical hard disk drive (HDD): this is where your data spends most of its time.

The amount of data you can put on each HDD has gotten exponentially better over time, as a chart of hard drive capacity in GB over time (all credits: Wikipedia) makes vivid.

Back in the mid-1980's, HDD's held about 10MB. By the mid-1990's they were up to about 1GB, an increase of 100 times. Now they're up around 1TB, an additional increase of 1,000 times. Amazing, particularly when you consider that the average HDD was also shrinking in physical size over that time.
(For scale: an older 5 1/4" HDD held 110MB, while a later 2 1/2" HDD held 6.5GB.)

    Suppose the size of your main customer database is 1TB. In the mid-1990's, it would have required about 1,000 HDD's to store the data, and today you can put it all on a single HDD: it's wonderful, yes? Well, maybe not. The problem is using your data.

The problem is simple to understand. Inside an HDD, your data is stored on platters. It's read and written by a head, which sits at the end of a long arm that reaches over the platter. In order to read any piece of data, the arm has to position the head on the right track of the platter (i.e., move it towards the edge or towards the center), and then wait until the spinning platter brings the data to the head.

Here's the killer: these times (called seek times) haven't improved much in the last 25 years! In the mid-1980's they were around 20ms; today, most HDD's are around 10-15ms, and the very fastest are around 3ms, a 7X improvement at best.

    These trends (more data in less space, and unchanging seek time) are likely to continue.

Improvements in HDD's since the 1980's:

• Speed: 7X
• Capacity: 100,000X

    The bad news isn't over yet. Remember how your customer data used to take 1,000 HDD's? That meant that you had 1,000 HDD's all available to read and write your data. Now you've got just one — and it's hardly any faster than HDD's were 25 years ago!
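To put rough numbers on that loss (assuming, as a ballpark, ~10ms average access time per random I/O, both then and now):

```python
def random_iops(access_ms):
    # Random I/Os per second a single drive can do if each one
    # costs `access_ms` milliseconds of seek plus rotational delay.
    return 1000.0 / access_ms

per_drive = random_iops(10)       # ~100 random I/Os per second per HDD
farm_1995 = 1000 * per_drive      # 1 TB spread over 1,000 mid-90's drives
single_today = 1 * per_drive      # the same 1 TB on one drive now

print(f"{per_drive:.0f} IOPS per drive")
print(f"1,000-drive farm: {farm_1995:.0f} IOPS; single modern drive: {single_today:.0f} IOPS")
```

Same terabyte, roughly a thousandth of the random-access capability. That is the hole modern storage finds itself in.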

    What does this mean? If you want to not just keep your data, but also access and update it, you've got a big problem today, and it's getting worse.

    How are people responding to this situation? Simple: they are turning to SSD's. SSD's solve the speed problem! But, as provided by most vendors, SSD's introduce a whole host of new problems. Today, storage buyers confront an extremely unpleasant choice: buy extremely expensive SSD's that lack fundamental storage features, or buy affordable HDD's that simply aren't fast enough. Yuck.

There is an alternative. It's a good one. It combines everything you like about HDD's with the speed of SSD's in a novel hybrid combination. I can't stop thinking about it. Check out the Xiotech hybrid ISE.

  • Xiotech’s Hybrid ISE and Storage Performance

    Everything gets better with computers, every year. For the same price or less we can get a bigger screen, a faster processor, more memory, more storage, a faster connection, a lighter device, or a combination of the above. We are so used to this, we don't question it or think about it. It's just the way things are.

    HOWEVER, there is a BIG, FAT exception to this generally wonderful trend. The exception is something most consumers don't think about, but it's having an increasingly dramatic impact on the world of IT professionals. In a world of increasing expectations, the thing that is getting inexorably WORSE every year is: storage performance.

    Worsening Storage Performance: the Market Speaks

    It's pretty easy to tell that storage performance is going to the dogs by noticing the following:

    • The hottest subject in the storage world is solid state disk (SSD). What is SSD? In a nutshell, it is a new version of storage that is WAY more expensive than regular storage. Since when do people get excited about something that's more expensive than everything else? Simple: it's faster. A lot faster. If people didn't have a speed problem, they wouldn't pay a micro-second's attention to this expensive new form of storage.
    • One of the hottest companies in storage today is Fusion IO, which delivers SSD in a server card. Of course Fusion IO's products are incredibly expensive (you already knew that). You have to open up your server to plug it in. You have to change your application to take advantage of it. When the server breaks, the storage is inaccessible. But it's fast! Fusion IO's story could only sell to buyers who are desperate for performance. There must be a lot of them out there.
• Data center managers are busily virtualizing their servers, and getting major cost and manageability benefits from running applications on a smaller number of physical servers. But they are avoiding virtualizing their most important, mission-critical applications. Why? Because running applications on a smaller number of servers concentrates their demands for storage, making a really bad problem even worse.
    • Traditionally, the unit of measure for storage is (of course) how much it stores, the number of GB or TB it holds. But buyers are increasingly buying more capacity than they want in order to get the performance that they need. There is talk of "short stroking" and "over-provisioning." Performance used to be a given in storage; now you have to really pay attention.

    From the way people are acting and the market is evolving, it is safe to conclude that, contrary to everything else in the world of computing, storage performance is swirling its way down the toilet.

    Worsening Storage Performance: the Fundamentals

    In a world of constant improvement, why is storage performance alone the stand-out? The reasons are simple:

    • While each individual disk holds more data than it did years ago, its ability to access the data has only improved a little. It's as though storage rooms were doubling in size every year, but the designers kept putting the same dinky one-person-at-a-time doors on them. Sure, any one room stores more and more stuff, but you can't put stuff in or take stuff out any faster than before, so what's the point? Every time the room gets bigger but the door stays the same, the storage performance problem gets worse.
    • While servers have advanced to nicely scalable blade formats, storage systems continue to be designed as huge monolithic, mainframe-like behemoths. It's easy to grow the capacity of mainframe-like storage systems, but what you'd really like is a blade format, so that when you added capacity, you added performance so you could actually use the added storage.
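The "same dinky door" point is easy to put in rough numbers. The capacities and sequential transfer rates below are ballpark assumptions for each era, chosen only to show the trend:

```python
drives = [
    # (era, capacity in MB, sequential transfer rate in MB/s) -- rough figures
    ("mid-1980's", 10, 1),
    ("mid-1990's", 1_000, 10),
    ("today", 1_000_000, 100),
]

for era, capacity_mb, rate_mb_s in drives:
    seconds = capacity_mb / rate_mb_s
    print(f"{era}: {seconds:,.0f} seconds (~{seconds / 3600:.2f} hours) to read the whole drive")
```

Even on generous assumptions, emptying the "room" through its "door" has gone from seconds to hours; the room keeps outgrowing the door.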

    Worsening Storage Performance: the Holy Grail

    What would solve the problem? How about:

    • A blade format for storage, so that when you added capacity, you added the performance required to actually use it.
    • A seamless hybrid of SSD and traditional storage, so that you could get the capacity and the performance you need at a price you can afford.
    • A real storage format and interface (unlike a server board) so you can just plug it in and go.
• A real set of storage features (like replication and fail-over) so you're not taking a step backwards.
• While we're dreaming here, why not throw in dramatically better reliability, density, power consumption and manageability than any product on the market?

    OK, storage gurus, those are your specs. That's what the market would really like to have.

    Worsening Storage Performance: The Holy Grail is the Xiotech Hybrid ISE!

There's nothing more to say here. There is a problem. Everyone in the industry knows it. It shows in market trends. It makes sense in terms of fundamentals. What a great solution would be is obvious. The only remaining questions are: does the hybrid ISE exist? Does it meet the requirements listed above?

    Yes, the hybrid ISE meets the above description. The hybrid ISE is currently in limited engineering release, and is being shipped to the companies at the head of a rapidly growing list of early-adopter customers. It's built on the solid foundation of thousands of ISE blades already in the field. It contains dozens of innovations to make it all work the way customers want it to work: simple, fast, and affordable.

    • How simple? It plugs into the same plug storage normally plugs into. No changes to applications or anything else required.
    • How fast? Depending on the application, 4 to 10 times faster than normal spinning storage. That is a VERY large fraction of the speed of SSD-only solutions, plenty of speed for most needs.
    • How affordable? Roughly a 1/3 premium over pure spinning storage. VERY affordable.

    Yes, I'm biased, as I have disclosed before. But the "bias" comes from insider information, and I can tell you that this is a rocketship in lift-off mode.

  • What Do Consumers Want from a Web Site?

    Why is this question important? Because whatever it is that consumers want in a web site had better be exactly what you want (and create) in that web site! Assuming, of course, that you care about success…

    All of what I'm about to say is obvious, but you'd never know it from looking at web sites and from looking at the priorities of web developers, both of which I do quite a bit of in the course of my work. Everything I'm saying here applies equally to SaaS operations, BTW. So from most important on down:

    Not Broken

    I've written about this again and again. If a consumer tries to go to your site and it isn't there or it's otherwise obviously broken, the consumer's reaction is roughly the same as if they went to a new store and found it locked and shuttered.


Do you think they're likely to check again later when you've fixed the bug or gotten the right release level of the OS onto your servers? Is there anything more important than having your "store" ready and open for business when someone arrives?

    Not Slow

    I'm always amazed at how little attention insiders pay to this crucial issue.

What ……………. if ………………. it's ……………. just ……………… a …………… little …………… bit …………….. slow. ………….. How …………… bad ……………….. could …………….. that ……………….. be? Well, ask yourself — how did you enjoy reading those last two sentences? Did it make you eager to read more of the same? What ……………………………………………. if ……………………………………. it ……………………………………………………….. were ………………………………………………. even ……………………………………. worse? Would you come back to any site where the whole doggone thing were that way, all the time?

    Conclusion: even if you can't make it fast, make sure your site isn't slow.

    Attractive

    We're on a roll. The place is alive, open for business, and responsive. A great start! Now — at a glance — do I want to be here? Do you want to get with this? Or do you want to get with that?

     

People can tell at a glance whether they like it or not. If they don't like it, they're most likely gone.

    Easy to get where you want to go

    Far too many web sites are, simply put, navigation hell.

Not on purpose, of course. The worst examples seem to result from someone wanting to "organize" the site into some system that "makes sense." Sure. As though anyone cares about your page and folder hierarchy! The only thing that most users care about is getting to what they want with a minimum of mystery. Does having an "organized" web site help this? Generally speaking, it does not. The only thing that helps is … looking at things from the user's point of view. What an idea! Maybe if you do this you can avoid the web equivalent of:


    Minimum clicks to do what you want

     Every click, every keystroke that stands between a user and his goal is an opportunity for that user to decide that it's just not worth the trouble. "How much trouble can a couple clicks and keystrokes possibly be?" you may ask yourself, as you madly set things up so that more of them are required, for reasons that are no doubt compelling. The answer is simple: every keystroke or click is an opportunity for the user to bail out of the process before the goal you want them to reach has been attained. Is that clear enough? Any click that can possibly be eliminated is a click that should be eliminated. Period.
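If you want to see why every click matters, assume some fraction of users bails at each step; the survivors compound away fast. A back-of-envelope sketch (the 10%-per-click abandonment rate is an assumption for illustration, not a measured figure):

```python
def completion_rate(clicks, drop_per_click=0.10):
    # Fraction of users who make it through `clicks` steps
    # if `drop_per_click` of them bail at every step.
    return (1 - drop_per_click) ** clicks

for n in (1, 3, 5, 8):
    print(f"{n} clicks: {completion_rate(n):.0%} of users make it through")
```

At that rate, a five-click flow loses four users in ten before the goal; cutting two clicks gets a quarter of them back.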

    Mouse_clicks

    What Do YOU Want from a Web Site?

    I hope it's exactly what consumers want. Everything else is details. If you screw up even one of the above, you're swimming upstream. If you get the order of the above wrong, you're missing the boat. If you fail to pay attention, I hope everything else in your life is really great, because the part involved with web sites won't be, and even more metaphors won't help you.

  • Experience and Youth

    I don't like to talk about this subject, because it's one of those almost-always-lose topics, not unlike politics or religion. But unlike politics and religion, it comes up ALL THE %^&*%^ TIME in the computer-centric circles I travel in. And it has impact (in a bad way) all too frequently.

    Here's what got me going on this today:

    Dilbert 2010-12-23
    Credit: Dilbert

I can relate to this cartoon. When I was younger than most of the people I dealt with, I frequently encountered ignorant older people who hadn't been great when they were younger, and who had grown old, even more out of it, and desperate to maintain control and authority. That was the pattern. As I've gotten older myself, I now see people substantially younger than I am playing the role of the old guy in Dilbert's cartoon to perfection. What's worse is when they're in positions of power, and they just get their way because of the power, without even needing to wave modems around in threatening ways.

    Robert-mankoff-what-lemmings-believe-cartoonbankcom-1230088780485
    On the other hand, there is no such thing (in reality) as six-month-old technology that simply springs out of thin air. All the stuff that seems new is in fact a relatively minor extension of concepts and components that have been around for a long time, in some cases reviving things that were invented decades ago, but the time wasn't quite right for them. The "kids" usually don't know that. They have no history, no real experience, and they work in a field in which those things are not valued. Every once in a while one of the cool new things that seems like it's going to take over the world really does, but all too often the lemming cartoon above applies. The kids could benefit from a bit of perspective.

    Does the old person want to hear it? Generally, no. Does the young person want to hear it? Well, sometimes they'll listen politely and hope the noise stops soon.

    I wish I had a trick or technique to solve this problem, but sadly I don't. I just try.

  • Communicating Technology: the SMiT example

    I love my job.

    I was in Beijing, China a couple weeks ago talking with an incredibly fast-growing internet ad network, YoYi Media. These guys are doing great things; they'll succeed whether we end up partnering with them or not. While on-site, I got a chance to talk with Liya Wang, who was helping us "on loan" for a couple days from her main responsibilities at an Oak company I've had nothing to do with until now: Shenzhen State Micro Technology Co, SMiT.

Liya is a finance person working primarily on financial issues with SMiT, and as such, she needs to explain to bankers and investors what SMiT does; understandably, people kind of want to know what you do prior to signing the check. As a start, she tried to get me to understand what they did. Fortunately, given my background, it wasn't too hard. But I could appreciate her trouble. The company essentially makes complicated chips that do complicated things that help make digital TV better. It is difficult to even start talking about what SMiT does without plunging into acronyms that are no more understandable when spelled out.

    This is exactly the point at which … the CTO should jump in and solve the problem!

    … Wait, wait, don't you mean the fluffy, slick marketing people? Don't you mean the branding people? How can you possibly mean the CTO, who is the leader of the nerds who create the communication problem in the first place?! I mean, the CTO is the head of the incomprehensibility crowd, the lead creator of obscure acronyms, the guy who says things like "if I tried to explain it, it would just scramble your brain."

    The CTO you should have, the CTO you should aspire to be, is exactly the right person to explain the problem, because he is the one who is most likely to actually understand the technology that needs to be explained! It's that simple.

Liya and I took a first pass at coming up with a way of explaining what SMiT does that does not involve scrambling of brains or twisting of tongues. She personally used PowerPoint to take a first cut at putting it into slides. I thought she did a great job, so I'm going to embarrass her by taking some of the images from the builds of just one of her slides that explains the core idea quite nicely. (I'm sure it can be improved, but this is a quick turn-around first cut!)

    Starting, of course, with the problem: here's the confused user who would like to watch cable or satellite TV, but instead of just a TV and a remote, has an extra box, wires, things to hook up and several remotes.

    SMIT 1
So what does SMiT do about this messy problem? The cable/satellite box disappears and there is a little card.

    SMIT 2a
    Then you see the card moving so that it's inserted in the TV — in fact, it can be pre-inserted.

    SMIT 3a

    Once the card is in the TV, all the wires are gone, and you see the multiple remote controls turn into just one control that enables you to do everything:

    SMIT 4a

    The user of course is happy about this, and so gets a nice green check mark. But for the verbal learners in the crowd, it's worth spelling out the benefits:

    SMIT 5a

    This simple sequence of animations shows anyone what SMiT does and why it's good. Thanks, Liya!

    Sequences like this are usually painfully obvious once they've been created — it seems like they've always existed. But having been around many times when there is no such simple way of conveying the essence of a complex technology, I know how hard it is to do, and how important it is to get it right: to be simple without being simplistic.

While this is not in the typical CTO job description, it is actually one of the most important things a CTO can do, and no one does it better than the best CTOs.

If you're a CTO or on the path to becoming one, I suggest that this is an incredibly important exercise to undertake. It actually helps you to get perspective on what you're doing (and why you're doing it!) to be "forced" to explain it to non-technical people. It gets you "outside" of yourself in a way that is often useful.

  • From Start-up to Success: Costs and Planning

    Your start-up has started up — the engine is cranking, revenues are substantial and growing, things are working. It's exciting! Now it's time to conduct some wild experiments and screw things up royally!!…

    Shithappens
    … no, sorry, I don't know what came over me there … really, now is the time to plan and measure our new initiatives. We don't want to screw things up; so we'll put a process in place, and we'll only approve projects that meet all our criteria, including budgeting, cost and revenue projections, etc. This is the perfect complement to the general growing-up process we've started.

    It's hard to argue against this logic. It sounds like you're being impulsive and childish if you try. Doesn't it make sense to grow up?

    Yes, growing up is a good idea. But the question is, what kind of adult are you going to be?

    Weaver_old

    Should your organization mimic the "best practices" of the slothful old organizations whose cluelessness gave you the opening to start, grow and thrive?

Strangely enough, that's exactly what vigorous organizations tend to do when they try to "grow up." I used to find it shocking, but now I just expect it. The net effect is simply awful. The practices they attempt to mimic are, after all, bad practices — and the aspiring organization typically executes them badly. So what's that? Bad squared?

    There are ways to "grow up" and avoid disaster without losing your soul. I'll just mention a couple of the key ideas.

    Cost tracking

Yes, cost accounting. Sounds old-fashioned, doesn't it? But time after time, when organizations find out how much money they spent last month on exactly what, there is shock and awe: "You mean I spent that HUGE sum on THAT?? And one of our supposedly most important initiatives got THAT paltry effort??" Actually knowing where the money is going often has an amazingly salutary effect. Simple but effective.

    Planning: broad goals, lots of feedback, iterations

    An effective planning process in an agile organization does not lay out what you're going to get, when you're going to get it and exactly how much it will cost before the "real" work starts. In fact, as I've been known to say, dates are evil. An effective, light-weight planning process is one that:

    • Has broad goals. The goals include measures of success so we can tell how we're progressing, but a minimum of predictions. Instead of predictions, we have the simple edict: as fast, well and inexpensively as possible. And success isn't measured by how good people feel about the effort, but by the numbers.
    • Incorporates lots of feedback. The feedback should be from both inside and outside the organization. If possible, it should be a combination of asking people what they think and having trials so you can see what they do. The feedback should be continuous, and should apply to every step of the effort.
    • Has lots of small iterations. This is the key to replacing the old heavy-weight planning process. Mistakes are catastrophic if they are long and expensive. If they are short and cheap, they are simply part of how you learn. You don't have to be perfect if you have loads of small trials, which means the trials can be quick. It's not like a slow march across an open field; it's like running the ball in a sport, in which success is a combination of speed and effective reaction to changing conditions and sudden challenges.

    Growing with Cost Control and Real-time Planning

    It is important to grow up in business and get past the startup phase. But what do you grow into exactly? Knowing where your money is going and having a planning process that emphasizes speed, agility and feedback are two critical factors in becoming a vigorous adult organization, and avoiding the sclerosis that can so easily lead to an early demise.

     

  • Software Project Management: Dates are Evil

The problem with estimates and dates in software is simple: nearly everyone wants them, and having target dates usually makes things worse. Worse! Are there exceptions? Yes – but not many!

    I’ve written agonizingly loooooong papers on this subject. But this is just a blog, so here is a bite-size thought to chew on.

Software estimates of dates might be a good thing if the thing we were estimating were something we had done over and over again, with reasonable standards to apply. But we haven't, and we don't.

    Think of things like your commute to work, and the commutes of other people to the same place. Say you tell a friend your commute this morning took half an hour. Your friend will immediately be able to react to that information by saying, “whoa, was there an accident?” or “how come they didn’t catch you, lead-foot?” Your friend has a clear idea of the range of reasonable times for a “normal” commute, and can use that knowledge to judge on his own whether the one this morning was fast or slow, and by a lot or a little.

    Now suppose a friend tells you that he's just finished designing a database, and that it took a week. Lots of people have designed databases. Say you’ve designed lots on your own. How do you respond? Is your friend a superman, for designing the database in record time, or a slug? The fact is, without knowing a whole lot about the details of the database to be designed, you have no idea. Even if your friend tells you some basics like the number of tables and columns, you still really have no idea. Was there a starting model? Are there special performance, size, interface or compatibility requirements? Were there a bunch of conflicting interests to account for? Depending on lots of details like these, even knowing the “raw” size of the database schema that was designed, you still have no way of judging.

    Lots of interesting things come from this observation. One of them was taken up and exploited by the big services firms long ago. Here it is: the people who look at your project plan typically don’t have a clue – not even a foggy idea – of whether the times and resources assigned are reasonable. What the manager typically wants is confidence that, whatever resource and time commitments may be made, they are kept. So project management, in this context, is all about having lots and lots of highly technical looking stuff with charts and diagrams and fancy reports to justify what are – all too often — completely outrageous resource and time commitments. No one looking at them has any way of knowing that they’re outrageous. So they get approved. The expectations are eventually met. And everyone’s a hero!

    Speed vs. Predictability on the Race Track

    Imagine we’re on a race track and there are some people running the 100 meter. Most of the people are outstanding runners, so they’re always competing with each other. In every race, all but one of them is a loser, and they tend to set aggressive goals of the times they’re going to make, which they frequently miss by a little.

    Bolt_Usain958CL

    The typical project manager would look at them as a bunch of unpredictable losers! They usually miss their goals, after all, and they practically always lose the contest they enter!

Now someone sidles up to the disappointed project man and says that he's a different breed. "When I say I'm going to do something, I do it" is the line. So our runner says: see that 100 meter track over there? I'm going to go up to it within the next 15 minutes, and I'm going to tell you when I'm starting. You time me. I'm going to cross the finish line within 60 seconds. And I'm going to try to exceed your expectations.

    Naturally, the project man is impressed. This is what he’s been looking for, a winner who does what he says he’s going to do.

Ten minutes later, the guy goes to the starting line, announces his start, and 55 seconds or so later, crosses the finish line, ahead of schedule! Wow! He started ahead of time and took even less time than he promised – this guy is a star!

    Marathon2006fat400_400x293

    Speed or Predictability: You Choose

    What this boils down to is that you’ve got a choice. Which do you want: speed or predictability?

    If you want predictability, dates are your friend. By all means, set a 60 second target for the hundred meter dash, and consistently beat your estimate. You’ll look like a hero to the ignorant, and you’ll do your part in flushing your business down the toilet.

    If on the other hand you want speed, you will avoid setting date targets like the evil, mutilating plague they are. You will clearly define the target a hundred meters from here, and encourage your swift runners to start before the words get out of your mouth, and arrive yesterday. You may cause a bit of confusion among business people to whom this way of working seems strange and new, but they’ll quickly adapt as they see high spirits, pride of accomplishment … and results, the like of which they’ve never seen before.

  • Databases and Applications

When databases were invented, they solved a huge problem that couldn't be solved any other way. Anyone who cares to look can see that the original problem that caused us computer programmers to invent databases has largely gone away. So why is it exactly that application programmers reflexively put their data in a database? In a surprisingly wide range of cases, it sure isn't because of necessity. Could it, perhaps, be nothing but habit and the little-discussed fact that change happens in software at roughly the same rate that change happens in glaciers?

    From the beginning of (computer) time, instructions have needed to be in memory to be executed, and data has needed to be in memory to be operated on by instructions. The memory in which instructions execute and fiddle with data has always been way faster and way more expensive than the large, slow but cheap places they are put when they're not in memory (call it storage, whether the storage is punch cards or tape or disk). It was this way at the beginning of time and it's true now.

    Think of memory as your work table. Eons ago, your work table was really tiny, like this:

    Tiny table
    You can hardly fit anything at all on it! So you'd better have a really big storage place to keep all your stuff, like a pantry:

    Pantry

OK, that's cool. You've got all your stuff in storage, but you can only work on it when it's in memory (on the table). What do you need? You need to get the stuff you want to work on now from the pantry, and you need to put the stuff you're done with back in the pantry. In other words (if you're in the world of computers) … you need a database!

    The very most basic function of a database is pretty simple: its job is to shuffle your data between memory and disk. It's also nice if it keeps everything straight, avoids dropping bits on the floor, and cleans everything up when something goes wrong during the shuffling.

    That was then. But things have changed. Remember Moore's Law? The amount of memory available to us at surprisingly reasonable prices has grown hugely. Exponentially. Our work tables now look more like this:

    Giant table
    And our pantries? Well, they've grown a bit too:

    Giant warehouse
    So how much data do you have? Run the numbers. It goes without saying that it's going to fit in storage. But how about that work table? Here's the question you have to think about:

    Will all your data fit on the work table (i.e., in memory)?

With 64GB and even 256GB of memory available at reasonable prices, the answer is often YES! It will!
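Running the numbers really can be a five-line exercise. Here's a minimal sketch; the record count, average record size, and overhead factor below are hypothetical placeholders, so substitute your own figures:

```python
# Back-of-envelope check: will the whole working set fit on the work table?
# All three numbers below are made-up examples; plug in your own.

record_count = 50_000_000       # e.g., fifty million records
avg_record_bytes = 1_000        # average serialized size of one record
overhead_factor = 2.0           # indexes, runtime object overhead, slack

needed_gib = record_count * avg_record_bytes * overhead_factor / 2**30
print(f"Estimated working set: {needed_gib:.1f} GiB")

for ram_gib in (64, 256):
    verdict = "fits" if needed_gib <= ram_gib else "does not fit"
    print(f"  {ram_gib} GiB machine: {verdict}")
```

With these example numbers the working set comes out near 93 GiB: too big for the 64GB box, comfortable on the 256GB one. The point is that the arithmetic is trivial; the hard part is remembering to do it before reflexively reaching for a database.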

    Hmmmm. What was it the database does? If all my data fits in memory, why was it I needed that database???

    I know, I know. Databases can be nice for reporting and data analysis and "persistence" and a few other things. I'm not saying you never use them. But for your real application, the one that takes user requests and responds to them, if you don't need to have the database shuttling stuff between the work table and the pantry… Hmmmm.
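To make that "Hmmmm" concrete, here is one shape such an application can take: a sketch (an illustration, not a production recipe) in which the working data lives in an ordinary dictionary and "persistence" is an occasional atomic snapshot to disk. The class name and file name are made up for this example, and a real system would need something like a write-ahead log to survive a crash between snapshots.

```python
import json
import os
import tempfile

class InMemoryStore:
    """All data on the work table (a dict); the pantry is just a snapshot file."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}
        # Reload the last snapshot, if one exists.
        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value          # plain dict access: no shuttling to disk

    def get(self, key, default=None):
        return self.data.get(key, default)

    def snapshot(self):
        # Write atomically: dump to a temp file in the same directory,
        # then rename over the old snapshot so readers never see a partial file.
        dir_name = os.path.dirname(self.snapshot_path) or "."
        fd, tmp = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)

# Example usage: every read and write is a memory operation;
# the disk is touched only when we decide to snapshot.
store = InMemoryStore("app_state.json")
store.put("user:42", {"name": "Ada", "visits": 7})
store.snapshot()
```

Notice what's missing: no query language, no connection pool, no round trip to a server. When the data fits on the table, the database's core job of shuffling between memory and disk simply has nothing left to do.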
