The Black Liszt

Category: CTO

Summary: Software People and Management

Guess what – software people aren’t like regular people! There are differences in the ways they think and act, judging results can be difficult, and hiring and managing is challenging. This is a summary of my writing on the subject.

Skills, Training and Status

The education establishment has taken over the world of skilled jobs. They've convinced the world that getting a degree in Computer Science is the only way to learn how to be a computer programmer. They are wrong.

https://blackliszt.com/2025/01/how-to-learn-to-be-a-computer-programmer.html

When you get a degree in CS or learn a programming language, you’re all set, right? Actually, those things are pretty much the same as knowing how to hammer a nail when building a house. It’s a nice start, but there are levels and levels of skill beyond the basics.

https://blackliszt.com/2018/09/the-hierarchy-of-software-skills.html

There is perennial conflict between young and old in software, perhaps more than other fields.

https://blackliszt.com/2010/12/experience-and-youth.html

You might think that getting a degree in Computer Science prepares you to be an effective programmer, much like getting an MD prepares you to be an effective doctor. A comparison of the two makes it clear that CS education is sadly deficient.

https://blackliszt.com/2020/07/job-requirements-for-software-engineers-should-stop-requiring-cs.html

Does status in software correlate with your skills, kind of like how hitting more home runs gives you status in baseball? Sadly, no. One aspect of software status is that the more layers there are between you and real customers, the more status you have.

https://blackliszt.com/2011/10/status-in-software-silliness-and-stupidity.html

There are other dimensions, for example the more management layers between you and fingers-on-keyboard programming, the greater the pay and status. And there are others!

https://blackliszt.com/2018/10/the-hierarchy-of-software-status.html

There is a similar skills and status hierarchy in data science.

https://blackliszt.com/2019/03/the-hierarchy-of-software-skills-data-science.html

One of the sad things about software is that writing great code and getting credit for it are largely unrelated to each other.

https://blackliszt.com/2020/10/elizebeth-smith-friedman-the-cancelled-heroine-of-cryptography.html

Programming skill and power in the programmer are far more important than the differences between programming environments.

https://blackliszt.com/2010/05/what-is-the-best-programming-environment.html

At least in some environments, high-performing nerds are highly valued.

https://blackliszt.com/2011/07/the-new-top-gun-is-top-nerd.html

Understanding Programmers

When you think about hard-core programmers, the word "nerd" comes to mind. In the popular imagination, "nerd" often seems to mean guys who play lots of games. And have attitude. Of course there are such people. But the very best programmers, in my experience, don't match that description.

https://blackliszt.com/2012/02/most-nerds-are-introverts.html

Some programmer-nerds go beyond introversion.

https://blackliszt.com/2012/03/nerds-autism-deficiency-advantage.html

Many of the best programmers rub "normal" people the wrong way. They're not "team players," they refuse to follow industry-standard norms, etc. On the other hand, what's so great about being "normal?"

https://blackliszt.com/2025/07/whats-so-great-about-being-normal.html

Temple Grandin's views are always valuable.

https://blackliszt.com/2010/02/nerds-and-the-yakkityyaks.html

In the end, the best nerds like to get stuff done — and have fun.

https://blackliszt.com/2011/07/top-nerd-activities-work-hard-save-the-day-and-have-fun.html

https://blackliszt.com/2010/03/nerds-in-norway.html

https://blackliszt.com/2011/07/top-nerd-nerd-values-and-attitudes.html

And of course, normal people can mistake a passionate desire to get things done in the best way possible for arrogance.

https://blackliszt.com/2012/02/when-you-call-the-programmer-arrogant-are-you-committing-libel.html

Hiring

Hiring software people is a challenge. There is widespread bad advice that interviewing for a software job should involve lots of chatting and relationship building.

https://blackliszt.com/2017/02/how-to-get-a-software-job.html

There are definitely better ways to select software people than the normal interview process.

https://blackliszt.com/2011/12/interviewing-software-people.html

The CTO of an organization is a special case. The requirements for the job are frequently impossible to meet.

https://blackliszt.com/2014/10/hiring-a-cto-the-impossible-dream.html

While CTO’s should be tops in technology, the best of them understand finances.

https://blackliszt.com/2012/10/cto-cfo-cfbco.html

Managing

Managing software people isn’t easy. Most managers tend to focus on process instead of substance.

https://blackliszt.com/2013/03/process-and-substance-in-software-development.html

https://blackliszt.com/2013/11/people-and-substance-in-software-management.html

Many companies have what's called a software architect. What should that person do?

https://blackliszt.com/2010/04/the-chief-architects-role-in-a-tech-company.html

If, instead of just paying attention to abstract process, managers paid attention to what the programmers are actually doing, things would improve.

https://blackliszt.com/2010/04/what-are-all-the-programmers-doing.html

One of the reasons software managers tend to focus on process instead of substance is that they have zero experience building software. Worse, it's literally invisible to them.

https://blackliszt.com/2016/12/managing-what-you-cant-see.html

As you can see from Dilbert and coal mining, you do a whole lot better managing something you've done or at least can see.

https://blackliszt.com/2017/04/managing-software-thats-invisible-to-you.html

Non-technical managers naturally want to find a way to feel better about what they're doing. Here are some of the favorite methods.

https://blackliszt.com/2015/06/software-problems.html

All too often, nontechnical managers make important decisions instead of software people.

https://blackliszt.com/2014/01/who-makes-the-software-decisions.html

When non-technical people do all the managing, the programmers who know stuff either leave or learn to keep their mouths shut in the face of stupidity and lie.

https://blackliszt.com/2024/06/well-trained-managers-create-software-failure.html

In other fields, it’s mostly people who DO the highly skilled thing who become managers of it. It should be the same in software.

https://blackliszt.com/2014/10/joe-torre-and-software-development.html

Non-technical managers think that MBA skills are the main thing that's required to be a good manager. Wrong.

https://blackliszt.com/2016/01/software-business-and-business-school.html

Here's how I learned from a course at Harvard Business School that the MBA teaches confident ignorance in terms of software.

https://blackliszt.com/2023/01/software-management-and-business-school.html

So with all this, should a skilled software engineer aspire to be a manager? Dilbert makes an excellent argument.

https://blackliszt.com/2023/02/should-a-software-engineer-aspire-to-become-a-manager.html

It’s easy to focus on relationships instead of technical substance when managing.

https://blackliszt.com/2017/02/software-management-and-relationships.html

There is also the classic case of ego. It's been a big problem in other technology-based fields, and it's a big problem in software.

https://blackliszt.com/2012/07/a-lesson-from-joseph-lister-ego-the-killer-of-software-projects.html

https://blackliszt.com/2012/07/what-can-software-learn-from-steamboats-and-antiseptic-surgery.html

The way group management changes as the group grows can be destructive.

https://blackliszt.com/2011/05/how-great-software-teams-can-go-wrong.html

It’s clear from the pay scales that most companies value management much more highly than the people who write code. Unlike baseball, for example, where the people who play on the field get the highest pay.

https://blackliszt.com/2016/04/software-people-you-get-what-you-pay-for.html

Does having a CS degree lead to the best pay? Not as much as you’d think.

https://blackliszt.com/2015/05/how-much-is-a-computer-science-degree-worth.html

There are many ways otherwise good programmers can go wrong. One big one is the result of perverse incentives.

https://blackliszt.com/2015/04/software-problems-the-role-of-incentives.html

Programmers who are considered to be “stars” often have issues unique to software that hurt group efforts and are rarely recognized.

https://blackliszt.com/2015/04/high-iq-programmers-the-problems.html

On the other hand, the whole process of training and promoting the most talented programmers is broken. It can learn a lot from ballet.

https://blackliszt.com/2013/04/what-can-software-learn-from-ballet.html

Book

The above are highlights of what there is to learn about software people. For more, see my book.

https://blackliszt.com/2015/03/software-people-book-just-published.html

May 17, 2023
Hiring a CTO: the Impossible Dream
I've been a CTO several times. I've worked with many CTO's. Last time I checked, every one of them was a natural-born human being. No cyborgs, no alien intelligences controlling a homo sapiens body. So why is it when most organizations try to hire a CTO, their requirements clearly demand capabilities and accomplishments that no human being could ever achieve?

The Impossible Dream

I've seen a whole pile of CTO job specifications over the years. They are remarkably similar. If I were in the executive recruiting business, I'd be embarrassed to deliver my "custom-built" CTO job requirements document, knowing that all I've really done is some light editing from previous efforts. The word "plagiarism" somehow comes to mind.

But that's chicken feed compared to the blue whale in the room — this is a way bigger issue than a mere elephant in a room. It's the fact that no actual, living human being born of human parents (to rule out the alien connection) could ever possibly satisfy the requirements! The "requirements" aren't requirements — they're an impossible dream.

The Impossible Dreamer

The phrase comes from "The Impossible Dream," the central song in the 1965 musical Man of La Mancha.

The song and the musical are about Don Quixote, hero of one of the greatest and most influential novels of all time. Don Quixote and his faithful companion Sancho Panza go off on a quest to revive chivalry and do good works.

One of the most famous episodes is Don Quixote going after the windmill, which he imagines to be a giant. Things don't go well.

This is the origin of our phrase "tilting at windmills," in which you are taking on an impossible task.

Impossible CTO Requirements

Part of the trick in writing these requirements is to make it not seem to be an impossible task. I admire the clever writing needed to slide the impossibilities past the average reader, who, frankly, pays little attention to the wording.

The requirements amount to something like the following list:
- The successful candidate will be over 7 feet tall and be able to successfully pass under a limbo pole exactly 2 feet high without touching the ground.
- The successful candidate should be able to crank out bug-free code at the rate of hundreds of lines per hour, while being an empathetic, nurturing leader of technical talent.
- The successful candidate should have a proven track record of hiring technical talent that dresses in board-of-directors-friendly "business casual," works during normal working hours at the designated location, and is consistently cheerful and friendly.
- The successful candidate will have demonstrated the ability to be a self-starter, while at the same time executing management directives enthusiastically and without deviation or failure.
Have I exaggerated? No! I've only slightly edited actual requirements that I've seen in order to protect the guilty.

Conclusion

Trying to hire an in-your-dreams-only CTO is business-as-usual in my experience. I try not to get upset about it. And in-your-dreams is not even the biggest problem in CTO hiring; even worse is making requirements that actively filter out the most promising candidates! This madness is yet another side effect of people without a technical bone in their bodies thinking they can manage it, because after all, technology is just another thing to manage. This absurdity is so widely accepted in management circles that even solid examples of how ridiculous it would be to act this way in other fields just whittle away at the deep-set conviction.

Postscript

I know, I know, I'm being mean. The recruiters are, after all, doing the best they can with what they have to work with. They won't get very far with their customers (the companies doing the hiring) if they say "that's a stupid requirement, I won't put it in." And in the end, they're measured on the success of the person who eventually gets the job, which is way beyond a document — which people don't take terribly seriously anyway.

But there's a serious point here anyway: the document could be meaningful and helpful if it specified something that could be satisfied by something less than a celestial being, and if people could get clear about avoiding "there are at least 15 top priority items here — yes, they're all equally important!"
October 29, 2014
People and Substance in Software Management
What's the right focus for managing a bunch of software engineers? Should you focus first on assuring that you have a congenial team with good working relationships, or should you focus first on assuring the substance of what they're creating is the best it can be? This is a similar question to the choice of process and substance that I covered in a previous post and it's just as important.

Substance vs. Process and People

Just as schools of education purport to teach people how to educate without reference to the subject that is being taught (math, history, etc.), management schools (you know, the kind that give MBA's) purport to teach people how to manage without reference to the kind of work that is being done or the kind of people who are being managed. The idea that "good management" and "management skills" are entirely distinct from the actual work being done pervades our work environment. The more esoteric the work skills, and thus the more difficult it is for a manager to have a clue about what his people are doing, the more this seems to be the case. Software development is right up there with particle physics in its ability to induce glassy eyes among the non-initiated.

Substance vs. People

It's really tough if you're a manager and you're trying to manage something when you have no clue what the people doing the work are doing. This is what leads to the strong tendency for anyone who claims to be a good manager to focus on process instead of substance, and to focus on people and human relationships rather than substance. Truth be told, if you don't have a clue about what people are doing or talking about, you don't have a lot of choice. A normal human being observing an exchange between two engineers can easily discern that engineer A is being nasty and disrespectful to engineer B, but how can that normal human being see that engineer B is proposing a completely stupid approach that will take a lot of time and then fail, while engineer A is being quite restrained (relatively speaking) in his reaction to the uneducated drivel drooling out of engineer B's mouth? The manager is likely to react to the only thing he actually sees, i.e., the disrespect being given by engineer A to engineer B, and therefore chastise engineer A, when actually it's engineer B who should be sent back to the farm teams, if not worse.

I am not claiming that there is a binary choice between people and substance, that if you focus on one you can't pay attention to the other. Of course, both are important. But as I look at software shops, I frequently see managers treating one of these as the primary goal, while figuring that the other will mostly take care of itself if they get the primary one right. The less they know about the substance, the more likely this is to be the case, but even experienced engineers who are trying to make the transition to management get a clear message on this subject from their superiors: good management is independent of substance, and a "good manager" makes sure above all that the human relationships in his group are in good shape, and that if they are, good software will naturally be the result.

Management focus and company success

In the best of all possible worlds, a manager would create harmony and excellent relationships among the programmers, while at the same time achieving maximum productivity towards making the company successful. That's a nice fantasy. But that's all it is. While being nasty doesn't imply great programming any more than being misunderstood implies that you're a genius, the fact is that there's a strong correlation between a focus on substance and company success.

The reason putting major emphasis on substance (i.e., "who's right" is more important than "who's more respectful of other opinions") is a winner is simple. Groups that all agree that writing the best possible software to meet the company's needs comes first are more likely to … write the best possible software to meet the company's needs! Compare this to groups that all agree that having personal harmony among their members is more important than achieving the best possible software. Doesn't it make sense that they'll make all sorts of compromises in software to optimize everyone's feelings? Oooo, better give everyone a medal, we don't want to risk diminishing anyone's self-esteem, do we?

Organizational Size

I've noticed that the larger the overall organization, the more that general, content-free management is the norm, and the more likely it is to be imposed on groups that develop software. This is completely understandable from a human psychology point of view. Everyone thinks that the skills and knowledge they have are the really important ones, and the others (which they don't have) pale in importance by comparison. The big bosses are likely to come from sales, finance or consulting and think that having an MBA is a good thing. Why should software be different from, say, running a store, or making deliveries? Or making tubes of toothpaste? If you're a … drum roll, please … manager, then that's what you do: manage. Your skill trumps all the others!

This is the dominant view of people who run organizations, or are on the management ladder. In order to advance, you have to adopt the organizational management-centric creed. Otherwise, you'll be "stuck" at the bottom of the management ladder, along with all the unwashed, unrewarded hordes who have to sit quietly with their heads down as various management fashions sweep through the grasping folks with sharp elbows who occupy the rungs of management ambition. If you even try to say "software doesn't work that way," all you've done is further disqualify yourself from advancement in rank.

If you've ever wondered why it is that small, under-funded organizations are nearly always the ones that build ground-breaking software, and that giant, "well-managed" organizations with hundreds or thousands of programmers rarely do, this is your answer. "Good management practices" assure that good software rarely emerges from large organizations, and that outstanding programmers with superior ideas about how to do things are marginalized and otherwise made unproductive.

Software and Baseball

This may sound cynical and harsh. But think about, say, baseball. While it is tempting to think that all those beards had something to do with the Red Sox winning the 2013 World Series, it is more likely that their outstanding pitching, fielding and hitting enabled them to defeat an excellent Cardinals team.

The 2012 season was awful for the Sox. It was their first losing season since 1997, and worst season since 1965. How did they respond? Did they update their project management methodology? Did they send in HR specialists to get everyone to be nicer to each other? Did they sideline people based on failing to high-five someone when they struck out with men on base? They did none of these things. Here's what they did:
- They fired the manager, even though he had a year left on his contract.
- They got rid of a bunch of players based on … their poor performance!
- They acquired a bunch of players based on … their superior performance!
- Finally, they focused their energies on, you know, … winning games! Doing whatever it takes!
Building good software is surprisingly like building a winning baseball team. Everyone does their job and puts everything they have into delivering great results. If someone isn't performing, too bad — you're not good enough to be a Red Sox! (Oh, just think how badly his feelings must hurt. How cruel!) Even so, sometimes someone is so awesomely good that it seems like he's carrying the team on his shoulders — and with Series MVP David Ortiz, you can make a strong argument that's exactly what he did with the excellent Sox.

Conclusion

People should be nice to each other. There should be mutual respect. But software is about results. The difference between average and great can easily be 10X for an individual, and 100X for an organization. You don't get "great" by focusing on process issues or good general management practices. You get "great" by … wait for it … focusing on software! Just like a musician focuses on the music and a painter focuses on the painting, the best software people focus on the software. The substance.
November 5, 2013
CTO + CFO = CFBCO
The CTO and the CFO aren't natural best friends in any organization. They are typically separated by a huge gulf of perspective; neither understands or appreciates what the other thinks or does. The best thing for any organization is when the two of them can truly take the other's perspective, and change what they do as a result.

CTO

What's the Chief Technology Officer (CTO) about? He or she had better be the best technical person in your organization. The one who understands the details, the big picture and everything in between. The one who actually understands all those computer acronyms, can sling them with the best, can pick the best and harness them for the good of your organization. At best, the CTO can rally the tech nerd employees to the cause and also convince the suits that everything is good.

CFO

What's the Chief Financial Officer (CFO) about? He or she had better be the best financial person in your organization. The one who understands every line of every statement and report, what's behind it, what led to it and where it's going. The one who understands how all that mass of detail relates to company tactics and strategy, and plays a key role in making them align. At best, the CFO can handle the big picture and the details, issues and people inside and outside the company, all to advance the company towards its goals.

CTO vs. CFO

When not at their absolute best, the CFO and CTO can have a chilly relationship.

The last thing the CTO wants is for some nosy bean-counter to mess with his stuff. Even simple questions are suspect: why is he asking? what's he looking to cut? Go away! Most CFO's seem like clueless idiots to even average CTO's. All they can possibly do is waste time and, through crass stupidity, make things even harder than they already are.

CFO's often see the whole tech group and the CTO in particular as being a bunch of whiny, spoiled, self-absorbed brats. They're anti-social, talk among themselves in their private language, get huffy or all-too-patient in response to even the simplest of common-sense questions, and seem perversely intent on avoiding anything that increases revenues or profits. They're perpetually late, reluctant to commit to anything, always complaining about lack of support, but still manage to have an attitude about pretty much everything.

CTO Fundamentals

Most of the thoughts I've had about computing that are worth anything took me years, often decades, longer than they should have to penetrate my thick skull. But one of the earliest realizations I had remains both true and rarely discussed: to the extent that computers are applied well, they cut jobs, and therefore (usually) costs.

This isn't the pretty way to put it. Most people prefer to think about how computers enhance people's efforts. And they do — meaning you can get the same job done with fewer people. Or you can deliver more with the same people — to deliver more without computers, you would have had to hire more people. Any way you cut it, the more widely and effectively computers are used, the fewer people you need to get a given job done.

Put all the gobbledygook aside, and what computers come down to is simple: cut costs, do more with less, get it done faster, etc.

Now what does this sound like? Could it, perhaps, sound like the kind of thing CFO's are supposed to worry about? Hmmmm….

The CFBCO

Maybe there's some middle ground here. At the heart of the matter, there is no skill set in an organization better suited to helping a CFO meet his goals than the CTO's. And when a CTO who is really good in nerd terms wakes up and realizes what his job is really about, there is no person better suited to be a company-maker than a bottom-line-oriented CTO.

Put in the most basic terms, a good CTO can make things happen:
- Faster
- Better, and
- Cheaper.
Faster, better, cheaper. That's computing in a nutshell, and when computing is applied to an organization to greatest effect, that's what happens to the organization. It does what it does faster, it does it better, and it does it cheaper. FBC. Wouldn't it be nice if we could have a Chief Faster-Better-Cheaper Officer?

Conclusion

In most organizations, there is a spectrum of leadership. At one end of the spectrum is the nerdy CTO. At the other end is the what's-your-handicap CFO. If you're very lucky and very deserving, maybe you have someone at the center of that spectrum who combines the best of both extremes in a single individual. The CFBCO. The CFBCO combines ultimate nerd-power with dollars-driven vision and insight, and makes the relevant numbers better in a way that even the dullest and most distracted board of directors can understand and appreciate.
October 23, 2012
Interviewing Software People

The methods in widespread use for interviewing and selecting software engineers are appalling. It is only because they are so bad that ridiculous methods like those often used at Google, in which applicants are given trick puzzles to solve, can seem like an improvement. The sad thing is, asking candidates to solve mental puzzles is better than what's usually done, which is not much at all. Come on, people, we can do better!

Typical Selection Practices

Managers need more programmers. They get a job requisition from finance, then go to HR to get candidates. Since HR knows nothing about the substance of the work that needs to get done, what ends up going into the job requirements is a bunch of motherhood-and-apple-pie blah-blah (self-starter, etc.) and a list of keywords of the technologies in which experience is required.

HR screens candidates based on whatever is in the resumes and may interview candidates, basically to see whether they mouth the expected platitudes when prompted by the HR people. Those who play the game are then passed to the programming department.

Typical Interview Practices

The candidate is typically scheduled for a round of interviews with programmers and managers. Since the managers rarely know much and have usually forgotten what little they used to know, they don't ask questions of substance; they basically find out if they like the candidate and if they believe the candidate will "fit in" and follow orders. The fellow programmers who interview remember their own interviews, in which questions of substance were few and far between, so they basically chat up the candidate and decide whether they like them. At the end of this "rigorous" process, if everyone agrees, the candidate is accepted.

What a joke! When you're hiring a musician, an audition (in which the musician performs) is standard practice. When you're hiring a writer, reading things previously written by the candidate is standard practice. So when you're hiring a writer of software programs, naturally you'd expect that reading programs previously written by the programmer would be standard practice — but it's not!

Leading Edge Interview Practices

Instead, the leading edge at places like Google is to hit the candidate with trick questions. For example: "Suppose you were suddenly shrunk to the size of a nickel and found yourself at the bottom of a blender. The blender is going to start in a minute. What would you do?"

If I were the size of a nickel, most of my neurons would be gone, so I wouldn't be me anymore. But more seriously, how relevant to writing programs is the kind of skill that questions like this test?

I could make an argument that this kind of thinking is relevant to a kind of programming that is important, but very rarely needed: algorithmic design. No one has ever (to my knowledge) measured it, but I would be surprised if algorithmic programming amounted to as much as 1% of all the code in a typical application. The vast majority of code that's written needs different kinds of skills: visualizing user interactions, understanding data structures and data flows, understanding and effectively using complex subsystems, and many other activities. These activities benefit little from the kind of skills and instincts required for solving trick puzzles.

Are there people who can do the puzzles and be great "regular" programmers? Of course. The problem is the reverse: there are many people who would be perfectly adequate programmers who are flummoxed and generally disconcerted by questions of this kind. I've been in groups of trick question masters. They're great for finding what's complicated in basically simple things and other arcane but fundamentally counter-productive skills. Other than that — you can have 'em! Take 'em all — please!

What can be Done?

There are lots of simple steps that can be taken to improve the outcomes of sourcing and selecting software people. Any step you take is likely to make things better. I hate to do it, but I have to admit that even trick questions are better than the "Hi, how ya doin'" method of interviewing. But surely we can do better.

Reading code. When hiring writers, we read their past works. Why are we so reluctant to do so for people who write code? I suspect it's because very few people would actually be able to read the code and make a reasonable judgment of the author; for all too many typically mediocre programmers, that would amount to a tour de force way beyond their limits. That's OK: find out who can read code with meaning and judgment, and they become your main filtering agent. Maybe, just maybe, you'll end up with … people who write better code! What a concept!

Auditions. What's wrong with an audition? If someone is supposed to know database design really well, show them one of your current ER diagrams and ask for comments. Tell them a change that is proposed and ask how they'd make it. Pose a recent tough problem you had to solve (which you've already solved) and ask them to solve it.

Detailed archeology. A candidate programmer may not know much about your stuff, but she'd darn well be an expert on stuff she's coded in the past. Find a subject where your experience overlaps hers, and ask for a detailed rendition on what she did, why and how — and what she learned and would do differently today.

Subject Matter Testing. Yes, testing. Like an audition, only more objective. If someone really is the expert PHP programmer they claim to be, they'll ace the test. No problem.

Conclusion

Software hiring is an embarrassing mess. In a field that is over-the-top exacting with fairly objective pass/fail criteria (the program works or it crashes), the methods we use to ask new people to join us are random, ad hoc, and almost completely unrelated to finding out whether the candidate can actually perform as required. Asking trick questions can actually be an improvement, but that's not saying much. We can and should do better, and even a little better can make a huge improvement in the quality of the people who build our software.

December 24, 2011
Status in Software: Silliness and Stupidity
In all too many software groups, you get higher status by being more removed from actual customers, their needs and concerns. This is bass-ackwards. It's silly. It's perverse. It is profoundly stupid and counter-productive. If this is how your software group works, change it or leave. Now.

The Inward Flow: Support

In most organizations, here is the perverse flow:
- Customer has problem. Contacts Customer Service.
- L1 customer service takes the call or e-mail. Eventually. They try to do something, but don't have much knowledge or power. So after wasting some time, it's off to…
- L2 customer service, which is backed up failing to handle other things L1 already kicked up to them. After wasting some of their own time, and often some of the customer's as well, it's off to…
- L3 customer service, which is where the really experienced L2 agents get promoted. Life is messy in L3. All the nasty problems end up there, often with the customer already (understandably) mad, and too frequently L3 lacks the skills and resources to even reproduce the problem, much less fix it. After spending some time here, the worst problems of the most upset customers migrate to…
- Sustaining engineering. This is the death-watch group in engineering. Two types of characters are typically confined here: ignorant entry-level people who hope to move up and out; and experienced engineers who missed the cut for working on the new stuff. If it's an easy bug, they may be able to fix it. Otherwise…
- …it may actually be necessary to interrupt an exalted person who wrote the code that caused the problem, taking him away from the important business of writing code that has brand-new problems! But this drastic measure is avoided if at all possible.
There are actually more layers to march through, but the pattern should be pretty clear by now: the "most important" people are protected from the consequences of their past mistakes by layers and layers of carefully arranged bureaucracy designed to deflect and defuse any contact with real customers and the problems those customers may be having. The more you know, the more distant you are kept from having your august presence sullied by the trivial annoyances of mere customers. It doesn't need to be this way.

The Outward Flow: Development

When new products are created, it is all too often the case that the higher your status, the more removed you are from contact with the people who will ultimately use the product you create.

In very large organizations, the remote peak of the status hierarchy is occupied by research groups or labs. These are truly hilarious. Why do they have ultimate status? It is a given that they see no customers, hear no customers and talk with no customers; but even better, they produce nothing tangible at all (unless you count academic papers and research reports). Those people are sure important! Their ground-breaking work will (pick your favorite) "lay the foundation for," "create the basis of," or "make the discovery on which" generations of future products will be built. Sure.

Smaller organizations would love to have such a group — it's prestigious! — but instead make do with a few exalted individuals who think deep thoughts and create "architectures" that "solve" a wide range of present and future problems.

High level design people then take over to create a "design" within the "architecture." This is not easy! It's important to fend off the constant pressure to produce something practical that works for today's customers, in favor of doing the design "the right way," i.e., spending lots of time thinking about problems some customers may have in some unspecified future, and "creating a framework" that will supposedly make them easy to solve.

At this point, software development splits into an alphabet soup of competing creeds, each of them certain of their unique virtue and access to software heaven. There is the much-maligned waterfall, agile, SCRUM, extreme, and on and on. The details of what happens next vary. The status relationships and ultimate outcomes are pretty much the same: the more important you are, the less likely you are to have meaningful contact with customers. This remains true as the software staggers through phases that may variously include integration, testing, staging, documentation and roll-out.

Finally — finally! — the software is inflicted on the customers for whose benefit it was built. All I can say is that the chorus of complaints, however loud it may be, is rarely loud enough to penetrate the excellent sound insulation of the rooms in which the company's "brain trust" festers.

Conclusion

If you want to run a charity organization for egotistical, self-absorbed and self-important programmers (OMG! Did I just use the demeaning term "programmers," implying these people might actually lower themselves to doing actual, like, work!? I meant to use a more elevated term like "intergalactic systems architect" or "chief scientist.") — like I was saying, if it's your goal to provide welfare to high-minded computer scientists, by all means employ a staff of "elite" techies and help them avoid being interrupted by the hoi polloi. Their deep pondering is way too valuable to be sullied in any way by the mundane concerns of the common people. If, on the other hand, you have real work to do and want your best people to lead, then make sure that the closer people are to customers the more status they have. Building a product or service that real people value and want to use requires — gasp — contact and interaction with those same real people.
October 31, 2011
Software: Move Quickly While Breaking Nothing
In places that want to succeed (why would you be anywhere else?), the pressure is on the software team. Change more stuff. Move more quickly. Where are the results?

OK, great, you say. I'm already working as hard as I can. The only other thing I can do is relax the process, the safeguards that assure good, high quality results. I'll give it a try.

Maybe you (and your company) get lucky for a bit. Then the inevitable happens. The release that just got pushed had bugs and/or unintended consequences. There are panicked calls, late nights, frayed nerves and angry managers. The managers are extra angry at the programmers who did what the managers said, not what they thought they implied (i.e., don't break anything). The software people fume about management that acts like the ancien régime, making insufferable, impossible demands of the oppressed workers. No one is happy.

"Push More Releases" is NOT the Answer (by itself)

Sometimes when I talk with programmers, I'll give some version of my "releases are stupid" talk or my "dates are evil" talk and some brave soul comes out with the classic "I tried that and got screwed" response, as summarized at the beginning of this post. And I always agree with the brave soul: if all you do is push more releases, you're not just inviting disaster, you're sending a chauffeured limo for disaster and welcoming it with a red carpet and cheering crowds.

The classic, project-management-centric methodology with its infrequent releases is broken. Broken!! So who in his right mind would think that keeping the methodology except for some essential, integral parts is a good idea?

Let me see if I've got this right: you don't like your methodology, so … you're going to cripple it and hope things will improve??!!

By the way, if you're sitting there with a smug expression thinking "we don't have that problem, we use agile," think again. Agile does not change the situation.

The Answer is a New Methodology, Optimized for Speed

The details of the new methodology are important; I've written about them at great length in my private-distribution papers. It's a seismic shift from the normal way of doing things. Like with any big change, it's useful to start with baby steps. So here's a start:

Grow the Baby

The mainstream process of software development resembles a car factory. There are lots of inputs. The inputs are assembled into sub-assemblies and then assemblies. There is a final integration phase, after which the car rolls off the assembly line. When it rolls off, people hope it works; there is no expectation that it will work prior to the end of assembly; until then, it's "work-in-process." This is typically the way things work whether you release once a month or once a week.

The speed-oriented method resembles growing a baby. There is one and only one overriding rule: Don't Kill the Baby! The baby is grown by tiny incremental additions; each addition takes a while to get right, but none of them is fatal. It takes a while for the baby to learn to walk, for example, and it spends some time walking poorly. But while this is going on, the baby doesn't lose its ability to crawl, eat, burp, roll over.

Don't Kill the Baby!

When the subject is babies, there is near-universal agreement that killing them is something to be avoided. But when software is developed with the usual methods, it's alive only some of the time — mostly it's dead! The cornerstones of the speed-oriented method are:
- Small, frequent changes. You should make progress towards your goal every day. Do the best you can and make the most progress you can every day.
- The new stuff doesn't need to work at the beginning.
- The old stuff can't be allowed to break. This is usually achieved by some kind of continuous integration and live parallel testing. How you do it isn't important. That you do it is. Your software must not break. Clear?
- Iterate. Don't lay out months' worth of work. Set overall goals, then spend some time looking at what you did yesterday to help decide what to do today.
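
To make "live parallel testing" a little more concrete, here is a minimal sketch of one way it might look. Everything in it is an assumption made for illustration: the function names, the pricing rule and the idea of comparing results in-process are all hypothetical, and a real setup will differ. The point is only that the old code keeps serving users while the new code runs alongside it on the same inputs, and any disagreement gets recorded instead of shipped.

```python
from typing import Callable, List, Tuple


def run_in_parallel(old: Callable[[int], int],
                    new: Callable[[int], int],
                    inputs: List[int]) -> Tuple[List[int], List[str]]:
    """Serve results from the old code; shadow-run the new code and log differences."""
    served: List[int] = []
    mismatches: List[str] = []
    for x in inputs:
        expected = old(x)           # the trusted path is what users actually get
        served.append(expected)
        try:
            candidate = new(x)      # the new path sees the same live input
            if candidate != expected:
                mismatches.append(f"input {x}: old={expected} new={candidate}")
        except Exception as exc:    # a crash in new code must never reach users
            mismatches.append(f"input {x}: new code raised {exc!r}")
    return served, mismatches


def old_price(cents: int) -> int:
    """Hypothetical existing rule: add 10%, rounding the surcharge down."""
    return cents + cents // 10


def new_price(cents: int) -> int:
    """Hypothetical rewrite: add 10%, rounding half up (behaves differently)."""
    return (cents * 11 + 5) // 10


if __name__ == "__main__":
    served, mismatches = run_in_parallel(old_price, new_price, [100, 105, 0])
    print("served:", served)
    for line in mismatches:
        print("MISMATCH", line)
```

The new stuff is allowed to be wrong here, which is the second cornerstone; the old stuff keeps answering, which is the third. When the mismatch list stays empty long enough, the new code has earned the right to take over.
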
Baby Yourself!

One of the Oak companies was having trouble meeting all their commitments. The pressure was on to deliver more, and deliver it quickly. But the company was also getting beat up because of serious flaws in recent releases. Talk about being caught between a rock and a hard place!

It took quite a bit of work, but they shifted to the kind of speed-oriented, always-alive method described here. They had some fun with it along the way. Part of the fun was the mascot they adopted, which I got to see during a recent visit.

Grow the baby!
June 27, 2011
Software Project Management: “Releases” are Stupid and Out-moded
Tell me, do you use Google by any chance? Yes? OK, now tell me which release of Google you use now, which ones you've used recently, and which one you like best; and, by the way, how was the upgrade process? Oh, you don't know? There's no way to find out, you say?

What the heck is WRONG with those people at Google? Don't they know that modern software development processes demand a disciplined release and planning methodology? There is NO WAY customers will put up with a product when the vendor just shoves new releases at them any time they feel like it, without even proper notification! And what customer is going to sign up with a vendor who won't commit to a future roadmap, with at least a year's worth of features laid out, tied to hard-commit release dates? That Google, they violate every rule of the game, there's no way they're going to make it as a software/service company!

What, how can it be! NOOOOoooooo…. Google is the world's most valuable software company!!?? When they don't even have the most basic element of proper software methodology, releases?? I must be sleeping! This must be a nightmare!!

Get over it!

Here are the facts:
- Classic project management has been a disaster for software development.
- All the heavy-weight process things that people do to make things better … invariably make them worse!
- The classic big-bang release is the cornerstone of the temple of evil.
- The classic little-bang release (a.k.a. "agile") is a brick in the temple of evil.
- "Releases" are … stupid! … and not only that, they're … outmoded!
There is life after "releases." It is a better life. The software is better. The users are happier. Go there. Enjoy it.
February 28, 2011
What Do Consumers Want from a Web Site?

Why is this question important? Because whatever it is that consumers want in a web site had better be exactly what you want (and create) in that web site! Assuming, of course, that you care about success…

All of what I'm about to say is obvious, but you'd never know it from looking at web sites and from looking at the priorities of web developers, both of which I do quite a bit of in the course of my work. Everything I'm saying here applies equally to SaaS operations, BTW. So from most important on down:

Not Broken

I've written about this again and again. If a consumer tries to go to your site and it isn't there or it's otherwise obviously broken, the consumer's reaction is roughly the same as if they went to a new store and found it locked and shuttered.

Do you think they're likely to check again later when you've fixed the bug or gotten the right release level of the OS onto your servers? Is there anything more important than having your "store" ready and open for business when someone arrives?

Not Slow

I'm always amazed at how little attention insiders pay to this crucial issue.

What ……………. if ………………. it's ……………. just ……………… a …………… little …………… bit …………….. slow. ………….. How …………… bad ……………….. could …………….. that ……………….. be? Well, ask yourself: how did you enjoy reading those last two sentences? Did it make you eager to read more of the same? What ……………………………………………. if ……………………………………. it ……………………………………………………….. were ………………………………………………. even ……………………………………. worse? Would you come back to any site where the whole doggone thing were that way, all the time?

Conclusion: even if you can't make it fast, make sure your site isn't slow.
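
If you want to pay attention to this, the first step is to measure it. Here is a rough sketch of what that could look like, under loud assumptions: the helper names are made up, the one-second budget is an arbitrary placeholder and not a recommendation, and `fetch` stands in for whatever call actually loads one of your pages.

```python
import time
from typing import Callable, List


def time_calls(fetch: Callable[[], object], n: int) -> List[float]:
    """Call fetch() n times and return how long each call took, in seconds."""
    samples: List[float] = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples


def slow_fraction(samples: List[float], budget_seconds: float) -> float:
    """Return the fraction of samples that took longer than the budget."""
    if not samples:
        return 0.0
    slow = sum(1 for s in samples if s > budget_seconds)
    return slow / len(samples)


if __name__ == "__main__":
    # Hypothetical stand-in for a real page load; replace with your own call.
    samples = time_calls(lambda: time.sleep(0.01), n=5)
    print("fraction over budget:", slow_fraction(samples, budget_seconds=1.0))
```

The number to watch is the fraction of visits that blow the budget, not the average. An average can look fine while a good share of your visitors are staring at a blank page.
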

Attractive

We're on a roll. The place is alive, open for business, and responsive. A great start! Now — at a glance — do I want to be here? Do you want to get with this? Or do you want to get with that?

People will tell at a glance whether they like it or not. If they don't like it, they're most likely gone.

Easy to get where you want to go

Far too many web sites are, simply put, navigation hell.

Not on purpose, of course. The worst examples seem to result from someone wanting to "organize" the site into some system that "makes sense." Sure. As though anyone cares about your page and folder hierarchy! The only thing that most users care about is getting to what they want with a minimum of mystery. Does having an "organized" web site help this? Generally speaking, it does not. The only thing that helps is … looking at things from the user's point of view. What an idea! Maybe if you do this you can avoid the web equivalent of:

Minimum clicks to do what you want

Every click, every keystroke that stands between a user and his goal is an opportunity for that user to decide that it's just not worth the trouble. "How much trouble can a couple clicks and keystrokes possibly be?" you may ask yourself, as you madly set things up so that more of them are required, for reasons that are no doubt compelling. The answer is simple: every keystroke or click is an opportunity for the user to bail out of the process before the goal you want them to reach has been attained. Is that clear enough? Any click that can possibly be eliminated is a click that should be eliminated. Period.

What Do YOU Want from a Web Site?

I hope it's exactly what consumers want. Everything else is details. If you screw up even one of the above, you're swimming upstream. If you get the order of the above wrong, you're missing the boat. If you fail to pay attention, I hope everything else in your life is really great, because the part involved with web sites won't be, and even more metaphors won't help you.

January 19, 2011
Communicating Technology: the SMiT example

I love my job.

I was in Beijing, China a couple weeks ago talking with an incredibly fast-growing internet ad network, YoYi Media. These guys are doing great things; they'll succeed whether we end up partnering with them or not. While on-site, I got a chance to talk with Liya Wang, who was helping us "on loan" for a couple days from her main responsibilities at an Oak company I've had nothing to do with until now: Shenzhen State Micro Technology Co, SMiT.

Liya is a finance person working primarily on financial issues with SMiT, and as such, she needs to explain to bankers and investors what SMiT does; understandably, people kind of want to know what you do prior to signing the check. As a start, she tried to get me to understand what they did. Fortunately, given my background, it wasn't too hard. But I could appreciate her trouble. The company essentially makes complicated chips that do complicated things that help make digital TV better. It is difficult to even start talking about what SMiT does without plunging into acronyms that are no more understandable when spelled out.

This is exactly the point at which … the CTO should jump in and solve the problem!

… Wait, wait, don't you mean the fluffy, slick marketing people? Don't you mean the branding people? How can you possibly mean the CTO, who is the leader of the nerds who create the communication problem in the first place?! I mean, the CTO is the head of the incomprehensibility crowd, the lead creator of obscure acronyms, the guy who says things like "if I tried to explain it, it would just scramble your brain."

The CTO you should have, the CTO you should aspire to be, is exactly the right person to explain the problem, because he is the one who is most likely to actually understand the technology that needs to be explained! It's that simple.

Liya and I took a first pass at coming up with a way of explaining what SMiT does that does not involve scrambling of brains or twisting of tongues. She personally used PowerPoint to take a first cut at putting it into slides. I thought she did a great job, so I'm going to embarrass her by taking some of the images from the builds of just one of her slides that explains the core idea quite nicely. (I'm sure it can be improved, but this is a quick turn-around first cut!)

Starting, of course, with the problem: here's the confused user who would like to watch cable or satellite TV, but instead of just a TV and a remote, has an extra box, wires, things to hook up and several remotes.

So what does SMiT do about this messy problem? The cable/satellite box disappears and there is a little card.

Then you see the card moving so that it's inserted in the TV — in fact, it can be pre-inserted.

Once the card is in the TV, all the wires are gone, and you see the multiple remote controls turn into just one control that enables you to do everything:

The user of course is happy about this, and so gets a nice green check mark. But for the verbal learners in the crowd, it's worth spelling out the benefits:

This simple sequence of animations shows anyone what SMiT does and why it's good. Thanks, Liya!

Sequences like this are usually painfully obvious once they've been created — it seems like they've always existed. But having been around many times when there is no such simple way of conveying the essence of a complex technology, I know how hard it is to do, and how important it is to get it right: to be simple without being simplistic.

While this is not in the typical CTO job description, it is actually one of the most important things a CTO can do, and no one does it better than the best CTOs.

If you're a CTO or on the path to be one, I suggest that this is an incredibly important exercise to undertake. It actually helps you to get perspective about what you're doing (and why you're doing it!) to be "forced" to explain it to non-technical people. It gets you "outside" of yourself in a way that is often useful.

December 17, 2010
Software Project Management: Dates are Evil

The problem with estimates and dates in software is simple: nearly everyone wants them, and having target dates usually makes things worse. Worse! Are there exceptions? Yes – but not many!

I’ve written agonizingly loooooong papers on this subject. But this is just a blog, so here is a bite-size thought to chew on.

Software estimates of dates might be a good thing if the thing we were estimating were something we have done over and over again, and had reasonable standards to apply. But we don’t.

Think of things like your commute to work, and the commutes of other people to the same place. Say you tell a friend your commute this morning took half an hour. Your friend will immediately be able to react to that information by saying, “whoa, was there an accident?” or “how come they didn’t catch you, lead-foot?” Your friend has a clear idea of the range of reasonable times for a “normal” commute, and can use that knowledge to judge on his own whether the one this morning was fast or slow, and by a lot or a little.

Now suppose a friend tells you that he's just finished designing a database, and that it took a week. Lots of people have designed databases. Say you’ve designed lots on your own. How do you respond? Is your friend a superman, for designing the database in record time, or a slug? The fact is, without knowing a whole lot about the details of the database to be designed, you have no idea. Even if your friend tells you some basics like the number of tables and columns, you still really have no idea. Was there a starting model? Are there special performance, size, interface or compatibility requirements? Were there a bunch of conflicting interests to account for? Depending on lots of details like these, even knowing the “raw” size of the database schema that was designed, you still have no way of judging.

Lots of interesting things come from this observation. One of them was taken up and exploited by the big services firms long ago. Here it is: the people who look at your project plan typically don’t have a clue – not even a foggy idea – of whether the times and resources assigned are reasonable. What the manager typically wants is confidence that, whatever resource and time commitments may be made, they are kept. So project management, in this context, is all about having lots and lots of highly technical looking stuff with charts and diagrams and fancy reports to justify what are – all too often — completely outrageous resource and time commitments. No one looking at them has any way of knowing that they’re outrageous. So they get approved. The expectations are eventually met. And everyone’s a hero!

Speed vs. Predictability on the Race Track

Imagine we’re on a race track and there are some people running the 100 meter. Most of the people are outstanding runners, so they’re always competing with each other. In every race, all but one of them is a loser, and they tend to set aggressive goals of the times they’re going to make, which they frequently miss by a little.

The typical project manager would look at them as a bunch of unpredictable losers! They usually miss their goals, after all, and they practically always lose the contest they enter!

Now someone sidles up to the disappointed project man and says that he’s a different breed. “When I say I’m going to do something, I do it” is the line. So our runner says, see that 100 meter track over there? I’m going to go up to it within the next 15 minutes, and I’m going to tell you when I’m starting. You time me. I’m going to cross the finish line within 60 seconds. And I’m going to try to exceed your expectations.

Naturally, the project man is impressed. This is what he’s been looking for, a winner who does what he says he’s going to do.

Ten minutes later, the guy goes to the starting line, announces his start, and 55 seconds or so later, crosses the finish line, ahead of schedule! Wow! He started ahead of time and took even less time than he told me – this guy is a star!

Speed or Predictability: You Choose

What this boils down to is that you’ve got a choice. Which do you want: speed or predictability?

If you want predictability, dates are your friend. By all means, set a 60 second target for the hundred meter dash, and consistently beat your estimate. You’ll look like a hero to the ignorant, and you’ll do your part in flushing your business down the toilet.

If on the other hand you want speed, you will avoid setting date targets like the evil, mutilating plague they are. You will clearly define the target a hundred meters from here, and encourage your swift runners to start before the words get out of your mouth, and arrive yesterday. You may cause a bit of confusion among business people to whom this way of working seems strange and new, but they’ll quickly adapt as they see high spirits, pride of accomplishment … and results, the like of which they’ve never seen before.

October 1, 2010
Success with New Technology and Mouse Madness
You’ve got a wonderful new technology. It works. Customers benefit from it. Game over, right?

Wrong.

Sadly (for you), the world does not revolve around you. The world is not on constant alert waiting for better mousetraps to appear somewhere so that the world can beat a path to your door.

Let’s take the B-to-B case. If you’re a big technology buyer, you’ve got better things to do than constantly flirt with new vendors. To the contrary, you want to find ways to cut the number of vendors you work with. If your existing vendors aren’t screwing up, if their products are good enough and their price is good enough, chances are you’ll save time and stress and feed them more orders as you grow. You may take meetings from wanna-be’s, mostly for your general education; it’s a waste of the wanna-be’s time, but that’s not your problem.

The reality is that most big technology buyers have a barn of designated winners, all of whom are “approved vendors.” The vendors have won the design competition, and are now baked in to the buyers’ expansion plans.

The only thing that is likely to change this situation is a combination of buyer pain and incumbent supplier inadequacy. While some technology buyers are motivated by opportunity, most consider a vendor change only when they feel pain. The pain is nearly always driven by a need to reduce costs. In order to reduce costs, the vendor needs to do X, and if it can do X, it needs to be able to do it for a particular price.

Since this sounds abstract, let me illustrate it with a real example from Xiotech, my favorite storage vendor.

Xiotech has invented a new mousetrap, the storage blade, which delivers storage in better ways than existing storage products, sometimes dramatically better. Is it really, objectively better? Yes. Can buyers get more done and spend less money? Yes. Does that matter to most buyers? Do storage buyers beat a path to Xiotech’s door? By now, the VP of Sales at Xiotech has accepted the fact that they do not. Having hoped for the easy life of taking orders, he is resigned to having to go out and sell stuff. The world feels cruel and unjust, but that’s how it is.

Xiotech has a better storage mousetrap, and the world has reacted the way it always reacts to better mousetraps. So what does Xiotech do?

It’s pretty simple, actually. If you had a mousetrap that was really and truly better than the other guys’, wouldn’t it make sense for you to find places that were totally, horribly overrun by mice? Not just places that have mice – places where the mice are in charge; places where the mice start trying to invent “peopletraps” because they think they’re infested. Places, in other words, where the inadequacies of the best existing mousetrap technology have been put to the test and come up short – miles short.

No one expects the incumbent mousetrap vendors to walk away from the business that has served them so well for so many years. They are bold and shameless, and will come up with all sorts of arguments. They will argue that if you have a mouse problem, you obviously haven’t bought nearly enough of their wonderful traps; the solution is to buy more! If that doesn’t work, they will wave their arms furiously about the new trap that is about to come out, avoiding any mention of the increase in maintenance charges for existing traps. Meanwhile, you walk around and see the mice dancing on the existing vendor’s obviously ineffective traps.

If you’ve really got a better mousetrap and are looking for motivated buyers, the place with the super-sized, invasion-from-Mars MOUSE problem is your candidate. Most people buy the same mousetraps they’ve always bought. They may be lousy traps, but they just don’t care. It doesn’t matter enough. But the guy with the MOUSE problem, HE CARES. He pays attention to mousetrap technology, because he knows he’s got a whole pile of mice that need catching REAL BAD.

The cruel fact of life for inventors of better mousetraps is that, if you can’t find anybody with SERIOUS mouse invasions, you are hosed. Your superior trap will do nothing but waste the time and money of everyone involved. But even if you can find places where the mice are dancing in the halls with impunity, your mousetrap had better be SERIOUSLY better than the incumbent. If the incumbent knocks off a mouse or two but leaves dozens aspiring for a spot on Dancing with the Stars, while yours knocks off two or three or four but still leaves most of the mice dreaming about skimpy costumes and getting “ten’s” from the judges, you may as well hang it up now. To make a long story short, you had better:
- Find a truly worthy mouse problem.
- Solve it. Really solve it. No kidding, nail it!
- Solve it for a reasonable price. If you’re too expensive, you may sell to a few desperate buyers, but you’ll never become the industry-standard mousetrap.
Back to Xiotech storage. Xiotech is focusing on buyers who have the kind of problems that Xiotech storage, with its high performance and linear scalability at a reasonable price, is uniquely positioned to solve. Just as mousetrap guys look for places with too many mice, Xiotech looks for places that can’t get at their storage. Just as the incumbent mousetrap guys boldly say “just buy more of my [crummy] mousetraps,” the incumbent storage vendors say “just buy more of my [slow] storage [that gets even slower as you fill it up].” In spite of such incumbent resistance, smart buyers with mouse madness buy better mousetraps, and smart buyers with storage bottlenecks buy better storage.

Success with new technology is usually only achieved when:
- there are pockets of buyers who have SERIOUS pain;
- the pain is worth serious money;
- you can find the people with the pain;
- you can address their pain;
- your technology can really make the pain go away;
- ideally without charging more money.
Failing this, I guess you could try waiting and hoping that the world will beat a path to your door…
May 6, 2010
Is Your Site Working? Do You Really Care?
Like He-Who-Must-Not-Be-Named in the Harry Potter books, I find that there is something that must-not-be-named in the world of computing. It is That-Which-Must-Not-Be-Discussed. It’s not so much that people fear it (although many of them do), it is more that it is (for reasons that mystify me) incredibly low prestige. It’s kind of like maintenance or janitorial services in an office building. The higher your prestige, the less likely you are to mention the subject – it is simply beneath notice. Why, people who wear uniforms, for goodness sake, do the work.

Nonetheless, this subject is incredibly high value. In fact, it’s hard to argue there is anything more valuable in the computing world. What is the subject? Well, think Haiti. Think Chile. Think Mexico. Think earthquake. Yes, earth-shaking, building-crashing, crevice-opening earthquake. What is the equivalent of an earthquake in the world of computing? You know, yes you do. If it’s a series of scary tremors, then it’s a site slow-down. If it brings down buildings, then it’s a site crash. If it brings down buildings and cars and people disappear into newly appeared holes in the ground, it’s a major outage with data loss.

Where does making buildings earthquake-proof stand in the overall priority of things?

It couldn’t have been too high on the priority list at RIM in the time leading up to the earthquake that struck them last December. Here is a representative story:

http://www.nytimes.com/aponline/2009/12/23/business/AP-US-TEC-Research-in-Motion-BlackBerry-Outage.html

What a horrible thing to happen to their business! Lots of free publicity of exactly the kind they don’t want.

The story reminded me of similar events, less public, that have taken place at a couple of companies I know very well. The story also led me to reflect on how data center (and related development) issues are typically left to fester until they blow up. Then the alarms ring, everyone runs around, and the immediate problem is fixed. What is unusual is for management to take the systemic action that is required to greatly reduce the chance of the failure recurring. Some data center operations resemble a coal-fired heating furnace – they require constant care and feeding, are cranky and don’t like change of any kind, but people just don’t want to think about it. “Upgrade to gas? I don’t have time to think about it. Maybe in next year’s budget.”

Here’s what I find: the more august the group of people, the higher their status, the less willing they often seem to be to devote real time, effort and brain cycles to That-Which-Must-Not-Be-Discussed. This is wrong. It is so wrong, it is perverse. Change your priorities! Take a look at it now (or at least soon), when (I hope) the alarm bells are not ringing.

Even though the articles about the RIM debacle don’t go into detail, it is reasonable to guess a couple of things about the RIM operation from facts that have been revealed. Here are some of the warning signs, most of which applied to the RIM case, and some of which may or may not have. I list them here as a quick check-list to see how vulnerable your operation may be.
- Highly complex system. RIM’s data center was said by several people to be highly complex. This is almost always a bad sign. Systems naturally become complex over time (kind of like entropy), and sometimes smart people insist for plausible-at-the-time reasons on adding complexity. The trouble is that, for a variety of understandable reasons, the more complex a system is, the more likely it is to fail when changed. It is worth working hard to reduce the number of elements in your data center and generally make it simpler.
- Change management risk. The failure at RIM is said to have been a result of a software “upgrade.” This is one of the most common opportunities for embarrassment. All too often, people respond by reducing the frequency of change, which actually increases the chance that any one change will cause a disaster (because it is likely to be a larger, more complex change). There are methods of reducing this risk to near zero.
- “Us vs. them.” Most data center disasters I have seen happen when there is a (typically well-intentioned) strict separation between data center operations and the rest of the world.
At minimum, it is worth a quick, objective look at the “machine room” of your operation to see if it looks and feels like the kind of place to which disasters are naturally attracted.

Above all, get over That-Which-Must-Not-Be-Discussed – yes (dare I say), be like Harry Potter – call Voldemort by his proper name! Talk about earthquake vulnerability! And above all, do what you have to do so that, when things go wrong, your service keeps working.
April 6, 2010
The Chief Architect’s Role in A Tech Company
The overwhelmingly most important job of any tech company’s top architect is …
[drum roll, please]… assuring the reliability and responsiveness of his
company’s product and/or service.

This may not be what you think. It often is not what the top architect thinks. If
that is the case, all I can say is, think again, and think better this time.

I find that the technical staff in computer-enabled companies have all sorts of
ideas of what things are “important.” Generally speaking, “importance” seems to
be correlated with distance from day-to-day events, distance from the data
center, distance from anything concerning “operations,” and distance from the
concerns that existing customers have working with the product/service the
company provides in day-to-day use. Anything “strategic” – that’s in; that’s a
proper concern of the top thinkers in the company. Anything “tactical” or that
could conceivably be in the realm of the customer service or data center
operations group – that’s out; that’s just a waste of time for the company’s
best minds.

However common this way of thinking may be, I still find it to be not just bizarre, but
perverse. The reason, simply put, is that it’s completely out of sync with what
customers think the most important issues are.

If your product or service:
- just doesn’t work
- goes off-line unpredictably
- slows way down at key times of day
- has old, reliable features that suddenly stop working or change their behavior unpleasantly
What do you think will happen to your customer base? If you think you don’t care
because you have such a great flow of new customers, think for a moment about
what causes that flow: do you think reputation, references or word-of-mouth
might have anything to do with it? Do I have to mention Toyota to remind you
how fast a great service record can get destroyed?

Now, let’s get to the crux of the issue: when those bad things happen, whose fault
is it, and whose actions and decisions are most highly correlated with creating
the conditions that led to the problem? To make this simple, let’s turn again
to Toyota.
- Is the driver (user) at fault due to improper use? Sure, that makes sense, it was
  the users that made the site crash.
- Are the technicians (for example in the data center) at fault? Sure, they can screw
  up; but the best-architected systems don’t depend on technician action to
  achieve reliability and response time.
- Are the customer service people at fault? Hmmm.
Here’s the reality: while anyone in an organization can screw up and cause problems
for customers, the most serious issues I’ve seen in companies are the direct
result of architectural decisions or lack of attention/involvement. This
includes response time, flakiness, down time, and the other things that drive
customers nuts, not to mention drive them to your competitors.

I’ll give a couple of illustrations.

A company’s web site was down for an hour or more at a time. Repeatedly. The only
solution was to re-build a key component and its database on a new machine and
bring it on-line. Right away, the finger of guilt pointed at data center
operations for failing to deliver. But the root cause of the problem was a key
application component that had no fail-over capability. How could that happen?
Simple. The company’s top architects failed to make scalability and fault
tolerance the non-negotiable, number one priority when selecting this key
component. Instead they concentrated on all sorts of other things. Are the data
center people at fault here? Hardly. They just had this crippled software
tossed at them, and did their best to hold off the inevitable disaster.
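
For readers who want to see what "fail-over capability" means at its very simplest, here is a minimal sketch in Python. It is not the component from the story (I am not describing that code); every name in it (call_with_failover, broken_primary, healthy_standby) is made up for illustration. The assumption is simply that the same request can be answered by more than one copy of a component, so a caller can move on to the next copy when one is down.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when every copy of the component has failed."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful answer."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed for {request!r}")


# Hypothetical stand-ins for two copies of the same key component.
def broken_primary(request: str) -> str:
    raise ConnectionError("primary is down")


def healthy_standby(request: str) -> str:
    return f"standby handled {request}"


result = call_with_failover([broken_primary, healthy_standby], "GET /home")
print(result)
```

With only one copy in the list, the same outage becomes an AllReplicasFailed error and the site is down until someone re-builds the component by hand, which is roughly the position the data center people in the story were put in.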

A company was in a vise grip of pain. Existing customers were complaining that
the service provided was slow and faulty. New partners were putting on the
squeeze for new releases of functionality that they felt were crucial for
winning business. The more new code the company released, the more bugs and
customer problems were created. When the company tried to slow things down to
stabilize the service, the new partners got angrier and accused the
company of failing to meet its commitment to them. The data center staff
exploded, the QA staff grew, consultants were crawling around, and life was
miserable. The root cause? Again, simple. The company’s top architects had
completely ignored the whole release and go-live process, and built a software
system that was designed for a set of unchanging requirements, instead of the
fluid and constantly changing reality of the company’s customers and market.
The whole nightmare was an architectural side-effect, and the solution was a change
in architecture – the good, practical kind of architecture that encompasses
everything about the company’s product/service, including releases and the data
center.

I think the message is a clear and simple one: if your top minds are not already
focused on the company’s most important issues, viz., those that are most important
to your customers, get them focused on those seemingly mundane, tactical,
near-term, nuts-and-bolts concerns. Now.
April 2, 2010
Kayak and Customer Intimacy

Kayak is a case study in how to get engineers to interact with customers to move your product or service ahead rapidly.

Paul English, their excellent CTO, put up a recent blog post on the subject that humorously features a red phone.

I continue to be amazed at how frequently organizations try to protect those "precious, critical path resources," their engineers.

The engineers are often complicit in this. They do all this project planning and get signed up for an ambitious string of deliverables. Then things happen: new requirements come along, some things are harder and take longer, they are late and feeling the pressure. How do you think they react when some well-intentioned, idealist jerk comes along and talks about what a great idea it is to have engineers handling customer requests and complaints directly? R-rated, to say the least.

It's the whole way organizations are set up that leads to everyone thinking that the "red phone" is a terrible idea. And that's a good test for your organization — if the "red phone" seems like a terrible idea, then maybe your organization needs to change.

February 4, 2010