• Cryptocurrency: Money, Trust and Regulation Book

    A book has been published about cryptocurrency that stands out from the many books available on the market: it's written by a person with experience and true expertise in financial markets, institutions and regulation both in government and the private sector, Oonagh McDonald. Disclosure: I was her technical advisor for the book. We connected as a result of my article in Forbes on Central Bank Digital Currencies.


    Dr. McDonald's prior books are impressive because of her amazing perspective and knowledge. Here's her background:

    Dr. Oonagh McDonald CBE is an international expert in financial regulation, having advised regulatory authorities in a wide range of countries, including Indonesia, Sri Lanka and Ukraine. She was formerly a British Member of Parliament, then a board member of the Financial Services Authority, the Investors Compensation Scheme, the General Insurance Standards Council, the Board for Actuarial Standards and the Gibraltar Financial Services Commission. She was also a director of Scottish Provident and the international board of Skandia Insurance Company and the British Portfolio Trust. She is currently Senior Adviser to Crito Capital LLC. She was awarded a CBE in 1998 for services to financial regulation and business. Her books include Fannie Mae and Freddie Mac: Turning the American Dream into a Nightmare (2013), Lehman Brothers: A Crisis of Value (2015) and Holding Bankers to Account (2019). She now lives in Washington DC, having been granted permanent residence on the grounds of "exceptional ability".

    Read the comments at the link about her books on Lehman Brothers, Fannie Mae, bankers and markets and others.

    Here are examples of what others have said:

    Oonagh McDonald has done it again. In this ambitious book, she helps the rest of the world catch up with her on the opportunities and risks associated with stable coins. Even if one may disagree with her about the future of stable coins (and I do a bit), this book is an invaluable resource, especially as a teaching tool, because of McDonald’s ability to synthesize and interpret a vast amount of information about complex and novel practices. — Charles Calomiris, Henry Kaufman Professor of Financial Institutions, Columbia Business School

    McDonald’s rigorously researched analysis of the development of cryptocurrencies is a must-read for anyone who has a stake in the future of money. It is an historical tour de force that painstakingly teases out of every corner of the cryptocurrency world the critical issues that governments, policy makers, and consumers must consider before abandoning government fiat money. — Thomas P. Vartanian, executive director and professor of law, Program on Financial Regulation and Technology, George Mason University

    Everyone fascinated by how the cryptocurrency phenomenon has created a whole sector of ventures to furnish ‘alternative currencies’, while the dollar price of a Bitcoin boomed from 8 cents to a high of more than $60,000, must wonder whether all this will really bring about a revolution in the nature of money. Will Bitcoin’s libertarian dream to displace central bank fiat currency be achieved? Or ironically, will central banks take over digital currencies and make themselves even more dominant monetary monopolies than before? Oonagh McDonald, always a voice of financial reason, provides a thorough consideration of these questions and of cryptocurrency ideas and reality in general, with the intertwined issues of technology, regulation, trust, and government monetary power. This is a very insightful and instructive guide for the intrigued. — Alex J. Pollock, Distinguished Senior Fellow Emeritus, R Street Institute, and former Principal Deputy Director, Office of Financial Research, US Treasury

    It's a different perspective from the many books on the subject of cryptocurrencies that have been published. Whether or not you agree with her conclusions, you will read facts and perspective here that are not available elsewhere.

  • The Nightmare of Covid Test Scheduling

    Oh, you want to get a Covid test, do you? Little did you know that the clever people who do these things also give you endurance, patience and intelligence tests at the same time! Our wonderful healthcare people and helpful governments have somehow arranged an impressive variety of ways to make you fill out varying forms in varying orders, only to find out that there are no available appointments.

    Don’t you think the highly paid experts who created these services could have done something simple, like following the model of dimensional search used at little places like Amazon, travel sites and other businesses that care about customers? I guess that would have been too easy or something. And besides, medical scheduling in general is a nightmare, so why should this be different?
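    To make the "dimensional search" point concrete, here is a minimal sketch (in Python, with made-up slot data; nothing here is CVS's or Walgreens' actual system) of the obvious approach: filter the slot inventory down to what is actually bookable before asking anyone to pick a store or a date.

        from collections import defaultdict
        from dataclasses import dataclass

        @dataclass
        class Slot:
            store: str        # location name
            date: str         # ISO date, e.g. "2021-12-21"
            time: str         # e.g. "10:15"
            open: bool        # is the slot still bookable?

        def find_open_slots(slots, near=None):
            """Group bookable slots by store and date, so the page only ever
            shows combinations that can actually be scheduled."""
            available = defaultdict(list)
            for s in slots:
                if s.open and (near is None or near in s.store):
                    available[(s.store, s.date)].append(s.time)
            return dict(available)

        # The caller sees only stores and dates with real openings.
        inventory = [
            Slot("Main St Pharmacy", "2021-12-21", "09:00", False),
            Slot("Main St Pharmacy", "2021-12-22", "10:15", True),
            Slot("Oak Ave Pharmacy", "2021-12-21", "14:30", True),
        ]
        print(find_open_slots(inventory))
        # {('Main St Pharmacy', '2021-12-22'): ['10:15'], ('Oak Ave Pharmacy', '2021-12-21'): ['14:30']}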

    Looking for a test: CVS

    Search told me that my local CVS has testing. I clicked to the website of my local store. I clicked “schedule a test.” Although I had come from the local store, I guess the people who built covid testing didn’t manage to get the local site to pass on its location, so I entered my location again as requested.

    Now I have to “answer a few questions” for no-cost testing. Eight questions. Then when I say yes to recent symptoms, 12 more questions plus the date my symptoms began. Then clicking that I was truthful.

    Next, pick a test type, look at a map of local stores and see a list of dates starting today. I pick today. There’s a list of each store, with a button under each to “Check for available times.” Click on the first store. Here’s what appears:

    There are no available times at this location for Tue., 12/21. Try searching for availability on another date.

    Wow. I go up and pick the next day. Click. No times. Pick the next day. Click. No times.

    CVS has pioneered a whole new way to help customers pick a time! You pick a date, pick a store, click and hope you get lucky. Then pick a different store and/or a different time and click again. And keep rolling until you hit the jackpot! Assuming there’s one there…

    Since there was no end in sight, I tried something different.

    Looking for a test: Walgreens

    No questions first. Hooray! Just put in a location, pick a test type and see a list of locations. … Almost all of which had “No appointments available.” Let’s check out the one nearest to me, which said “Few appointments available.” I click. First I have to agree to lots of things. Now I have to enter my full patient information: name, gender, DOB, race, ethnicity, full address, phone and email. Then review and click that it’s correct.

    Then, it’s the covid questions: my symptoms, contacts, medical conditions, pregnancy. Have I had vaccines? For each, which one and the date given. Have I tested positive in the past?

    Now, after all that, I can pick an appointment. Back to that bait-and-switch first screen with test types and locations. I pick the location. Now a calendar shows up. Today’s date is highlighted. This message in red is below: “No time slots available within the selected date. Try a different date for more options.” The next 7 days are in normal type, beyond that they’re greyed out. Do any of them work? I try each day individually. They each give the same message! Why couldn’t you have told me that NO DATES WERE AVAILABLE!?!? Maybe even … BEFORE I filled all that stuff out??

    Looking for a test: The state of NJ

    Since I live in NJ, I get regular dispatches about how the state government cares about my health in general and covid in particular. So I went to the state site.


    Which it turns out is operated by a private company, Castlight.


    I put in my zip code. They list places that offer testing, one of which is the Walgreens I just tried. But I click on it anyway, and they link me to Walgreens testing … in a town 10 miles away instead of my town, which was explicitly the one I clicked on. Good job!

    They got my hopes up by listing Quest Diagnostics, which has a location in my town. I answer a long list of questions and am told that I qualify for a test! Hooray! But then …


    I have to sign up and provide loads of personal information before even knowing I can get a test. That’s it for Quest.

    Looking for a test: The local county

    Maybe my local county would have done it better? Let’s check it out.

    I get a long list of testing places. How do I find one near me? After a few minutes of confusion, I discover that the sites are listed alphabetically! Now that’s helpful!

    CVS of course is near the top, with a line per location. My town isn’t listed even though I already know that the local CVS claims they do tests. Crap.

    Looking for a test: Digging deep

    I found a private place, Solv, that claims to link you right to testing places. I tried. They had a clinic not too far from me. Clicked. I’m still on Solv, which is potentially good. After more clicking, it turns out that no appointments were available today or tomorrow, the only choices. Gee, Solv, maybe in the next release of your software you could possibly only show choices that were actually, you know, available??

    I finally tried a little pharmacy that is local and has remained independent. They offer tests. I clicked and got to a page dedicated to the pharmacy under a place I’d never heard of, Resil Health. Right away they list dates and times available. Just a few days out.


    I pick a date and enter the usual information on a clean form, but also my insurance information and a photo of the front & back of my card. Click. The time is no longer available! But at least picking another time was easy. I was disappointed that it was a couple days out. They sent an email with a calendar invite. I accepted. There was a link to reschedule. I tried it. To make a long story short, sometimes when I clicked reschedule the dates available changed, and earlier ones appeared. After some effort I snagged one the same day! Then I went. All I had to do was show my driver’s license – since they had everything else, neither I nor anyone at the pharmacy had to do paperwork – Resil health did it all, including the reporting.

    It was a pain, but by far the best. Hooray for small-group entrepreneurs, getting a service up and running that makes things easier and better than any of the giant private companies and certainly any of the pathetic ever-so-helpful governments.

    Looking for a test: Is it just me?

    I had to wonder: is New Jersey particularly bad, as snotty New Yorkers like to joke about, or is it just the way things are? It turns out that, even in high-rise Manhattan, covid testing is tough. This article spells out the issues.

    Mayor Bill de Blasio keeps telling New Yorkers frustrated with long waits and delayed results at privately-run COVID testing sites to use the city’s public options — but his administration’s incomplete and bulky websites make that exceedingly difficult.

    It’s not just me.

    Conclusion

    I got my test. I’ll get the results soon. Let's hope getting those results is better than it often is in medicine. What’s the big deal? I’m only writing about it because it’s a representative story in the life-in-the-slow-lane of typical software development. It’s possible to write good software. Thankfully there are small groups of motivated programmers who ignore the mountain of Expert-sanctioned regulations, standards and processes that are supposed to produce good software. These software ninjas have a different set of methods – ones that actually work! For example, in New York City:

    The complaints echo the problems New Yorkers encountered when city officials first rolled out their vaccine appointment registration systems this spring — prompting one big-hearted New Yorker with computer skills to create TurboVax to work around the mess.

    “We don’t have a single source of truth for all testing sites in NYC,” tweeted the programmer, Huge Ma, who was endearingly dubbed ‘Vax Daddy’ by grateful Gothamites. “Tech can’t solve all problems but it shouldn’t itself be a problem on its own.”

    One guy – but a guy who actually knows how to produce effective, working software in less time than the usual software bureaucracy would take to produce a first draft requirements document. This is part of the ongoing stream of anomalies demonstrating that a paradigm shift in software is long overdue.

  • The Dimension of Automation Depth in Information Access

    I have described the concept of automation depth, which goes through natural stages starting with the computer playing a completely supportive role to the person (the recorder stage) and ending with the robot stage in which the person plays a secondary role. I have illustrated these stages with a couple of examples that show the surprising pain and trouble of going from one stage to the next.

    Unlike the progression of software applications from custom through parameterized to workbench, moving to the next stage of automation depth is something customers tend to resist, for various reasons including the fear of losing control and power.

    Automation depth in Information Access

    Each of the patterns of software evolution I've described is general in nature. I’ve tried to give examples to show how the principles are applied. In this section, I’ll show how the entire pattern played out in “information access,” which is the set of facilities for enabling people to find and use computer-based information for business decision making.

    Built-in Reporting

    “Recorder” is the first stage of the automation depth pattern of software evolution. In the case of information access, early programs were written to record the basic transactions that took place; as part of the recording operation, reports were typically produced, summarizing the operations just performed. For example, all the checks written and deposits made at a bank would be recorded during the day; then, at night, all the daily activity would be posted to the accounts. The posting program would perform all the updates and create reports. The reports would include the changes made, the new status of all the accounts, and whatever else was needed to run the bank.

    At this initial stage, the program that does the recording also does the reporting. Reporting is usually thought to be an integral part of the recording process – you do it, and then report on what you did. Why would you have one program doing things, and a whole separate program figuring out and reporting on what the first program did? It makes no sense.

    What if you need reports for different purposes? You enhance the core program and the associated reports. What if lots of people want the reports? You build (in the early days) or acquire (as the market matured) a report distribution system, to file the reports and provide them to authorized people as required.

    Efficiency was a key consideration. The core transaction processing was “touching” the transactions and the master files; while it was doing this, it could be updating counters and adding to reports as it went along, so that you wouldn’t have to re-process the same data multiple times.
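    As a minimal sketch of the "recorder" stage (Python, with invented account and transaction structures rather than any real banking system), here is a posting routine that applies the day's transactions and builds its summary report in the same pass over the data, which is the "touch it once" efficiency argument in code.

        def post_and_report(accounts, transactions):
            """Apply the day's transactions to the master records and build the
            daily report in the same pass (recording and reporting are one program)."""
            report = {"deposits": 0.0, "withdrawals": 0.0, "errors": []}
            for txn in transactions:                    # txn is (account_id, amount)
                acct_id, amount = txn
                if acct_id not in accounts:
                    report["errors"].append(txn)        # unknown account: report it
                    continue
                accounts[acct_id] += amount             # update the master record
                if amount >= 0:
                    report["deposits"] += amount        # accumulate report counters
                else:                                   # while the data is in hand,
                    report["withdrawals"] -= amount     # so it is only touched once
            report["ending_balances"] = dict(accounts)
            return report

        accounts = {"1001": 250.00, "1002": 75.50}
        daily = [("1001", 40.00), ("1002", -20.00), ("9999", 10.00)]
        print(post_and_report(accounts, daily))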

    Report Writers

    The “power tool” stage of automation depth had two major sub-stages. The first of these was the separation of reporting from transaction processing. Information access was now a key goal in itself, and was so important and done so frequently that specialized tools were built to make it easy, which is always the sign that you’re into the “power tool” phase.

    This first generation of power tools was specialized software packages generally called “report writers.” The power tool was directed at the programmer who had to create the report. Originally, the language used for transaction processing was also used for generating the report; most often that language was COBOL. That COBOL was cumbersome for the purpose is reflected in the specialized syntax that was added to the language to ease the task of writing reports. But various clever people saw that by creating a whole new language and software environment, the process of writing reports could be tremendously enhanced and simplified. These people began to think in terms of reporting itself, so naturally they broke the problem into its natural pieces: accessing the data you want to report on, processing it (select, sort, sum, etc.), and formatting it for output.
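    Here is a hedged sketch of that three-part breakdown (Python, with an invented report "spec"; any resemblance to a real report writer's syntax is accidental). The report is described in terms of access, processing and formatting rather than written as one-off procedural code:

        # A toy report "spec": which rows to select, how to sort and total them,
        # and which columns to print.
        spec = {
            "select": lambda row: row["region"] == "Northeast",
            "sort_by": "branch",
            "sum": "amount",
            "columns": ["branch", "amount"],
        }

        def run_report(rows, spec):
            chosen = [r for r in rows if spec["select"](r)]            # access / select
            chosen.sort(key=lambda r: r[spec["sort_by"]])              # process: sort
            total = sum(r[spec["sum"]] for r in chosen)                # process: sum
            lines = ["\t".join(str(r[c]) for c in spec["columns"])     # format for output
                     for r in chosen]
            lines.append(f"TOTAL\t{total}")
            return "\n".join(lines)

        rows = [
            {"region": "Northeast", "branch": "Albany",  "amount": 120},
            {"region": "Northeast", "branch": "Boston",  "amount": 300},
            {"region": "South",     "branch": "Atlanta", "amount": 200},
        ]
        print(run_report(rows, spec))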

    The result of this thinking was a whole industry that itself evolved over time, and played out in multiple environments and took multiple forms. The common denominator was that they were all software tools to enable programmers to produce reports more quickly and effectively than before, and were completely separate from the recorder or transaction processing function.

    At the same time, data storage was evolving. The database management system emerged through several generations. This is not the place for that story, which is tangential to the automation depth of information access. What is relevant is that, as the industry generally recognized that information access had moved to the report writer stage of automation, effort was made to create a clean interface between data and the programs that accessed the data for various purposes.

    Data Warehouse and OLAP

    Report writers were (and are) important power tools – but they’re basically directed at programmers. But programmers are not the ultimate audience for most reports; most reports are for people charged with comprehending the business implications of what is on the report and taking appropriate action in response. And the business users proved to be perennially dissatisfied with the reports they were getting. There was too much information (making it hard to find the important things), not enough information, information organized in confusing ways (so that users would need to walk through multiple reports side-by-side), or information presented in boring ways that made it difficult to grasp the significance of what was on the page. And anytime you wanted something different, it was a big magilla – you’d have to get resources authorized, a programmer assigned, suffer through the work eventually getting done, and by then you’d have twice as many new things that needed getting done.

    As a result of these problems, a second wave of power tools emerged, directed at this business user. These eventually were called OLAP tools. The business user (with varying levels of help from those annoying programmers) had his own power tool, giving him direct access to the information. Instead of static reports, you could click on something and find out more about it – right away! But with business users clicking, the underlying data management systems were getting killed, so before long the business users got their own copy of the data, a data warehouse system.

    In a sign of things to come, the business users noticed that sometimes, they were just scanning the reports for items of significance, and that it wasn’t hard to spell out exactly what they cared about. So OLAP tools were enhanced to find and highlight items of special significance, for example sales regions where the latest sales trends were lower than projections by a certain margin. This evolved into a whole system of alerts.
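    A minimal sketch of that alerting idea (Python, invented numbers; real OLAP tools express this through their own rule definitions): an alert is just a condition the business user cares about, evaluated automatically against the latest data.

        def sales_alerts(actuals, projections, margin=0.10):
            """Flag regions whose latest sales fall short of projection by more
            than the given margin, the kind of rule described above."""
            alerts = []
            for region, actual in actuals.items():
                projected = projections[region]
                if actual < projected * (1 - margin):
                    alerts.append(f"{region}: {actual} vs projected {projected}")
            return alerts

        actuals     = {"Northeast": 82, "South": 120, "West": 95}
        projections = {"Northeast": 100, "South": 115, "West": 100}
        print(sales_alerts(actuals, projections))    # ['Northeast: 82 vs projected 100']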

    Predictive Analytics

    OLAP tools are certainly power tools, but the trouble with power tools is that you need power users – people who know the business, can learn to use a versatile tool like OLAP effectively, and can generate actions from the information that help the business. So information access advanced to the final stage in our general pattern, the “robot” stage, in which human decision making is replaced by an automated system. In information access, that stage is often called “predictive analytics,” which is a kind of math modeling.

    As areas of business management are better understood, it usually turns out that predictive analytics can do a better, quicker job of analyzing the data, finding the patterns, and generating the actionable decisions than a person ever could. A good example is home mortgage lending, where the vast majority of the decisions today are made using predictive analytics. Many years ago, a person who wanted a home mortgage would make an appointment with a loan officer at a local savings bank and request the loan. The officer would look at the applicant's information and make a human judgment about their loan worthiness.

    That “power user” system has long since been supplanted by the “robot” system of predictive analytics, where all the known data about any potential borrower is constantly tracked, and credit decisions about that person are made on the basis of the math whenever needed. No human judgment is involved, and in fact would only make the system worse.
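    As a hedged sketch of the "robot" stage (Python; the weights and features below are invented for illustration and bear no relation to any real underwriting model), the decision reduces to a scoring function applied to whatever data is on file, with no human in the loop:

        import math

        # Invented coefficients; a real lender's model is estimated from years of
        # nationwide repayment data, which is the point made below about data availability.
        WEIGHTS = {"payment_history": 2.0, "debt_to_income": -3.5, "years_of_credit": 0.15}
        BIAS = -1.0

        def credit_decision(applicant, threshold=0.5):
            """Score the applicant with a logistic model and decide automatically."""
            z = BIAS + sum(w * applicant[feature] for feature, w in WEIGHTS.items())
            score = 1 / (1 + math.exp(-z))       # probability-like score in (0, 1)
            return ("approve" if score >= threshold else "decline", round(score, 3))

        applicant = {"payment_history": 0.9, "debt_to_income": 0.25, "years_of_credit": 12}
        print(credit_decision(applicant))        # ('approve', 0.849)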

    Predictive analytics draws on the same information as the prior stages, but the emphasis on presenting a powerful, flexible user interface that lets a power user drive his way to information discovery is replaced by math models that are constantly tuned and updated as new information becomes available.

    Sometimes the predictive analytics stage is held back because of a lack of vision or initiative on the part of the relevant industry leaders. However, a pre-condition for this approach really working is the availability of all the relevant data in suitable format. For example, while we tend to focus on the math for the automated mortgage loan processing, the math only works because it has access to a nationwide database containing everyone’s financial transactions over a period of many years. A power user with lots of experience, data and human judgment will beat any form of math with inadequate data; however, good math fueled with a comprehensive, relevant data set will beat the best human any time.

    Conclusion

    All these stages of automation co-exist today. One of the key rules of computing is that old programs rarely die; they just get layered on top of, given new names, and gradually fade into obscurity. There are still posting programs written in assembler language that have built-in reporting. In spite of years of market hype from the OLAP folks, report writing hasn’t gone away; in fact, some older report writers have interesting new interactive capabilities. OLAP and data warehouses are things that some organizations aspire to, while others couldn’t live without them. Finally, there are important and growing pockets of business where the decisions are made by predictive analytics, and where producing pretty reports for decision-making purposes (as opposed to bragging about how well the predictive analytics are doing) would be malpractice.

    Even though all these stages of automation co-exist in society as a whole, they rarely co-exist in a functional segment of business. Each stage of automation is much more powerful than the prior stage, and it provides tangible, overwhelming advantages to the groups that use it. Therefore, once a business function has advanced to use a new stage of information access automation, there is a “tipping point,” and it tends to become the new standard for doing things among organizations performing that function.

  • Trusting Science: the Whole Milk Disaster

    I trust science. The gradual emergence of science has led to a revolution in human existence that has happened so quickly and with such impact that it is hard to gain perspective on it.

    Trusting science is not the same as trusting the pronouncements of people who are designated scientific experts. Establishing the truth of a scientific theory is an entirely different process from the social dynamics of rising to a position of leadership in a group of any kind. Official experts, whether government, corporate or academic, nearly always defend the current version of received truth against challenges of all kinds; most of those challenges are stupidity and ignorance. My go-to expert on this subject, as on so many others, is Dilbert:

    [Dilbert comic about experts]

    Sadly, those same establishment experts tend to be the strongest opponents of genuine innovation and scientific advances of all kinds. As I explain here, with examples from Feynman and the history of flight, one of the core elements of successful innovation is ignoring the official experts.

    My skepticism is well proven in the case of so-called Computer Science, which doesn't even rise to the level of "useful computer practices" much less science. As I have shown extensively, Computer Science and Engineering is largely a collection of elaborate faith-based assertions without empirical foundation. And computers are all numbers and math! If it's so pathetic in such an objective field, imagine how bad it can get when complex biological systems are involved.

    This brings us to the subject of saturated fat (solid fat of the kind that's in meat), whole milk and human nutrition. This ongoing scandal — for which no one has been imprisoned, sued or even demoted, in spite of its leading to widespread obesity and other health-damaging conditions — is still rolling along. The hard science concerning the supposed connection between saturated fat, cholesterol and heart disease is in. The results are clear. It is positively healthy for people to eat saturated fat. Period. The scandal is that the "expert" people and organizations that have declared saturated fat and cholesterol to be dangerously unhealthy for many decades refuse to admit their errors and continue to waffle on the subject.

    This is relevant to computer science because of the stark differences between the two fields. Software is esoteric and invisible to nearly everyone, while the results of eating are tangible to everyone, and the statistics about the effects are visible and measurable. The common factor is … people. In both cases there is a wide consensus of expert opinion about the right way to build and maintain software, and the right way to eat and live in order to be healthy. Experts! From blood-letting to flying machines, they lead the way!

    Usually the Science-challengers are wrong

    It has taken me a great deal of time to dig into this scandal, in part because there are so many cases of "the experts are all wrong — me and my fringe group have the truth." I wanted to make absolutely sure "it's good to eat saturated fat" wasn't another of these. After all, the simple notion that eating fat makes you fat makes common sense!

    The Scandal

    Surely the diet recommendations of the major medical and government institutions in favor of limiting fat must be valid! Sadly, this is not the case. Rather, it's a wonderful example of how hard paradigm shifts are to accomplish, particularly when the prestige of major institutions is involved. And, sadly, of how prestige and baseless assertions have substituted for science, shockingly similar to bloodletting and other universally-accepted-on-no-objective-basis practices.

    A basic, understandable summary of the subject may be found in The Big Fat Surprise, which is loaded with appropriate detail. Here is a summary:

    "the past sixty years of low-fat nutrition advice has amounted to a vast uncontrolled experiment on the entire population, with disastrous consequences for our health.

    For decades, we have been told that the best possible diet involves cutting back on fat, especially saturated fat, and that if we are not getting healthier or thinner it must be because we are not trying hard enough. But what if the low-fat diet is itself the problem? What if those exact foods we’ve been denying ourselves — the creamy cheeses, the sizzling steaks — are themselves the key to reversing the epidemics of obesity, diabetes, and heart disease?"

    Yes, this sounds like what an anti-science crank would say. All I can say is, dig in. You'll find the shoddy beginnings of the fat-cholesterol-heart hypothesis; the biased studies that seemed to support it; the massive, multi-decade Framingham study which was trumpeted as supporting the anti-fat theory, but whose thoroughly confirmed and vetted results were actively suppressed for many years; the uncontested studies that disprove the anti-fat recommendations; and the improved understanding of the biological systems that thoroughly debunks the widely promoted campaign against saturated fat and LDL, the "bad" cholesterol.

    More detail

    If you want a start on more detail, I recommend Dr. Sebastian Rushworth at a high level and the recent book by long-term cardiac doctor Malcolm Kendrick that gives the details of the studies and biology that explain what really happens.

    Here are a couple explanations from Dr. Rushworth:

    "the LDL hypothesis basically says that heart disease happens because LDL somewhow ends up in the arterial wall, after which it is oxidized, which starts an inflammatory reaction that gradually leads to the hardening of arteries and eventually to bad things like heart attacks and strokes."

    "… the LDL hypothesis is bunk. There is by now a wealth of evidence showing that LDL has little to do with heart disease, such as this systematic review from BMJ Evidence Based Medicine, which showed that there is no correlation whatsoever between the amount of LDL lowering induced by statins and other LDL lowering drugs, and the benefit seen on cardiovascular disease risk (if indeed any benefit is seen – it often isn’t)."

    Rushworth's summary of the Kendrick book is:

    "The ultra-short elevator pitch version of what he argues in the book is that heart disease is what happens when damage to the arterial wall occurs at a faster rate than repair can happen. That’s why everything from sickle cell disease to diabetes to high blood pressure to smoking to rheumatoid arthritis to cortisone treatment to the cancer drug Avastin increases the risk of cardiovascular disease – they all either increase the speed at which the arterial wall gets damaged or slow down its repair. It’s why heart disease (more correctly called “cardiovascular disease”) only affects arteries (which are high pressure systems) and not veins (which are low pressure systems), and why atherosclerosis (the hardening of the arteries that characterizes heart disease) primarily happens at locations where blood flow is extra turbulent, such as at bifurcations.

    This alternative to the LDL hypothesis is known as the “thrombogenic hypothesis” of heart disease. It’s actually been around for a long time, first having been proposed by German pathologist Carl von Rokitansky in the 19th century. Von Rokitansky noted that atherosclerotic plaques bear a remarkable similarity to blood clots when analyzed in a microscope, and proposed that they were in fact blood clots in various stages of repair.

    Unfortunately, at the time, von Rokitansky wasn’t able to explain how blood clots ended up inside the artery wall, and so the hypothesis floundered for a century and a half (which is a little bit ironic when you consider that no-one knows how LDL ends up inside the artery wall either, yet that hasn’t hindered the LDL hypothesis from becoming the dominant explanation for how heart disease happens). We now know the mechanism by which this happens: cells formed in the bone marrow, known as “endothelial progenitor cells”, circulate in the blood stream and form a new layer of endothelium on top of any clots that form on the artery wall after damage – thus the clot is incorporated in to the arterial wall.

    In spite of the fact that probably at least 99% of cardiologists still believe in the LDL hypothesis, the thrombogenic hypothesis is actually supported far better by all the available evidence. While the LDL hypothesis cannot explain why any of the risk factors listed above increases the risk of heart disease, the thrombogenic hypothesis easily explains all of them.

    Conclusion

    Many major institutions have dialed down their fervent promotion of the low-fat and LDL-is-bad myths, but haven't done what they should do, which is reverse their positions and issue a mea culpa. They should at minimum take partial responsibility for the explosion of obesity, the useless pharma mega-dollars wasted, and the attendant health disasters for countless humans. The fact that they haven't helps us understand the resistance to correction of the similarly powerful mainstream myths about software. It's not about the LDL or the software; it's about people, pride, institutions, bureaucracy and entrenched practices and beliefs that fight change.

     

  • Computer Science and Kuhn’s Structure of Scientific Revolutions

    If bridges fell down at anywhere close to the rate that software systems break and become unavailable, there would be mass revolt. Drivers would demand that bridge engineers make radical changes and improvements in bridge design and building. If criminals took over bridges and held vehicles hostage for ransom anywhere close to as often as criminals steal organizations' data or lock up their systems until a ransom is paid, there would be mass revolt. In the world of software, this indefensible state of affairs is what passes for normal! Isn't it time for change? Has something like this ever happened in other fields that we can learn from?

    Yes. It has happened often enough to have been studied, along with the process of resistance to change that persists until the overwhelming force of a new paradigm breaks through.

    Thomas Kuhn was the author of a highly influential book published in 1962 called The Structure of Scientific Revolutions. He introduced the term “paradigm shift,” which is now a general idiom. Examining the history of science, he found that there were abrupt breaks. There would be a universally accepted approach to a scientific field that was challenged and then replaced with a revolutionary new approach. He made it clear that a paradigm shift wasn’t an important new discovery or addition – it was a whole conceptual framework that first challenged and then replaced the incumbent. An example is Ptolemaic astronomy in which the planets and stars revolved around the earth, replaced after long resistance by the Copernican revolution.

    Computer Science is an established framework that reigns supreme in academia, government and corporations, including Big Tech. There are clear signs that it is as ready for a revolution as the Ptolemaic earth-centric paradigm was. Many aspects of the new paradigm have been established and proven in practice. Following the pattern of all scientific revolutions, there is massive establishment resistance, led by a combination of ignoring the issues and denying the problems.

    The Structure of Scientific Revolutions

    Thomas Kuhn received degrees in physics, up to a PhD from Harvard in 1949. He was into serious stuff, with a thesis called “The Cohesive Energy of Monovalent Metals as a Function of Their Atomic Quantum Defects.” Then he began exploring. As Wiki summarizes:

    As he states in the first few pages of the preface to the second edition of The Structure of Scientific Revolutions, his three years of total academic freedom as a Harvard Junior Fellow were crucial in allowing him to switch from physics to the history and philosophy of science. He later taught a course in the history of science at Harvard from 1948 until 1956, at the suggestion of university president James Conant.

    His path for coming to his realization is fascinating. I recommend reading the book to anyone interested in how science works and the history of science.

    After studying the history of science, he realized that it isn't just incremental progress.

    Kuhn challenged the then prevailing view of progress in science in which scientific progress was viewed as "development-by-accumulation" of accepted facts and theories. Kuhn argued for an episodic model in which periods of conceptual continuity where there is cumulative progress, which Kuhn referred to as periods of "normal science", were interrupted by periods of revolutionary science. The discovery of "anomalies" during revolutions in science leads to new paradigms. New paradigms then ask new questions of old data, move beyond the mere "puzzle-solving" of the previous paradigm, change the rules of the game and the "map" directing new research.

    Real-life examples of this are fascinating. The example often given is the shift from "everything revolves around the earth" to "planets revolve around the sun." What's interesting here is that the planetary predictions of the Ptolemaic method were quite accurate. The shift to Copernicus (Sun-centric) didn't increase accuracy, and the calculations grew even more complicated. The world was not convinced! Kepler made a huge step forward with elliptical orbits instead of circles with epicycles and got better results that made more sense. The scientific community was coming around. Then, when Newton showed that Kepler's laws could be derived from his own laws of motion and gravity, the revolution was won.

    While the book doesn't emphasize this, it's worth pointing out that the Newtonian scientific paradigm "won" among a select group of numbers-oriented people. The public at large? No change.

    Anomalies that drive change

    One of the interesting things Kuhn describes is the set of factors that drive a paradigm shift in science — anomalies, results that don't fit the existing theory. In most cases, anomalies are resolved within the paradigm and drive incremental change. When anomalies resist resolution, something else happens.

    During the period of normal science, the failure of a result to conform to the paradigm is seen not as refuting the paradigm, but as the mistake of the researcher, contra Popper's falsifiability criterion. As anomalous results build up, science reaches a crisis, at which point a new paradigm, which subsumes the old results along with the anomalous results into one framework, is accepted. This is termed revolutionary science.

    The strength of the existing paradigm is shown by the strong tendency to blame things on mistakes of the researcher — or in the case of software, on failure to follow the proper procedures or to write the code well.

    The Ruling Paradigm of Software and Computer Science

    There is a reigning paradigm in software and Computer Science. As you would expect, the paradigm is almost never explicitly discussed. It has undergone some evolution over the last 50 years or so, but the changes have not been as radical as some would have it.

    At the beginning, computers were amazing new devices and people programmed them as best they could. Starting over 50 years ago, people began to notice that software took a long time and lots of effort to build and was frequently riddled with bugs. That's when the foundational aspects of the current paradigm were born and started to grow, continuing to this day:

    1. Languages should be designed and used to help programmers avoid making mistakes. Programs should be written in small pieces (objects, components, services, layers) that can be individually made bug-free.
    2. Best-in-class detailed procedures should be adapted from other fields to assure that the process from requirements through design, programming, quality assurance and release is standardized and delivers predictable results.

    The ruling paradigm of software and computer science is embodied in textbooks, extensive highly detailed regulations, courses, certifications and an ever-evolving collection of organizational structures. Nearly everyone in the field unconsciously accepts it as reality.

    Are There Anomalies that Threaten the Reigning Paradigm?

    Yes. There are two kinds.

    The first kind is the failures of delivery and quality that continue to plague by-the-book software development, in spite of decades of piling up the rules, regulations, methods and languages that are supposed to make software development reliable and predictable. The failures are mostly attributed to errors and omissions by the people doing the work — if they had truly done things the right way, the problems would not have happened. At the same time, there is a regular flow of incremental "advances" in procedure and technology designed to prevent such problems. This is textbook Kuhn — the defenders of the status quo attributing issues to human error.

    The second kind is the bodies of new software created by small teams of people who ignore the universally taught and prescribed methods and get things done that teams hundreds of times larger couldn't do. Things like this shouldn't be possible. Teams that ignore the rules should fail — but instead most of the winning teams are ones that did things the "wrong" way. This is shown by the frequency of new software products being created by such rule-ignoring small groups, rocketing to success and then being bought by the rule-following organizations, including Big Tech, who can't do it — in spite of their giant budgets and paradigm-conforming methods. See this and this.

    When will this never-ending stream of paradigm-breaking anomalies make a paradigm-shifting revolution take place in Computer Science? There is no way of knowing. I don't see it taking place any time soon.

    Conclusion

    The good news about the resistance of the current consensus in Computer Science and software practice to a paradigm shift is that it provides the room for creative entrepreneurs to build new things that meet the unmet needs of the market. The entrepreneurs don't even have to go all-in on the new software paradigm! They just need to ignore enough of the bad old stuff and use enough of the good new stuff to get things done that the rule-followers are incapable of. Sadly, the good news doesn't apply to fields that are so outrageously highly regulated that the buyer insists on being able to audit compliance during the build process. Nonetheless, there is lots of open space for creative people to build and grow.

  • The Dimension of Software Automation Breadth

    Computers and software are all about automation. See this for the general principles of automation. When you dive into details and look at lots of examples, patterns emerge. The patterns amount to sequences or stages of automation that emerge over time, with remarkable consistency. When you apply your knowledge of the pattern to the software used in an industry at any given time, you can identify where the software is in the sequence. You can confirm the earlier stage in the sequence and predict with accuracy the stage that will follow. This gives you the ability to say what will happen, though not when or by whom.

    The patterns of automation play out in multiple dimensions. I have described what may be called the depth of automation, in which software evolves from recording what people do through helping them and eventually to replacing them.

    In this post I will describe another dimension, which may be thought of as the breadth of automation. The greater the automation breadth, the more functions are incorporated into the automation software and the more highly integrated those functions are with each other.

    The Dimension of automation breadth

    Automation breadth has these basic levels, independent of the automation depth of the products involved:

    Component

    Not a stand-alone product, but a component that could be incorporated into many custom applications and/or products

    Point product

    Implements a single function

    Product collection

    A group of point products from a single vendor

    Product suite

    An integrated set of separate products, with meaningful benefits from the integration

    Integrated product

    A single body of source code that performs a variety of related functions that would otherwise require separate products, with meaningful benefits from the unification

    Integrated product with selective outsourcing

    A product that is written and delivered in such a way that the using organization can choose to have the vendor staff a number of functions off-site.

    Components rarely appear first in historical terms, but they are the beginning of this sequence. Typically, functionality that is very difficult to write or whose requirements change rapidly is separated out and delivered in the form of a component. The quality of the component may become so important that it becomes an industry standard.

    Example: The Ocrolus service (disclosure: Oak HC/FT is an investor) is a classic example of a valuable, narrow component. It takes images of documents of nearly any kind, recognizes them, extracts their data and returns the data to the component user. This functionality is so challenging to write, and the consequence of errors so great, that most applications that need the functionality are likely to use the component, which is delivered as a service, rather than write their own.
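    Here is a generic sketch of the component pattern itself (Python; the interface and field names are hypothetical and are not Ocrolus's actual API): the application hands the hard problem to the component across a narrow interface and works only with the structured result.

        from typing import Protocol

        class DocumentExtractor(Protocol):
            """The narrow interface a document-extraction component exposes to
            every application that embeds it (hypothetical, for illustration)."""
            def extract(self, image: bytes) -> dict: ...

        def ingest_invoice(image: bytes, extractor: DocumentExtractor) -> dict:
            """Application code: delegate recognition/extraction to the component
            and consume the structured fields it returns."""
            result = extractor.extract(image)
            return {"vendor": result["fields"].get("vendor"),
                    "total": result["fields"].get("total")}

        # A local stand-in so the sketch runs; a real deployment would call the
        # vendor's hosted service behind the same interface.
        class FakeExtractor:
            def extract(self, image: bytes) -> dict:
                return {"doc_type": "invoice", "fields": {"vendor": "Acme", "total": 41.50}}

        print(ingest_invoice(b"...image bytes...", FakeExtractor()))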

    In a time of rapidly emerging functionality, point products are usually first, because they can be gotten to market quickly. Buyers typically talk in terms of “best of breed,” but have the problem of negotiating and maintaining relationships with multiple vendors, who frequently have conflicting interests. The buyer also has to take responsibility for integrating the various products he has bought. Buyers reasonably worry about whether their typically small vendor will stay in business and continue to invest in their product.

    Example: Captura (now part of Concur) provided a point product to enable a company to automate the process of entering, approving and paying expense reports.

    Product collections solve some of these problems. There is now a single vendor; the vendor is probably much larger and more likely to stay in business; the products will be maintained and there is no conflict of interest. Once a product collection becomes available in a category, buyers will typically prefer an adequate product from a collection to a superior point product. Product collections can be formed by acquisition.

    Example: CA, Computer Associates, is a typical vendor that acquires products in a category and puts them together into a product collection.

    It is often desirable to share data among the products in a collection. Frequently, they maintain copies of essentially the same information, have essentially the same security roles defined, etc. Users have to spend considerable time and effort to accomplish this on their own, and then again with every new release, and desirable integrations are sometimes not even possible. Product suites solve these problems to a large extent, since a single vendor performs the integration and sells it as part of the suite. Once a product suite becomes available in a category, buyers will typically prefer an adequate product suite to a product collection whose products are superior, because the cost of installation and maintenance is lower and the benefits of integration often outweigh the benefits of individual product features. Building a product suite typically requires source-code level coding and functionality design changes, but the code bases of the products can be separate.
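    A toy sketch of that integration benefit (Python, invented structures, not any particular vendor's suite): the suite's modules consult one shared definition of users and roles instead of each product keeping its own copy.

        # One shared identity store for the whole suite; every module consults it
        # instead of maintaining its own copy of users and security roles.
        SHARED_DIRECTORY = {
            "alice": {"roles": {"approver", "viewer"}},
            "bob":   {"roles": {"viewer"}},
        }

        def can(user, role):
            return role in SHARED_DIRECTORY.get(user, {}).get("roles", set())

        # Two "modules" of the suite, both relying on the same directory.
        def expense_module_approve(user, report_id):
            if not can(user, "approver"):
                raise PermissionError(f"{user} cannot approve expense reports")
            return f"report {report_id} approved by {user}"

        def invoice_module_view(user, invoice_id):
            if not can(user, "viewer"):
                raise PermissionError(f"{user} cannot view invoices")
            return f"invoice {invoice_id} shown to {user}"

        print(expense_module_approve("alice", 17))
        print(invoice_module_view("bob", 42))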

    Classic example: Microsoft Office was one of the first product suites for office products. While the benefits of integration are not overwhelming in this functional area, users clearly benefit from having a suite rather than individual products. Once the benefits of a suite became accepted, buyers no longer wanted individual office products.

    Recent example: The market for business formation has long been dominated by the point product Legalzoom. A new, rapidly growing product suite called Zenbusiness (disclosure: Oak HC/FT is an investor) is taking classic advantage of moving to the next stage of automation breadth, by offering small business customers not only business formation services, but also services for websites, banking and accounting.

    Not all functionality areas benefit from having an integrated product, but for those that do, integrated products are better for vendors (reduced costs due to elimination of redundancy) and for buyers (a simpler, more unified product that is easier to learn and operate, and has deep, fully-automated function integration), and typically win in the marketplace.

    Example: Everyone thinks of SAP’s R/3 as being the first client/server ERP product, but its real breakthrough was being the first truly integrated product on which a large enterprise could run its business. Using a single body of code and a common shared DBMS, its many modules each ran different parts of the enterprise business. In spite of the high cost of implementation and operation, the advantages of running your business on a single integrated product were overwhelming. Like most projects to build a unified product, it required deep domain knowledge from the vendor, took a long time to get right, and had painful early installations.

    Once functionality markets reach a level of maturity, the competition becomes intense, and organizations often have to choose areas of distinction, outsourcing the functions they choose not to compete in to a low-cost provider. Products that enable organizations to selectively outsource in this way typically take business from products that must be fully staffed by the buyer.

    Example: FDC is the leading processor for credit cards in the US. With a billion cards outstanding, the market is huge and highly competitive. While the technology base of the FDC product is decades old, the functionality it delivers is highly sophisticated. In addition to delivering a completely integrated, multi-department product, FDC offers off-site staffing for the various departmental functions, where the staff sounds on the telephone as though they worked for you.

    Conclusion

    The dimension of automation breadth helps us understand the evolution of software and the progression that naturally takes place in a given market space. The companies that win are most often the ones that dominate a narrow market segment with a particular product and then broaden the range of software they can sell to a given organization. They typically move step by step according to the progression I have described here.

  • The Destructive Impact of Software Standards and Regulations

    In many fields of life, standards and regulations are a good thing. Standards are why you can get into a new car and be comfortable driving it; without standard steering wheels and brake pedals, no one would be able to drive a rental car. Software is different. The misguided effort to impose standards and regulations on software development has played a key role in the nonstop cybersecurity disasters and software failures that most organizations try to minimize and ignore.

    Do software standards and regulations literally cause software and cybersecurity disasters? Yes, in much the same way as looking at your cell phone while driving causes auto accidents. Everyone agrees that distracted driving is bad, but somehow distracted programming is ignored. It’s a classic case of ignorance and good intentions that have horrible unintended consequences. Seeing the bad consequences, the community of standards writers believes the problem is that their standards aren’t deep, broad and detailed enough — let’s distract the programmers even more!

    Software and Bicycles

    Suppose that writing a software program were like riding a bicycle, and that finishing writing the program was like reaching your destination on the bicycle.

    Even if you’re not a bike rider, you’ve likely seen lots of bikes being ridden. You’ve probably seen kids learning on training wheels:

    [Photo: young bike riders]

    Kids eventually learn to balance and learn how to handle curves, hills and the rest. Then there are serious riders whose bodies are tuned to the task. They ride with focus and concentration, assuring that they handle every detail of the road they’re on to maximum advantage.


    Because people who create standards and regulations on software are appallingly ignorant of what it takes to create code with maximum speed and quality, they impose all sorts of strange requirements on programmers. It’s as though the bicycles were invisible to everyone but the programmers, but the leaders have it on the best authority that the bicycles they demand their programmers use are up to the most modern standards.

    They create increasingly elaborate processes and methods that are supposed to assure that bicycles reach their destinations quickly and safely, but in fact assure the opposite. The programmers are required to juggle a myriad of meetings, planning discussions, reviews and other activities while still making great progress.

    The oh-so-careful regulators go to great lengths to assure that at each stage of the bicycle’s journey the rider does his riding flawlessly and without so much as a swerve from the prescribed path. When developers are forced to follow regulations and standards, they don’t just pedal, quickly and smoothly, to the finish line. They constantly stop and engage in myriad non-programming activities. No sooner do they start to pick up speed after being allowed back on the bike than ring! ring! It’s time for the security review meeting!

    Riding a bicycle competitively is not the most intellectual of activities. Neither is being a batter in baseball. But both require exquisitely deep focus and concentration. The batter’s head can’t be swimming with advice about what to do when the pitch is a sinker; the batter has to be present in the moment and respond to the pitch in real time with the evolving information he gets as the pitch approaches the plate.

    In the same way and arguably even more so, the programmer has to be immersed in the invisible evolving “world” of the software around him, seeing what should be added or changed, and where in that world. The total immersion in that world isn’t something that can be flicked on with a switch — though it can be brought crashing down in an instant by an interruption. In the same way, a biker doesn’t get to maximum speed and focus in seconds. It takes time to sink into the flow.

    Meanwhile, all the managers judge programmers primarily by how well they juggle, like a skilled fellow juggling atop a unicycle, blissfully unaware and seemingly uncaring that the unicycle is better for stopping, starting and going in circles than it is for making forward progress.

    Conclusion

    This is the ABP (Anything But Programming) factor in software — make sure the programmers spend their time and energy on things other than driving the program to its working, useful, high-quality goal. The managers feel great about themselves. They are following the latest standards, complying with all the regulations, and assuring that the programmers under their charge are doing exactly and only the right thing — doing it right the first time. When such managers get together with their peers, they exchange notes about which aspects of the thousands and thousands of pages of standards they have forced their programmers to distract themselves with recently. Because that’s what modern software managers do.

  • The promising origins of object-oriented programming

    The creation of the original system that evolved into the object-oriented programming (OOP) paradigm in design and languages was smart. It was a creative, effective way to think about what was at the time a hard problem, simulating systems. It’s important to appreciate good thinking, even if the later evolution of that good thinking and the huge increase in the scope of application of OOP has been problematic.

    The evolution of good software ideas

    Lots of good ideas pop up in software, though most of them amount to little but variations on a small number of themes.

    One amazingly brilliant idea was Bitcoin. It solved a really hard problem in creative ways, shown in part by its incredible growth and widespread acceptance. See this for my appreciation of the virtues of Bitcoin, which remains high in spite of the various things that have come after it.

    The early development of software languages was also smart, though not as creative IMHO as Bitcoin. The creation of assembler language made programming practical, and the creation of the early 3-GL’s led to a huge productivity boost. A couple of later language developments led to further boosts in productivity, though the use of those systems has gradually faded away for various reasons.

    The origin of Object-Oriented Programming

    Wikipedia has a reasonable description of the origins of OOP:

    In 1962, Kristen Nygaard initiated a project for a simulation language at the Norwegian Computing Center, based on his previous use of the Monte Carlo simulation and his work to conceptualise real-world systems. Ole-Johan Dahl formally joined the project and the Simula programming language was designed to run on the Universal Automatic Computer (UNIVAC) 1107. Simula introduced important concepts that are today an essential part of object-oriented programming, such as class and object, inheritance, and dynamic binding.

    Originally it was built as a pre-processor for Algol, but they built a compiler for it in 1966. It kept undergoing change.

    They became preoccupied with putting into practice Tony Hoare's record class concept, which had been implemented in the free-form, English-like general-purpose simulation language SIMSCRIPT. They settled for a generalised process concept with record class properties, and a second layer of prefixes. Through prefixing a process could reference its predecessor and have additional properties. Simula thus introduced the class and subclass hierarchy, and the possibility of generating objects from these classes.  

    Nygaard was well-rewarded for the invention of OOP, for which he is given most of the credit.

    What was new about Nygaard’s OOP? Mostly, like Simscript, it provided a natural way to think about simulating real-world events and translating the simulation into software. As Wikipedia says:

    The object-oriented Simula programming language was used mainly by researchers involved with physical modelling, such as models to study and improve the movement of ships and their content through cargo ports

    In some (not all) simulation environments, thinking in terms of a set of separate actors that interact with each other is natural, and an object system fits it well. Making software that eases the path to expressing the solution to a problem is a clear winner. Simula was an advance in software for that class of applications, and deserves the credit it gets.

    What should have happened next

    Simula was an effective way to hard-code a computer simulation of a physical system. The natural next step an experienced programmer would take would be to extract the description of the system being modeled and express it in easily editable metadata. This makes creating the simulation and changing it as easy as making a change to an Excel spreadsheet and clicking recalc. Here's the general idea of moving up the tree of abstraction. Here's an explanation and illustration of the power of moving concepts out of procedural code and into declarative metadata.

    This is what good programmers do. They see all the hard-coded variations of general concepts in the early hard-coded simulation programs. They might start out by creating subroutines or classes to express the common things, but that still hard-codes the simulation. So smart programmers take the parameters out of the code, put them into editable files and assure that they're declarative. Metadata! There are always exceptions where you need a calculation or an if-then-else — no problem, you just enhance the metadata so it can include "rules" and you're set.
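
    To make the idea concrete, here is a minimal sketch in C, using hypothetical names (port.cfg, sim_params): the parameters that would otherwise be hard-coded into the simulation are read from an editable key=value file instead, so changing the simulated system means editing the file, not the program.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* Hypothetical simulation parameters, formerly hard-coded constants. */
        struct sim_params {
            int    berths;          /* e.g. number of berths in a cargo port */
            int    ships_per_day;
            double unload_hours;
        };

        /* Read key=value lines such as "berths=4" from an editable file. */
        static int load_params(const char *path, struct sim_params *p)
        {
            char key[64];
            double value;
            FILE *f = fopen(path, "r");
            if (f == NULL)
                return -1;

            while (fscanf(f, "%63[^=]=%lf\n", key, &value) == 2) {
                if (strcmp(key, "berths") == 0)             p->berths = (int)value;
                else if (strcmp(key, "ships_per_day") == 0) p->ships_per_day = (int)value;
                else if (strcmp(key, "unload_hours") == 0)  p->unload_hours = value;
            }
            fclose(f);
            return 0;
        }

        int main(void)
        {
            struct sim_params p = {0};
            if (load_params("port.cfg", &p) != 0) {
                fprintf(stderr, "could not read port.cfg\n");
                return EXIT_FAILURE;
            }
            printf("simulating %d berths, %d ships/day, %.1f hours to unload\n",
                   p.berths, p.ships_per_day, p.unload_hours);
            /* ... the simulation itself would run from these values ... */
            return 0;
        }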

    The Ultimate Solution

    As you take your simulation problem and migrate it from hard-coded simulation to descriptive, declarative metadata, the notion of "objects" with inheritance begins to be a value-adding practicality … in the metadata. Your understanding of the system and its constraints keeps growing — for example, ships in cargo ports.

    Then you can take the crucial next step, which is worlds away from object-orientation but actually solves the underlying problem in a way simulation never can — you build a constraint-based optimization model and solve it to make your cargo flows the best they could theoretically be!

    People thinking simulation and programmers thinking objects will NEVER get there. They're locked in a paradigm that won't let them! This is the ultimate reason we should never be locked into OOP, and particularly locked into it by having the object-orientation concepts in the language itself, instead of employed when and as needed in a fully unconstrained language for instructions and data.

    The later evolution of OOP

    In spite of the power of moving from hard-coding to metadata, the language-obsessed elite of the computing world focused on the language itself and decided that its object orientation could be taken further. Simula’s object orientation ended up inspiring other languages and being one of the thought patterns that dominate software thinking. It led directly to Smalltalk, which was highly influential:

    Smalltalk is also one of the most influential programming languages. Virtually all of the object-oriented languages that came after—Flavors, CLOS, Objective-C, Java, Python, Ruby, and many others—were influenced by Smalltalk. Smalltalk was also one of the most popular languages for agile software development methods, rapid application development (RAD) or prototyping, and software design patterns. The highly productive environment provided by Smalltalk platforms made them ideal for rapid, iterative development.

    Lots of the promotional ideas associated with OOP grew along with Smalltalk, including the idea that objects were like Lego blocks, easily assembled into new programs with little work. It didn’t work out for Smalltalk, in spite of all the promotion and backing from influential players. The companies and the language itself dwindled and died.

    It was another story altogether for the next generation of O-O languages, as Java became the language most associated with the internet as it grew in the 1990’s. I will tell more of the history later. For now I’ll just say that OOP is the paradigm most often taught in academia and generally in the industry.

    Conclusion

    OOP was invented by smart people who had a new problem to solve. Simula made things easier for modeling physical systems and was widely used for that purpose. As it was expanded and applied beyond the small category of problems for which it was invented, serious problems began to emerge that have never been solved. The problems are inherent in the very idea of OOP. The problems are deep and broad; the Pyramid of Doom is one of seemingly endless examples.

    When OOP principles and languages were applied to databases and GUI's, they failed utterly, to the extent that even the weight of elite experts couldn't force their use instead of simple, effective approaches like the RDBMS for databases and javascript for GUI's. OOP has evolved into a fascinating case study of the issues with Computer Science and the practice of software development, with complete acceptance in the mainstream alongside widespread quiet dissent provoked by its fraudulent claims of virtue.

  • Deep-Seated Resistance to Software Innovation

    Everyone says they're in favor of innovation. Some organizations promote innovation with lots of publicity. Many organizations even have CIO's (Chief Innovation Officers) to make sure it gets done. But the reality is that resistance to innovation runs strong and deep in organizations; the larger the organization, the greater the resistance usually is. The reason is simple: innovation threatens the power and position of the people who already hold them. They feel they have nothing to gain and much to lose.

    It's not just psychology. Innovation resistance throws up barriers that are thick and high. See this for examples.

    A good way to understand the resistance is to look at sound technologies that have been proven in practice and could be more widely applied, but are ignored and/or actively resisted by the organizations that could benefit from them. I have called these In-Old-vations. Here is an innovation that is still waiting for its time in the sun, and here's one that's over 50 years old and is still being rolled out VERY slowly.

    In this post I will illustrate the resistance to technology innovation with a little-known extreme example: the people in charge of Britain's war effort resisted innovations that could help them win the war. They were literally at war and losing, and decided, in effect, that they'd rather lose. Sounds ridiculous, I know, but this is normal behavior of people in organizations of all kinds.

    The Battle of Britain

    Britain was at war, facing Hitler's much larger, better-prepared military, which had already rolled over its adversaries. Life and death. Literally. The established departments did all they could to defend against attacks. The so-called Battle of Britain is well-known. What is not as widely known is the battle at sea. German submarines were sinking British ships at an alarming rate. The Navy had no answer other than to do what they were already doing, harder.

    The situation was desperate. If there was ever a time to "think outside the box" it would seem this was it. The response of the Navy to new things? NO WAY. Amazing new weapons developed by uncertified people outside the normal departmental structures? NO WAY. Once those weapons are built and proven, use them to stop the submarines that were destroying boats and killing men by the thousands? NO WAY!!

    Of course, you might think that someone would have known that the fairly recent great innovation in flying machines was achieved by "amateurs" flying in the face of the establishment and the acknowledged expert in flying as I describe here. You might think that Navy men would remember that perhaps the greatest innovation in naval history was invented by Navy-related people. But no. Protecting our power and the authority of our experts is FAR more important than a little thing like losing a war!

    The story of the new way to fight submarines is told in this book:

    [Image: book cover]

    Someone who was not part of the Navy establishment invented a whole new approach to fighting submarines. The person wasn't a certified, official expert. He was rejected by all relevant authorities and experts. Fortunately for the survival of England, Churchill made sure the concept was implemented and tested. The new devices were delivered to a ship.

    This all took time and it was not until the spring of 1943 that the first Hedgehogs were being installed on Royal Navy vessels. When Commander Reginald Whinney took command of the HMS Wanderer, he was told to expect the arrival of a highly secret piece of equipment. ‘At more or less the last minute, the bits and pieces for an ahead-throwing anti-submarine mortar codenamed “hedgehog” arrived.’ As Whinney watched it being unpacked on the Devonport quayside, he was struck by its bizarre shape. ‘How does this thing work, sir?’ he asked, ‘and when are we supposed to use it?’ He was met with a shrug. ‘You’ll get full instructions.' Whinney glanced over the Hedgehog’s twenty-four mortars and was ‘mildly suspicious’ of this contraption that had been delivered in an unmarked van coming from an anonymous country house in Buckinghamshire. He was not alone in his scepticism. Many Royal Navy captains were ‘used to weapons which fired with a resounding bang’, as one put it, and were ‘not readily impressed with the performance of a contact bomb which exploded only on striking an unseen target’. They preferred to stick with the tried and tested depth charge when attacking U-boats, even though it had a hit rate of less than one in ten. Jefferis’s technology was too smart to be believed.

    Here's what the new mortars looked like:

    [Image: Hedgehog anti-submarine mortar]

    What happened? It was transformative:

    Over the course of the next twelve days, Williamson achieved a record unbeaten in the history of naval warfare. He and his men sank a further five submarines, all destroyed by Hedgehogs. Each time.

    If resistance to true technological innovation is so strong when you're desperate, literally at death's door, how do you think it's going to be in everyday life? The rhetoric is that we all love innovation! The reality is that anything that threatens anybody or anything about the status quo is to be ignored, shoved to the side and left to die. Anyone who makes noise about it obviously isn't a team player and should find someplace to work where they'll be happier. And so on.

    Conclusion

    Innovation happens. Often nothing "new" needs to be invented — "just" a pathway through the resistance to make it happen. Here is a description of the main patterns followed by successful innovations. If you have an innovation or want to innovate, you should be aware of the deep-seated resistance to innovation and, rather than meeting it head-on, craft a way to make it happen without open warfare. Go for it!

  • Franz Liszt: 210

    This week is the 210th anniversary of the birth of Franz Liszt on Oct 22, 1811. I wrote a couple of highlights about Liszt for his 200th anniversary. Beyond my admiration for his achievements in life, he was the inspiration for this blog.

    [Image: Franz Liszt, 1870]

    Liszt was one of those rare people who was world-class brilliant and also appreciated the valuable work of others, supporting that work in word and deed.

    One way he demonstrated this was by championing the work of other composers, transcribing their work and performing it for audiences who might otherwise not have a chance to hear it. He made amazing transcriptions of operas and many other kinds of works. Probably the most impressive were his piano versions of all nine Beethoven symphonies!

    Oddly enough, the symphony transcriptions were largely ignored in the world of music, even by Liszt's pupils. It wasn't until Glenn Gould recorded a couple of them in 1967 that they began to appear on the scene. But they remain obscure, even among classical music lovers.

    I thought I knew all the Beethoven symphonies, but hearing a Liszt piano transcription performed was like hearing the music for the first time. After a great deal of struggle he finally managed to transcribe the fourth (choral) movement of the ninth symphony. And then he did it again for two pianos.

    Frederic Chiu has recorded the 5th and 7th Symphonies for Centaur Records. In his program notes he suggests that "Liszt's piano scores must therefore be taken as a sort of gospel in regards to Beethoven's intentions with the Symphonies" because of Liszt's unique perspective, having met Beethoven in person, having heard collaborators and contemporaries of Beethoven perform the Symphonies, having studied and performed the works both as a pianist/transcriber and as a conductor in Weimar. No one in history could claim to have as much exposure, insight and journalistic integrity as regards Beethoven's intentions around the Symphonies.

    I personally recommend the recordings by Konstantin Scherbakov. You will be listening to the unique inspiration of Beethoven, further enhanced by the incredible genius of Liszt.

    Listening to these transcriptions can pull you out of normal physical space into a dimension filled with drama, beauty and inspiration, one that, unlike the austere, timeless beauty of math, drives through time from a start to a conclusion.

  • Software Programming Language Evolution: Libraries and Frameworks

    We've talked about the major advances in programming languages and the enhancements that brought those languages to a peak of productivity — a peak which has not been improved on since. Nonetheless there has been an ongoing stream of new programming languages invented, each of which is claimed to be "better" — with no discussion about what constitutes "goodness" in a language! This is high on the list of causes of the never-ending chaos and confusion that pervades the world of software. While loads of people focus on language, the most important programming tools, the ones that make HUGE contributions to the productivity of the programmers using them, go largely unremarked, in spite of the fact that they are part of every programmer's day-to-day programming experience. What are these essential, always-used but largely background things? Libraries and frameworks.

    Libraries

    A library is a collection of subroutines, often grouped by subject area, which perform commonly used functions for programs. Every widely used language has a library that is key to its practical use.

    For example, the C language has libraries that contain hundreds of functions for

    • input and output, including formatting
    • string manipulation and character testing
    • math functions
    • standard functions, dozens of them
    • date and time

    There are libraries of functions for controlling displays and input devices and many other things. After all these years and the "fatal" flaw of not being object-oriented (gasp!!), C is currently the #2 language in popularity. While its libraries don't play as important a part in its popularity as they do for certain other languages, they are still important.
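
    As a small illustration, here is a hypothetical C program that leans on several of the standard library areas listed above: formatted I/O, string manipulation, math and date/time (link with -lm on most systems).

        #include <stdio.h>
        #include <string.h>
        #include <math.h>
        #include <time.h>

        int main(void)
        {
            char name[32];
            strcpy(name, "FORTRAN");                  /* string manipulation */
            printf("%s has %zu letters\n", name, strlen(name));

            double hyp = sqrt(3.0 * 3.0 + 4.0 * 4.0); /* math functions */
            printf("hypotenuse = %.1f\n", hyp);       /* formatted output */

            time_t now = time(NULL);                  /* date and time */
            printf("today is %s", ctime(&now));       /* ctime() supplies the newline */
            return 0;
        }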

    Python became widely used among people doing analytics not particularly because of the virtues of the language itself, but because it grew one of the richest modern libraries of routines for calculations available. It ended up with great support for statistics and the things you need to do with large number sets including array and matrix manipulation. When doing analytics, the library functions do nearly all the heavy lifting. All that the program has to do is deploy the rich set of functions against the problem in the right way!

    A 2021 survey of language popularity confirms Python's standing, and the importance of its libraries in driving it. Here's a summary:

    [Image: 2021 language popularity survey results]

    Have you ever heard of the language R? It's an open-source language which became usable in the year 2000. Chances are, if you work in statistics, data analytics, operations research or other areas involving serious data manipulation, you know all about it and probably use it. The language itself has valuable features for math programming that most languages don't have, for things like vectors and matrices. More importantly, it has an amazingly rich collection of the R equivalent of libraries, called packages. R packages contain reusable collections of functions and data definitions to perform valuable calculations. When you're trying to do something you're pretty sure someone else has tried to tackle in some way, the first thing you do is look for an appropriate package. Packages are available for most common data science tasks; they do much of the work and nearly all the "heavy lifting" of performing any job for which the package applies. R is a prime example of a language with virtues of its own, where the bulk of the value and productivity nonetheless comes from the available libraries.

    One of the sad turns taken in programming language evolution was driven by the ascendancy of the modern RDBMS. Ironically, the DBMS emerged at about the same time as do-everything language environments dominated the new minicomputer landscape. With a single environment like MUMPS or PICK you could get a whole job done — user interface, core program, data storage and access, everything! These environments enabled massive productivity gains. See this for more. But the DBMS blasted in, took over, and HAD to be used. So 3-GL's that were less productive than the new environments became even less so by finding cumbersome ways to use DBMS's. The way they did it I explain here. This was a first major step down the degenerate, productivity-killing road of layers. How did 3-GL's pull off the integration? With libraries, of course! For example, Java's JDBC library became a requirement for burdensome, error-prone access to standard databases from within Java, while those who didn't mind the overhead and "impedance mismatch" could use one of the ORM's (Object-Relational Mapping systems) that emerged.

    Not all libraries are tightly associated with a language! One such example is Redis (redis.io), an open source project started by an Italian developer to build a powerful in-memory key/value datastore and cache. It has taken off and has grown many more powerful features, including queuing. While unknown among non-programmers, it is incredibly popular, widely used and available on the major cloud providers.

    Other libraries are directed towards a particular aspect of programming. With the rise of the web UI, a succession of libraries to ease its creation has emerged.

    A UI library that has become dominant is React, created and then open-sourced by Facebook. It is specifically for the Javascript language and greatly enhances the productivity of building and the resulting performance of web UI's. The benefits of javascript or any other language are trivial compared to the productivity gain produced by using React. Unlike most libraries, React comes close to being a framework; it resembles an R package, in that it's a comprehensive solution to the problem of building web UI's. Interestingly, part of the programmer productivity gain is due to the fact that it has a major declarative aspect to its design, like AngularJS (see below).

    Bottom line: while the details of software language features have some impact on productivity, the availability and richness of libraries enhances productivity many times over. It's no contest.

    Frameworks

    Frameworks take the idea of libraries but with an important reversal. Libraries are sets of routines, any of which may be called by a program written in a supported language. The program written in the language is completely in charge. Frameworks, by contrast, provide an environment in which a language can operate. You select exactly one framework to work in. When you play by its rules, you normally enjoy large productivity benefits.

    A library is a rich set of resources covering many issues, like a book library. A framework is a selected set of resources enabling rapid work on a given category of issues, like a kitchen for cooking.

    One impressive framework is the RAILS framework for the language Ruby. While the Ruby language had become fairly successful, Ruby on Rails (as it was called) established it as a player because many groups discovered that it was many times faster than other tools at building a web application with a UI and a database. The reason is simple: normally, to build such an application, you have one expert creating the UI, probably using javascript and some library, another building the server-side business logic, and another designing the database — with a great deal of work and trouble relating the database to the server program; see for example JDBC and ORM's above. With RAILS, you defined your database, which automatically defined the names you use inside Ruby to access the data. Similar story with the UI.

    The philosophy of RAILS centered on the DRY principle — Don't Repeat Yourself.

    [Image: the DRY principle]

    Unlike the endless repetition of the usual layered environment, RAILS took a major step towards the principle of Occamality. Was RAILS original, a real breakthrough? In its object-oriented environment, yes. In programming in general, no. RAILS is a typical example of the appalling lack of knowledge of history in software. Most of the benefits of RAILS were delivered in a more integrated way many years before by the PowerBuilder environment. I discuss the context here.

    What comprehensive frameworks like this REALLY do is attempt to re-create the comprehensive, everything-in-one-place development environments that emerged to enhance programmer productivity beyond what was achievable in 3-GL's, mostly by incorporating data storage and user interface functions, as I explain here. Their emergence and widespread adoption is clear evidence of the power of eliminating the insanity of layers in software.

    There are also frameworks that are much narrower in scope, addressing only part of the programming puzzle. While they leave much of an overall programming job alone, they can give a major boost to the narrow area on which they focus. One that is focused exclusively on building web UI's, and that caught on and became widely used, is AngularJS, whose newer version is simply called Angular. This framework is highly declarative, focused more on describing the elements of the UI than the actions required to implement it.

    [Image: AngularJS example]

    This led to HUGE programmer productivity gains. Why would anyone build a web UI from scratch? Nonetheless, the similarities between the ReactJS library and the AngularJS framework are strong, and both of them powered huge programmer productivity gains that had very little to do with the virtues (or lack thereof) of the associated language, javascript.

    Languages vs Libraries and Frameworks

     New languages continue to flow out of the creative minds of groups of programmers who obviously don't have enough to do to keep themselves fully occupied. Each new language tends to be lauded, if only by its creators, with extreme claims of virtue on many dimensions. None of these claims are ever subjected to verification and testing, much less of the rigorous kind. In any case, there is simply no contest between the gains delivered by a rich library or framework and a new programming language.

    I'll just give one glaring example. One of the main virtues that object-oriented languages are supposed to have is code reuse. The concept that is bandied about is that a good class system is like a set of Lego blocks, enabling new programs to be easily assembled. Mostly it doesn't happen. For re-use, libraries and frameworks are the gold standard. Think of a normal, old-fashioned book library. The whole reason they exist is that people reuse books! It's the same thing for software libraries — code gets into software libraries because it's re-used often! That's how they got to be called "libraries." Duh.

     

  • What is technical due diligence for Venture Capital?

    I’m Technology Partner in a VC firm, Oak HC/FT. I help look at companies we’re considering investing in, and I help out as needed after investment. Recently I’ve been thinking about what I do, mostly because I’ve encountered multiple cases of technology due diligence performed on companies I know well. To put it plainly, I don’t do it the same way.

    The firms whose work I’ve seen are professional, well-regarded groups. Their work fits into a pattern of similar work I’ve seen over many years. Their work can have value.

    My Evolution in Tech Due Diligence

    When I started performing tech due diligence for the VC firm Oak Investment Partners in 1992, I had no specific idea of how to do it. My background had been almost exclusively on the "other side," i.e., working on and building software, mostly for small, entrepreneurial companies. But I'd had occasion to dive into bodies of software over the years and figure them out, so I guessed I'd be able to figure it out better than any non-programmer could. All I can say is, I went through quite an evolution from there.

    The starting point was clear and simple: evaluate the software. There’s a lot of work involved and not many people have the skills to do it well, but there’s no magic. I quickly developed an outline for what an evaluation consists of. It was a few pages long. Here’s the last part:

    [Image: the last part of the due diligence outline]

    The earlier parts … well, let’s just say it was comprehensive.

    I marched forward doing the work and interacting with the people in the company being evaluated and with the partners in the VC firm. I evolved my views of tech due diligence.

    In the late 1990's, I did an on-site tech diligence of a little company called Inktomi. It was started by a couple of guys from UC Berkeley, one a young professor and the other a grad student. They were hard at work on a project called NOW, the Network of Workstations, an early attempt to build software that would enable a bunch of workstations connected by fast networking to solve really big computing problems, of the kind normally only so-called super-computers could handle. They built the software, and needed a big, big problem to solve to demonstrate that it worked. There were already full-text databases around that ran on standard computers. Also at the time, the internet was exploding, with no end in sight. There was an emerging need to help people find things on the internet. At the time, the best available solutions were "portals," of which Yahoo was a leading example. A portal was a densely populated page managed by people who would put up links to the main things people were interested in. But what if you wanted more? Wouldn't it be great to have a full-text search engine that could index and search the entire internet, even as it grew endlessly?

    This is the problem Inktomi took on. No existing search engine could come within a factor of 100 of indexing and searching the whole internet at the time. The boys at Inktomi used their NOW infrastructure to attack the problem, first building a crawling and index-building system, then building the search. It wasn’t easy, but it turns out that it fit right into the NOW approach. If you understood it, you could see that by adding workstations to the NOW, you could scale the index endlessly, no matter how large the internet grew.

    I walked into the second-floor warren of offices above stores on Shattuck Ave. in Berkeley, with machines and people crammed in anywhere they would fit. I already had heard the basic idea. I looked over people’s shoulders at screens, and noticed some things about the C code there. I asked questions and discovered they had implemented parallelism not by using standard UNIX multi-tasking, but by a super-low-overhead coding method. I saw the cables and boards, and found out the clever things they had done to minimize inter-workstation latency. And some more stuff.

    Before long, and without doing anything like the exhaustive inventory of my original approach to due diligence, I had discovered the originality of their approach and their impressive implementation of it, which would give them a lead over any competitors that would emerge. I became a strong advocate of the deal. They took our money, the largest investment we had made in a company to that date, and ended up making over a 100X return.

    Of course today, we don’t “inktomi” for things on the internet, we “google.” That’s a story for another time, and happened after Inktomi had attained a public market valuation in excess of $10 Billion.

    I tell the story to illustrate one of the fulcrum points of my evolution in performing tech due diligence. The way I did it and the resulting win helped spur me on to doing what the VC really needs, which is to leverage my experience and skills into getting insights into a potential deal that are invisible to the other members of the firm because of their lack of detailed knowledge of software and systems. The insights can range from “makes no difference here” to “OMG negative” to “how could I have known that super-positive.”

    What I finally evolved to, and continue to work at, revolves around this simple fact: the VC firm wants to know if this technology, the way it is today and is likely to grow into the future, will make the company into a success and end up making money for us and our investors. That’s it! Everything else is a footnote or appendix.

    The vast majority of tech due diligence I see performed by firms who specialize in performing that work, and who are regularly retained by groups doing investments and/or acquisitions, is not about this. What I almost universally see is something like what I started with long ago, with this very important added element: how do the tech, the people, the organization, the process, the deployment and the architectural choices made by the firm compare to widely held industry standards, best practices and regulations? When I look back at my old outline, I see this element entirely missing! Thinking back, I realize that the reason is that I had already decided that those widely held standards, unlike the ones in law and accounting, were a pile of crap. I had already noticed what I later began to systematically study and write about, the difference between the "war time" software practices of winning startups, compared to the widely accepted "peace time" methods.

    Conclusion

    If you want to acquire a tech-oriented company, you naturally want to perform due diligence. If you’re a big company, you may want to acquire it for its momentum and market edge; Facebook and Google do lots of acquiring for this reason, among others. You want to know how well the target company conforms to the kinds of standards, practices and procedures that your internal organization is held to. But if you’re a smart VC, you want different things from tech diligence. Does the tech give the company an “unfair advantage?” Do they approach tech with a speed-oriented, get-stuff-done attitude? The tech may be 2X better than the incumbents, but could a newcomer pretty easily get 5X? Sounds improbable, but this kind of thing happens all the time. If you’re a VC about to place a bet on an up-and-coming company, that’s the kind of tech input you should want.

  • Software Programming Language Evolution: the Structured Programming GOTO Witch Hunt

    In prior posts I’ve given an overview of the advances in programming languages, described in detail the major advances and defined just what is meant by “high” in the phrase high-level language. I've described the advances in structuring and conditional branching that brought 3-GL’s to a peak of productivity.

    The structuring and branching caught the attention of academics. Watch out! What happened next was that a theorem was proved, a movement was declared and named, and a certain indispensable part of any programming language, the GO TO statement, was declared to be something only bad programmers used, something that should be banned. Here's the story of the nefarious GOTO.

    Structures in Programming Languages

    I've described how structures were part of the first 3-GL's and how they were soon elaborated to more clearly express the intention of programmers, making code even more productive to write. The very first FORTRAN compiler, delivered in 1957, included primitive versions of conditional branching and loops, two of the foundations of programming structure. It was so powerful that early users figured it reduced the number of statements needed to achieve a result by a factor of more than 10.

    These are the people who actually WRITE PROGRAMS! They want to make it easier and jumped on anything that gave a dramatic improvement.

    “Significantly, the increasing popularity of FORTRAN spurred competing computer manufacturers to provide FORTRAN compilers for their machines, so that by 1963 over 40 FORTRAN compilers existed. For these reasons, FORTRAN is considered to be the first widely used cross-platform programming language.”

    Before long, the structuring capabilities of the original IF (conditional branching) and DO (controlled looping) statements were enhanced and augmented to something close to their current form. I describe this here. The result was a peak of programmer productivity that has not substantially been increased since, and has often been degraded.

    The Böhm–Jacopini Theorem

    Completely independent of the amazing advances in languages and programming productivity that were taking place, math-oriented non-programmers were hard at work deciding how software should be written. Here is the story in brief:

    The structured program theorem, also called the Böhm–Jacopini theorem,[1][2] is a result in programming language theory. It states that a class of control-flow graphs (historically called flowcharts in this context) can compute any computable function if it combines subprograms in only three specific ways (control structures). These are

    1. Executing one subprogram, and then another subprogram (sequence)
    2. Executing one of two subprograms according to the value of a boolean expression (selection)
    3. Repeatedly executing a subprogram as long as a boolean expression is true (iteration)

    The structured chart subject to these constraints may however use additional variables in the form of bits (stored in an extra integer variable in the original proof) in order to keep track of information that the original program represents by the program location. The construction was based on Böhm's programming language P′′.

    The theorem forms the basis of structured programming, a programming paradigm which eschews goto commands and exclusively uses subroutines, sequences, selection and iteration.

    This theorem got all the academic types involved with computers riled up. The key to good software has been discovered! The fact that math theorems are incomprehensible to the vast majority of people, and the fact that perfectly good computer programs can be written by people who aren't math types didn't concern any of these self-anointed geniuses.

    The important thing to note about the theorem is that it was NOT created in order to make programming easier or more productive. It just "proved" that it was "possible" to write a program under the absurd and perverse constraints of the theorem to compute any computable function. Assuming you were willing to use a weird set of bits to store location information in ways that would make any such program unreadable by any normal person. Way to go, guys — let's go back to the days of writing in all-binary machine language!
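
    To see what the theorem's construction actually looks like, here is a deliberately trivial sketch of my own (not from the proof itself): the same counting loop written once with GOTO-style jumps and once in the theorem's style, with an extra "location" variable standing in for the program location. Both work; the second is provably sufficient, and noticeably harder to read.

        #include <stdio.h>

        /* Version 1: plain flow using goto. */
        static int count_down_goto(int n)
        {
            int total = 0;
        top:
            if (n <= 0)
                goto done;
            total += n;
            n--;
            goto top;
        done:
            return total;
        }

        /* Version 2: the theorem's style — one loop, one selection, and an
           extra "location" variable standing in for the program counter. */
        static int count_down_structured(int n)
        {
            int total = 0;
            int location = 0;            /* 0 = "top", 1 = "done" */

            while (location != 1) {      /* iteration */
                if (n <= 0) {            /* selection */
                    location = 1;
                } else {                 /* sequence */
                    total += n;
                    n--;
                }
            }
            return total;
        }

        int main(void)
        {
            printf("%d %d\n", count_down_goto(5), count_down_structured(5));
            return 0;
        }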

    The Crisis in Software and its solution

    Not long after this, the academic group of Computer Science “experts” formed. They had a conference. They looked at the state of software and declared it to be abysmal. The whole conference was about the "crisis" in software. See this for details.

    One of the most prominent of those Computer Scientists was Edsger W. Dijkstra. He looked at the powerful constructs for conditional branching, loops and blocks that had been added to 3-GL's and invented the term "structured programming" to describe them. He related those statements to the wonderful but useless math proof about the minimal requirements for programming a solution to any "computable function." The proof "proved" that such programs could be written without the equivalent of a GOTO statement. BTW, I do not dispute this. He wrote the influential "Go To Statement Considered Harmful" open letter in 1968.

    Among the solutions to the software crisis they proclaimed was strict adherence to the dogma of what Dijkstra called “structured programming,” which prominently declared that the GOTO statement had no place in good programming and should be eliminated.

    Does the fact that it is POSSIBLE to program a solution to any computable function without using GOTO mean that you SHOULD write without using GOTO's? When children go to school, it's POSSIBLE for them to crawl the whole way, without using "walking" at all. Everyone accepts that this is possible. When you're on your feet all sorts of bad things can happen — you can trip and fall! Most important, you can get the job done without walking … and therefore you SHOULD eliminate walking for kids getting to school. QED.

    This is academia for you – a prime example of how Computer Science works hard to make sure that programs are hard to write, understand and deliver, all in the name of achieving the opposite.

    The debate about structured programming

    There was no debate about the utility of the conditional branching, controlled looping and block structures that rapidly became part of any productive software language. They were there and programmers used them, then and now. The debate was about "structured programming," which by its academic definition outlawed the use of the GOTO statement. That wasn't all. It also outlawed having more than one exit to a routine, breaks from loops and other productive, transparent and generally useful constructs.

    I remember clearly as a programmer in the 1980's having a non-technical manager type coming to me and quizzing me about whether I was following the rigors of structured programming, which was then talked about as the only way to write good code. I don't remember my answer, but since I knew the manager would never go to the trouble of actually — gasp! — reading code, my answer probably didn't matter.

    The most important thing to know about the leader of the wonderful movement to purify programming is his lack of interest in actually writing code:

    [Image: Dijkstra quote]

    Fortunately, there are sane people in the world, including the incomparable Donald Knuth (an academic Computer Scientist who's actually great!) and a number of others.

    An alternative viewpoint is presented in Donald Knuth's Structured Programming with go to Statements, which analyzes many common programming tasks and finds that in some of them GOTO is the optimal language construct to use.[9] In The C Programming Language, Brian Kernighan and Dennis Ritchie warn that goto is "infinitely abusable", but also suggest that it could be used for end-of-function error handlers and for multi-level breaks from loops.[10] These two patterns can be found in numerous subsequent books on C by other authors;[11][12][13][14] a 2007 introductory textbook notes that the error handling pattern is a way to work around the "lack of built-in exception handling within the C language".[11] Other programmers, including Linux Kernel designer and coder Linus Torvalds or software engineer and book author Steve McConnell, also object to Dijkstra's point of view, stating that GOTOs can be a useful language feature, improving program speed, size and code clarity, but only when used in a sensible way by a comparably sensible programmer.[15][16] According to computer science professor John Regehr, in 2013, there were about 100,000 instances of goto in the Linux kernel code.[17]
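
    Here is a minimal sketch of the end-of-function error-handler pattern that Kernighan and Ritchie describe; the file-copy function itself is hypothetical. Resources are acquired one by one, and any failure jumps to a single cleanup label at the bottom.

        #include <stdio.h>
        #include <stdlib.h>

        /* Copy src to dst; return 0 on success, -1 on any failure. */
        int copy_file(const char *src, const char *dst)
        {
            int ok = -1;
            FILE *in = NULL, *out = NULL;
            char *buf = NULL;

            in = fopen(src, "rb");
            if (in == NULL)
                goto cleanup;

            out = fopen(dst, "wb");
            if (out == NULL)
                goto cleanup;

            buf = malloc(64 * 1024);
            if (buf == NULL)
                goto cleanup;

            for (;;) {
                size_t n = fread(buf, 1, 64 * 1024, in);
                if (n == 0)
                    break;
                if (fwrite(buf, 1, n, out) != n)
                    goto cleanup;
            }
            ok = 0;                 /* success */

        cleanup:                    /* single exit point; release whatever was acquired */
            free(buf);
            if (out != NULL)
                fclose(out);
            if (in != NULL)
                fclose(in);
            return ok;
        }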

    Any programmer can make mistakes. Any statement type can be involved in that mistake. For example, I think nearly everyone accepts that cars are a good thing. But over 30,000 people a year DIE in car accidents! So where's the movement to eliminate cars because of this awful outcome? It makes as much sense as outlawing the GOTO because it's sometimes used improperly. Like every other statement type.

  • Software Programming Language Evolution: Structures, Blocks and Macros

    In prior posts I’ve given an overview of the advances in programming languages, described in detail the major advances and defined just what is meant by “high” in the phrase high-level language. In this post I’ll dive into the additional capabilities added to 3-GL’s that brought them to a peak of productivity.

    History

    Let's remember what high-level languages are all about: productivity! They are about the amount of work it takes to write the code and how easily the code can be read.

    The first major advance, from machine language to assembler, was largely about eliminating the grim scut-work of taking the statements you wanted to write and making the statements “understandable” to the machine by expressing them in binary. Ugh.

    The second major advance, to 3-GL’s like FORTRAN and COBOL, was about eliminating the work of translating from your intention to the assembler statements required to express that intention. A single line of 3-GL code can easily translate into 10 or 20 lines of assembler code. And the 3-GL line of code often comes remarkably close to what you actually want to “say” to the computer, both writing it and reading it.

    FORTRAN achieved this goal to an amazing extent.

    “with the first FORTRAN compiler delivered in April 1957.[9]:75 This was the first optimizing compiler, because customers were reluctant to use a high-level programming language unless its compiler could generate code with performance approaching that of hand-coded assembly language.[16]

    “While the community was skeptical that this new method could possibly outperform hand-coding, it reduced the number of programming statements necessary to operate a machine by a factor of 20, and quickly gained acceptance.”

    The reduction in the amount of work was the crucial achievement, but just as important was the fact that each set of FORTRAN statements was understandable, in that it came remarkably close to expressing the programmer's intent, what the programmer wanted to achieve. No scratching your head when you read the code, thinking to yourself "I wonder what he's trying to say here??" This meant code that was easier to write, had fewer errors and was easier to read.

    Enhancing conditional branching and loops

    The very first iteration of FORTRAN was an amazing achievement, but it's no surprise that it wasn't perfect. At the individual statement level it was nearly perfect. When reading long groups of statements, though, there were situations where the code was unambiguous but the intention of the programmer was not clearly expressed in the code itself – it had to be inferred from the code.

    The first FORTRAN had a couple of the intention-expressing statements: a primitive IF statement and DO loop. Programmers soon realized that more could be done. The next major version was FORTRAN 66, which cleaned up and refined the early attempts at structuring. Along with the appropriate use of in-line comments, it was nearly as clear and intention-expressing as any practical programmer could want.

    The final milestone in the march to intention-expressing languages was C.

    It's an amazing language. While FORTRAN was devised by people who wanted to do math/science calculations and COBOL by people who wanted to do business data processing, C was devised to enable computer people to write anything – above all "systems" software, like operating systems, compilers and other tools. In fact, C was used to re-write the Unix operating system so that it could run on any machine without being re-written. C remains the language used to implement the vast majority of systems software to this day.

    I bring it up in this context because C added important intention-expressing elements to its language that have remained foundational to this day. It enhanced conditional statements, providing the full if-then-else construct that has been common ever since. This meant you could say IF <condition is true> THEN <a statement>. The statement would only be executed if the condition was true. You could tack on ELSE <another statement>. This isn't dramatic, but the only other way to express this common thought is with GOTO statements – which certainly can be understood, but takes some figuring. In addition, C added the ability to use a delimited block of statements wherever a single statement could be used. When there are a moderate number of statements in a block, the code is easy to read. When there would be a large number, a good programmer creates a subroutine instead.

    C added a couple more handy intention-expressing statements. A prime example is the SWITCH CASE BREAK statement. This is used when you have a number of conditions and something specific to do for each. The SWITCH defines the condition, and CASE <value> BREAK pairs define what to do for each possible <value> of the SWITCH.
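
    A small, hypothetical example shows both constructs doing what the prose above describes: an if / else chain choosing among conditions, and a switch with case ... break sections choosing among values.

        #include <stdio.h>

        /* An if / else chain: pick a message based on a numeric score. */
        static const char *grade_message(int score)
        {
            if (score >= 90) {
                return "excellent";
            } else if (score >= 60) {
                return "pass";
            } else {
                return "fail";
            }
        }

        int main(void)
        {
            int day = 3;

            printf("score 75 is a %s\n", grade_message(75));

            switch (day) {              /* the value being tested */
            case 1:
                printf("start of the week\n");
                break;                  /* break keeps cases from falling through */
            case 5:
                printf("almost done\n");
                break;
            default:
                printf("an ordinary day\n");
                break;
            }
            return 0;
        }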

    Lots of following languages have added more and more statements to handle special cases, but the cost of a more complex language is rarely balanced with the benefit in ease and readability.

    The great advance that's been ignored: Macros

    C did something about all those special cases that goes far beyond the power of adding new statements to a language, and vastly increases not just the power and readability of the language but, more importantly, the speed and accuracy of making changes. This is the macro pre-processor.

    I was already very familiar with macros when I first encountered the C language, because they form a key part of a good assembler language – to the extent that an assembler that has such a facility is usually called a macro-assembler. Macros enable you to define blocks of text including argument substitution. They resemble subroutine calls, but they are translated at compile time to code which is then compiled along with the hand-written code. A macro can do something simple like create a symbol for a constant that is widely used in a program, but which may have to be changed. When a change is needed, you just change the macro definition and – poof! – everywhere it’s used it has the new value. It can also do complex things that result in multiple lines of action statements and/or data definitions. It is the most powerful and extensible tool for expressing intention and enabling rapid, low-risk change in the programmer’s toolbox. While the C macro facility isn’t quite as powerful as the best macro-assemblers, it’s a zillion times better than not having one at all, like all the proud but pathetic modern languages that wouldn’t know a macro if it bit them in the shin.
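
    Here is a minimal sketch of the kinds of macro described above, with hypothetical names: a named constant that can be changed in one place, a function-like macro with argument substitution, and a multi-line macro that expands into a block of checking code at compile time.

        #include <stdio.h>

        #define MAX_CUSTOMERS 500                 /* change once, takes effect everywhere */
        #define SQUARE(x)     ((x) * (x))         /* argument substitution */
        #define CHECK_RANGE(val, lo, hi) \
            do { \
                if ((val) < (lo) || (val) > (hi)) \
                    printf("%s out of range: %d\n", #val, (val)); \
            } while (0)

        int main(void)
        {
            int customers = 612;

            printf("capacity: %d\n", MAX_CUSTOMERS);
            printf("square of 9: %d\n", SQUARE(9));
            CHECK_RANGE(customers, 0, MAX_CUSTOMERS);  /* expands into the if statement above */
            return 0;
        }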

    The next 50 years of software language advances

    The refrain of people who want to stay up to date with computers is, of course, "What's new?" Everyone knows that computers evolve more quickly than anything else in human existence, by a long shot. The first cell phones appeared not so long ago, and they were "just" cell phones. We all know that now they're astounding miniature computers complete with screens, finger and voice input, cameras, incredible storage and just about any app you can think of. So of course we want to know what's new.

    The trouble is that all that blindingly-fast progress is in the hardware. Software is just along for the ride! The software languages and statements that you write are 99% the same as in the long-ago times before smart phones. Of course there are different drivers and things you have to call, just as in any hardware environment. But the substance of the software and the lines of code you use to write it are nearly the same. Here is a review of the last 50 years of "advances" in software. Bottom line: it hasn't advanced!

    Oh, an insider might argue: what about the huge advances of the object-oriented revolution? What about Google's new powerhouse, Go? Insiders may ooh and ahh, just like people at a fashion catwalk. The come-back is simple: what about programmer productivity? Claims to this effect are sometimes made, but more often it's that the wonderful new language protects against stupid programmers making errors or something. There has not been so much as a single effort to measure increases in programmer productivity or quality. There is NO experimental data!! See this for more. Calling what programmers do "Computer Science" is a bad joke. It's anything but a science.

    What this means is simple: everyone knows the answer — claims about improvement would not withstand objective experiments — and therefore the whole subject is shut down. If you're looking for a decades-old example of "cancel culture," this is it. Don't ask, don't tell.

    Conclusion

    3-GL's brought software programming to an astounding level of productivity. Using them you could write code quickly. The code you wrote came pretty close to expressing what you wanted the computer to do with minimal wasted effort. Given suitable compilers, the code could run on any machine. Using a 3-GL was at least 10X more productive than what came before for many applications.

    A couple of language features incorporated into the very first languages, like conditional branching and controlled looping, were good first steps. The next few years led early programmers to realize that a few more elaborations of conditional branching and controlled looping would handle the vast majority of practical cases. With those extra language features, code became highly expressive. All subsequent languages have incorporated these advances in some form. Sadly, the even more productive feature of macros has been abandoned, but as we'll see in future posts, their power can be harnessed to an even greater extent in the world of declarative metadata.

  • Software Components and Layers: Problems with Data

    Components and layers are supposed to make software programs easier to create and maintain. They are supposed to make programmers more productive and reduce errors. It’s all lies and propaganda! In fact, the vast majority of software architectures based on components (ranging from tiny objects to large components) and/or layers (UI, server, storage and more) lead to massive duplication and added value-destroying work. The fact that these insane methods continue to be taught in nearly all academic environments and required by nearly all mainstream software development groups is cornerstone evidence that Computer Science not only isn’t a science, it resembles a deranged religious cult.

    Components and Layers

    It has been natural for a long time to describe the different “chunks” of software that work together to deliver results for users in dimensional terms.

    First there's the vertical dimension, the layers. Software that is "closer" to end users (real people) is thought of as being on "top," while software that is closer to long-term data storage and farther from end users is thought of as being on the "bottom" of a software system. In software that has a user interface, the different depths are thought of as "layers," with the user interface the top layer, the server code the middle layer and the storage code the bottom layer. Sometimes a system like this is called a "three tier" system, evolved from a two tier "client/server" system.

    Second there’s the horizontal dimension, the components. These are bodies of code that are organized by some principle to break the code up into pieces for various reasons. I've given a detailed description of components. A component may have multiple layers in it or it may operate as a single layer.

    Layers are often written in different languages and using different tools. Javascript is prominent in user interface layers, while some stored procedure language is often used in the lowest data access layer. The server-side layer may be written in any of dozens of languages, including java and python. Sometimes there are multiple layers in server-resident software, with a scripting language like PHP or javascript used for server code on the “top,” communicating with the client software.

    Components are self-contained bodies of software that communicate with other components by some kind of formal means. Formal components always have data that can only be accessed by the code in the component. The smallest kind of component is an object, as in object-oriented languages. The data members of the object can only be accessed by routines attached to the object, known as methods.  Methods can normally be called by methods of other objects. Larger components are often called microservices or services. These are usually designed so that they could run on separate machines, with calling methods that can span machines, like old RPC’s or more modern RESTful API’s. Sometimes components are designed so that instead of being directly called, they take units of work from a queue or service bus, and send results out in the same way. When a component makes a call, it calls a specific component. When a component puts work onto a queue or bus, it has no knowledge or control over which component takes the work from the queue or bus.

    Layers and components are often combined. For example, microservice components often have multiple layers, frequently including a storage layer. With a thorough object-oriented approach, there could be multiple objects inside each layer that makes up a service component.

    Why are there layers and components? Software certainly doesn’t have to use these mechanisms to be effective. In fact, major systems and tools have been built and widely deployed that mostly ignore the concepts of layers and components! See this for historic examples.

    The core arguments for layers and components typically revolve around limiting the amount of code (components) and the variety of technologies (layers) a programmer has to know about to make the job easier, along with minimizing errors and maximizing teamwork and productivity. I have analyzed these claims here and here.

    Data Definitions in Components and Layers

    All software consists of both procedures (instructions, actions) and data. Each of those needs to be defined in a program. When data is defined for a procedure to act upon, the data definitions are nearly always specific to and contained within the component and/or layer they’re part of. For the general concept and variations on how instructions and data relate to each other, see this.

    Some of the data used in a component or layer will truly be used only by that component or layer. But nearly any application you can imagine also has data that is used by many components and layers. This necessity is painful to object/component purists, who go to great lengths to avoid it. But when a piece of data like “application date” is needed, it will nearly always have to be used in multiple layers: user interface, server and database. To be used it must be defined. So it will be defined in each layer, typically a minimum of three times!

    When data is defined in software, it always has a name used by procedures to read or change it. It nearly always also has a data type, like character or integer. In most languages, that’s all a data definition contains. But there’s more.

    • When the “application date” data is shown to the user in the UI layer, it typically also needs a label, some formatting information (month, day, year), error checking (was 32 entered into the day part?) and an error message to be displayed.
    • When application date is used in the server layer by some component, some kind of error checking is often needed to protect against disaster if a mistaken value is sent by another component.
    • When a field like social security number is used in the storage layer, sanity checks are often applied to make sure, for example, that the SSN entered matches the one previously stored for that customer in the database.
    • Error codes need to be produced when wrong data is presented. When the user makes an error, you can’t just use a code; you have to show a readable message, which the UI layer might need to look up based on the error code it gets from another component or layer.
    • Each language has its own standards for defining data and the attributes associated with it. Someone has to make sure that all the definitions in varying languages match up, and that when changes are made they are made correctly everywhere.
    • If the data is sent from one component to another, more definitions have to be made: the procedure that gets the data, puts it into a message and sends it, possibly on a queue; and the procedure that gets the message, maybe from a queue, and sends the data to the routine that will actually do the processing.
    • When data is sent between components, various forms of error checking and error return processing must also be implemented for the data to protect against the problems caused by bad data being passed between components that, for example, might have been implemented and maintained by separate groups. Sometimes this is formalized into "contracts" between data-interchanging components/layers.

    So what did we gain by breaking everything up into components and layers? A profusion of redundant data definitions, containing different information and expressed in different ways! The profusion is multiplied by the need for components and layers to pass data among themselves for processing. It can’t be avoided, and can only be reduced by introducing further complexity and overhead into the component and layer definitions.
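
    To see the profusion in miniature, here is a hedged sketch with hypothetical names (written entirely in Python for brevity, even though real layers would use different languages) of the same “application date” field defined once per layer, each definition carrying different attributes that must all be kept in agreement by hand.

        from datetime import date

        # UI layer: label, display format and a user-facing error message.
        UI_FIELDS = {
            "application_date": {
                "label": "Application Date",
                "format": "MM/DD/YYYY",
                "error_message": "Please enter a valid date, e.g. 03/15/2024.",
            }
        }

        # Server layer: a check to guard against bad values sent by another component.
        def validate_application_date(value):
            month, day, year = (int(part) for part in value.split("/"))
            return date(year, month, day)          # raises ValueError for nonsense like day 32

        # Storage layer: the same field defined a third time, this time in SQL.
        CREATE_TABLE = """
        CREATE TABLE applications (
            application_date DATE NOT NULL
        );
        """

        # Three definitions of one piece of data, each with different attributes,
        # all of which have to be kept in sync by hand.
        print(UI_FIELDS["application_date"]["label"], validate_application_date("03/15/2024"))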

    See this for another take on this subject.

    I’ve heard the argument that unifying data definitions makes things harder for the specialists that often dominate software organizations. The database specialists are guardians of the database, and make sure everything about it is handled in the right way. The user interface specialists keep the database specialists away from their protected domain, because if they meddled the users wouldn’t enjoy the high quality interfaces they’ve come to expect. There is no doubt you want people to know their stuff. But none of this is really that hard – channeling programmers into narrow specialties is one of the many things that leads to dysfunction. Programmers produce the best results by thoroughly understanding their partners and consumers, which can only be done by spending time working in different roles – for example spending time in sales, customer service and different sub-departments of the programming group.

    Data Access in Components and Layers

    Now we’ve got data defined in many components and layers. In a truly simple system, data would be defined exactly once, be read into memory and be accessed in the single location in which it resides by whatever code needs it for any reason. If the code needs to interact with the user, perform calculations or store it, the code would simply reference the piece of data in its globally accessible named memory location and have at it. Fast and simple.

    If this concept sounds familiar, you may have heard of it in the world of relational DBMS’s. It’s the bedrock concept of having a “normalized” schema definition, in which each unique piece of data is stored in exactly one place. A database that isn’t normalized is asking for trouble, just like the way that customer name and address are frequently stored in different places in various pieces of enterprise software that evolved over time or were jammed together by corporate mergers.

    With components and layers, performing operations on data is neither fast nor simple. Suppose for example that you’ve got a new financial transaction that you want to process against a customer’s account. In an object system, the customer account and financial transaction would be different objects. That’s OK, except that the customer account master probably has a field that is the sum of all the transactions that have taken place in the recent time period.

    In a sensible system, adding the transaction amount to a customer transaction total in the customer record would probably be a single statement that referenced each piece of data (the transaction amount and the transactions total) directly by name. Simple, fast and understandable.

    In a component/object system that’s properly separated, transaction processing might be handled in one component and account master maintenance in another. In that case, highly expensive and non-obvious remote component references would have to be made, or a copy of the transaction placed on an external queue. In an object system, a method of the transaction object would have to be called and then a method of the account master object.
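
    A minimal sketch of the difference, using hypothetical names: the direct version is one statement, while the separated object version needs two classes and two method calls to accomplish the same update.

        # The direct approach: both pieces of data are visible, so posting is one statement.
        account = {"id": 42, "recent_total": 250.00}
        transaction = {"account_id": 42, "amount": 19.95}

        account["recent_total"] += transaction["amount"]       # simple, fast, obvious

        # The separated approach: each piece of data lives inside its own object and can
        # only be reached through methods, so the same update becomes a chain of calls.
        class Transaction:
            def __init__(self, account_id, amount):
                self._account_id = account_id
                self._amount = amount

            def amount(self):                                   # accessor method; the data is private
                return self._amount

        class AccountMaster:
            def __init__(self, account_id, recent_total):
                self._account_id = account_id
                self._recent_total = recent_total

            def apply(self, txn):                               # the only sanctioned way to update the total
                self._recent_total += txn.amount()

        master = AccountMaster(42, 250.00)
        master.apply(Transaction(42, 19.95))                    # two objects and two method calls, same result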

    It’s a good thing we’ve got smart, educated Computer Scientists arranging things so that programmers do things the right way and don’t make mistakes, isn’t it?

    Conclusion

    With the minor exception of temporary local variables, nearly every layer or component you break a body of software into leads to a multiplication of redundant, partly overlapping data definitions expressed in different languages — all of which have to be in 100% agreement to avoid errors. The communication by call or message between layers and components to send and receive data multiplies the data definitions and references further. Adding the discipline of error checking multiplies them yet again.

    Not only does each layer and component multiply the work and the chances of error, each simple-seeming change to a data definition results in a nightmare of redundancy. You have to find each and every place the data is defined, used or error-checked and make the right change in the right language. Components and layers are the enemy of Occamality. Software spends the VAST majority of its life being changed! Increasingly scattered data definitions make the thing that is done 99% of the time to software vastly harder and riskier to do.

  • The catastrophic effect of bad metaphors in software

    Software is invisible. The vast majority of people who talk about it have little or no practical experience writing it. But software is incredibly important, and people need to be able to talk about it: creating it, modifying it, installing it, using it and making it work as intended. How do you talk about something that's invisible? Answer: you use real-world metaphors that everyone knows and talk in those terms. Given the importance of the process of creating and modifying software, you also use real-world metaphors for the processes, methods and techniques that people apply to software.

    Given the high cost, long time periods and risky outcomes of software development, real-world metaphors are used as models for how to create what seem like the equivalent things in the invisible world. Software professionals use the same metaphors and think in the same terms as their non-programming counterparts, the more so as advancement in software normally correlates with doing less hands-on programming.

    Before long, the metaphors turn into highly detailed, structured things embodied in software tools for software management, and regulations for how software must be built in order to be acceptable for acquisition by governments and major enterprises. The software in medical devices, for example, is highly regulated in this way, everything from requirements to testing and modifications.

    These metaphors are great for communication among people involved in software in some way, but disastrous for building fast, efficient software, quickly and effectively. The metaphors are largely to blame for the rolling crisis/disaster of software that was first recognized over 50 years ago.

    Small groups of smart, hard-working software nerds sometimes break out of metaphor prison and use techniques that naturally grow out of the experience of programmers who live in the land of software, which, though invisible to the vast majority of people, is vividly visible to them. These are usually simple ideas that come out of writing real-world software that meets needs and evolves with experience. Top programmers often think of themselves as "lazy," meaning that they are aware of how they spend their time and find ways to avoid doing things that don't advance the cause. They visualize the world of software they're working on, always looking for ways to spend less time writing less code to get more stuff done without errors.

    The methods that they use rarely conform to any of the established software methodologies and typically make non-software people uncomfortable. I've observed and described many of those techniques, which as a whole I term "wartime software."

    In this post I will identify and summarize some of the widely accepted metaphors that continue to wreak havoc on software development.

    The factory metaphor

    The metaphor of the software factory tends to float around. Wouldn’t it be nice if we could churn out software as reliably and predictably as a factory? We’d put the requirements and other goals and constraints into the factory, and after a series of careful, orderly processing steps, out would come a wonderful piece of software! Unfortunately, factories make copies of things that are already designed and tested. All the effort goes into making the design of a new car or drug; sometimes the design is no good and has to be discarded. Once you’ve made test versions and made sure they operate the way you planned, you turn the design over to the factory to make lots of identical copies. See this for more.

    The building metaphor

    Buildings and roads are widely used as metaphors for software. Lots of the words are the same: we have requirements, architects, standards that must be met, common design patterns. We have project plans with resource requirements and target dates. We might put the project out to bid. We use specialists for each aspect of the work. We build in phases and have regular reviews by the owners and quality inspectors. At the end we have walk-throughs, which may result in punch lists of defects to be corrected. Then we have the closing and we’re off and running!

    It’s a powerful metaphor. It’s better than the factory metaphor because each house is uniquely built by a team, just like each piece of software is uniquely built by a team. The trouble is that a house just sits there. It’s a passive physical object. Software is NOT a physical object; it is mostly a set of actions to be taken by a computer, along with some built-in data. See this for more detail.

    There's another issue with the metaphor. Buildings are built and then mostly used. Modifications are sometimes made, but they are normally small compared to the effort put into the original building. Software, on the other hand, is mostly changed. Vastly more effort is typically put into changing a body of software over time than was spent building the original version. Most of what most employees of software departments do is make changes. See this for more detail.

    The Architect metaphor

    The architect metaphor is closely related to the building metaphor. It's just as bad for the same reasons. Here is a detailed description that uses architecture software to illustrate the difference.

    Waterfall and Agile

    The waterfall method is the universally derided classic method of applying project management to building software. After years of failure with increasingly rigorous and detailed project planning to improve software development outcomes, it was supposed that waterfall just wasn't flexible enough. It was too rigid. More agility to respond to emerging knowledge and conditions would solve the problems. Thus, the Agile software development method was invented, purposely built on the metaphor of a nimble-footed engineering team going through the process with lots of re-calibration and obstacle avoidance.

    Hah. Nothing has changed except the rhetoric. Waterfall is a method that fixes the requirements and figures out the time to build them. Agile is a method that fixes the time periods and figures out what can be built in each. The difference in practice is that Agile has more overhead, and no better results. See this for more.

    Objects in software

    Could there be a more inappropriate metaphor? A bundle of data definitions and code is an "object?" A thing? Sorry, programs are NOT things. They're not passive; programs are actions!

    Object-orientation started as an obscure design pattern for creating simulation programs, grew into a fad and has evolved into a standard part of software orthodoxy. I remember long ago hearing the sales pitch for the early O-O languages like Smalltalk. The business sales pitch was that it would greatly enhance the speed and quality of software development, while enabling changes to be more easily accomplished. The tech pitch was that by creating "objects" consisting of a definition of a related set of data and by including in that definition all the code that would read and write the data, concerns would be isolated and mistakes avoided by having data manipulated only by "its own" code.

    The common metaphor was that objects were like Lego blocks, lending themselves to building elaborate structures easily and making changes to them without collateral damage. By throwing in things like inheritance — which was always illustrated by physical things like the hierarchy of species, families, etc. — the Lego effect was even stronger.

    As it turned out, O-O has been a giant step backwards for software. While appropriate in some situations, as a rigid structure with no exceptions it imposes difficult and unnatural constraints on how code and data can be related and arranged, making everything more complicated than it needs to be. Lego blocks are great as physical objects, but their software version is a failure.

    The walls and defenses metaphor

    This one is huge in cybersecurity. Everyone talks about putting up walls to protect our computers, establishing and maintaining an effective perimeter. We talk about attackers who want to "penetrate" our security defenses.

    A great deal of software security is built according to this metaphor. The fact that software is invisible to everyone who makes important decisions about computer security leads directly to the string of monstrous security disasters that continue to this day. Here is a good way to understand the impact of that invisibility using baseball. Here is a detailed description of the impact of invisibility.

    Conclusion

    We all think using metaphors. It's not inherently a bad thing. But when the metaphor has a strong real-world grounding that everyone understands, it can lead to disaster when applied to software in ways that don't match the correct understanding of the software itself. This is unfortunately the case for a great deal of software methods and processes, and helps explain the ongoing software disaster that was first recognized over 50 years ago.

     

  • The Dangerous Drive Towards the Goal of Software Components

    Starting fairly early in the years of building software programs, some programs grew large. People found the size to be unwieldy. They looked for ways to organize the software and break it up into pieces to make it more manageable. This effort applied to the lines of code in the program and to the definitions of the data referred to by the code, and to how they were related. How do you handle blocks of data that many routines work with? How about ones whose use is very limited? The background of this issue is explained here.

    These questions led to a variety of efforts that continue to this day to break a big body of software code and data definitions into pieces. The obvious approach, which continues to be used today, was simply to use files in directories. But for many people, this simple, practical approach wasn't enough. They wanted to make the walls between subsets of the code and data higher, broader and more rigid, creating what were generally called components. The various efforts to create components vary greatly. They usually create a terminology of their own, terms like “services” and “objects.” Such specialized, formulaic approaches to components are claimed to make software easier to organize, write, modify and debug. In this post I’ll take a stab at explaining the general concept of software components.

    Components and kitchens

    When you’re young and in your first apartment, you’ve got a kitchen that starts out empty. You’re not flush with cash, so you buy the minimum amount of cooking tools (pots, knives, etc.) and food that you need to get by. You may learn to make dishes that need additional tools, and you may move into a house with a larger kitchen that starts getting filled with the tools and ingredients you need to make your growing repertoire of dishes. You may have been haphazard about your things at the start, but as your collection grows you probably start to organize things. You may put the flour, sugar and rice on the same shelf, and probably put all the pots together. You may put ingredients that are frequently used together in the same place. You probably store all the spices and herbs together. It makes sense – if everything were scattered, you’d have a tough time remembering where each item was. The same logic applies to a home work bench and to clothes.

    The same logic applies for the same reason to software! It’s a natural tendency to want to organize things in a way that makes sense – for remembering where they are, and for convenience of access. This natural urge was recognized in the early days of software programming. Given that there aren’t shelves or drawers in that invisible world, most people settled on the term “component” as the word for a related body of software. The idea was always that instead of one giant disorganized block of code, the thing would be divided into a set of components, each of which had software and data that was related.

    A software program consists of a set of procedures (the actions) and a set of data definitions (what the procedures act on). Breaking up a large amount of code into related blocks was helped by the early emergence of the subroutine – a block of code that is called on to do something and then returns with a result; kind of like a mixer in a kitchen. The problems that emerged were how to break a large number of subroutines into components (each of which had multiple subroutines), and how to relate the various data definitions to the routine components. This problem resembles the one in the kitchen of organizing tools and ingredients. Do you keep all the tools separate from the ingredients, or do you store ingredients with the tools that are used to process them?

    In the world of cooking, this has a simple answer. If all you do is make pancakes, you might store the flour and milk together with the mixing bowls and frying pans you use with them. But no one ever just makes pancakes – duh! And even if you did, you’d better put the milk and butter in the fridge! Ditto with spices. Nearly everyone has a spice drawer or shelf, and spices used for everything from baking to making curries are stored there.  Similarly, you store all the pans together, all the knives, etc.

    In software it’s a little tougher, but not a lot. One tough question is always: do you group routines (and data) together by subject/business area or by technology area? The same choice applies to organizing programmers in a department. It makes sense for everyone who deals primarily with user interfaces to work together so they can create uniform results with minimal code. Same thing with back-end or database processing. But over time, the programmers become less responsive to the business and end users; the problem is often solved by having everyone who contributes to a business area or kind of user work as a team. Responsiveness skyrockets! Before long, things begin to deteriorate on the tech side, with redundant, inconsistent data put into the database, UI elements that vary between subject areas, confusing users, etc. There’s pressure to go back to tech-centric organization. And so it goes, round and round. Answer: there is no perfect way to organize a group of programmers! Just as painfully, there is no perfect way to organize a group of procedures and their relationship to the data they work on!

    The drive towards components

    There are two major dimensions of making software into components. One dimension is putting routines into groups that are separate from each other. The second dimension is controlling which routines can access which data definitions.

    Separating routines into groups can be done in a light-weight way based on convenience, similar to having different workspaces in a kitchen. In most applications there are routines that are pretty isolated from the rest and others that are more widely accessed. Enabling programmers to access any routine at any time and having access to all the source code makes things efficient.

    Component-makers decided that programmers couldn't be trusted with such broad powers. They invented restrictions to keep routines strictly separate. While there were earlier versions of this idea, a couple decades ago "services" were created to hold strictly separate groups of routines. An evolved version of that concept is "micro-services." See this for an analysis.

    The second major dimension of components is controlling and limiting the relationship between routines and data. The first step was taken very early. It was sensible and remains in near-universal use today: local variables. These are data definitions declared inside a routine for the private use of that routine during its operation. They are like items on a temporary worksheet, discarded when the results are created.

    Later steps concerning "global" variables were less innocent. The idea was to strictly separate which routines can access which data definitions. Early implementations of this were light-weight and easily changed. Later versions built the separation into the architecture and code details. For example, each micro-service is ideally supposed to have its own private database and schema, inaccessible to the other services. This impacts how the code is designed and written, increasing the total amount of code, overhead and elapsed time.
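
    Here is a small illustrative sketch, with hypothetical names, of the "private database per service" rule: the billing service cannot read the customer store directly, so a simple lookup becomes a call to another service (in a real deployment, an HTTP request).

        # Each service owns its data; nothing else may read its store directly.
        class CustomerService:
            def __init__(self):
                self._store = {42: {"name": "Ada", "email": "ada@example.com"}}   # private schema

            def get_email(self, customer_id):                  # the only sanctioned way in
                return self._store[customer_id]["email"]

        class BillingService:
            def __init__(self, customer_service):
                self._customers = customer_service             # must go through the other service
                self._invoices = {}                            # its own private store

            def send_invoice(self, customer_id, amount):
                email = self._customers.get_email(customer_id) # a call replaces a simple direct read
                self._invoices[customer_id] = amount
                return "invoice for %.2f sent to %s" % (amount, email)

        print(BillingService(CustomerService()).send_invoice(42, 19.95))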

    Languages that are "object oriented" are an extreme version of the component idea. In O-O languages, each data definition ("class") can be accessed only by an associated set of routines ("methods"). This bizarre reversal of the natural relationship between code and data results in a wide variety of problems.

    These ideas of components can be combined, making the overhead and restrictions even worse. People who build micro-services, for example, are likely to use O-O languages and methods to do the work. Obstacles on top of problems.

    Components in the kitchen

    All this software terminology can sound abstract, but the meaning and consequences can be understood using the kitchen comparison.

    What components amount to is having the kitchen be broken up into separate little kitchenettes, each surrounded by windowless walls. There are two openings in each kitchenette's walls, one for inputs and one for outputs. No chef can see or hear any other chef. The only interaction permitted is by sending and receiving packages. Each package has an address of some kind, along with things the sending chef wants the receiving chef to work on. For example, the sending chef may mix together some dry ingredients and send them to another chef who adds liquid and then mixes or kneads the dough as requested. The mixing chef might then put the dough into another package and return it to the sending chef, who might put together another package and send it to the baking chef who controls the oven.

    If the components are built as services, the little kitchenettes are built as free-standing buildings. In one version of components (direct RESTful calls), there is a messenger who stands waiting at each chef's output window. When the messenger receives a package, the messenger reads the address, jumps into his delivery van with the package and drives off to the local depot and drops off the package; another driver grabs the package, loads it into his van and delivers it to the receiver's input window. If the kitchenettes are all in the same location the vans and depot are still used — the idea is that a genius administrator can change the location of the kitchenettes at any time and things will remain the same.

    Another version of components is based around an Enterprise Service Bus (ESB). This is similar to the vans and the central office depot except that the packages are all queued up at the central location, which has lots of little storage areas. Instead of a package going right to a recipient, it's sent to one of these little central storage areas. Then, when the chef in a kitchenette is ready for more work, he sends a request to the central office, asking for the next package from a given little storage area. Then a worker grabs the oldest package and gives it to a driver, who puts it in his van and delivers the package to the input window of the requesting chef.

    If this sounds bizarre and lots of extra work, it's because … it's bizarre and requires lots of extra work.

    The ideal way to organize software

    The ideal way to break software into components is actually pretty similar to the way good kitchens are organized. Generally speaking, you start by having the same kinds of things together. You probably store all the pots and pans together, sorted in a way that makes sense depending on how you use them. You probably pick a place that’s near the stove where they’ll probably be used – maybe even hanging on hooks from the ceiling. You probably have a main store of widely used ingredients like salt, but you may periodically put containers of it near where it is most often used. An important principle is that most chefs work at or near their stations – but (it goes without saying) can move anywhere to get anything they need. There aren't walls stopping you, and you don’t bother someone else to get something for you when you can more easily do it yourself.

    Exactly the same principle applies whether you are creating an appetizer, an entree, a side dish or a dessert — you do the same gathering and assembly from the ingredients in the kitchen, but deliver the results on different plates at different times.

    In software this means that while routines may be stored in files in different directories for convenience, and that usually your work is confined to a single directory, you go wherever you need to in order to do your job. Same thing with data definitions; you can break them up if it seems to make things more organized, but any routine can access any data definition it needs to. When you’re done, you make a build of the system and try it out. That's what champion/challenger QA is for.

    Are you deploying on a system that has separate UI, server and storage? No problem! That's what build scripts (which you would have anyway) are for! This approach makes it easier to migrate from the decades-old DBMS-centric systems design and move towards a document-centered database with auto-replication to a DBMS for reporting, which typically improves both performance and simplicity by many whole-number factors.

    Are the chefs in your kitchen having trouble handling all the work in a timely way? In the software world you might think of making a "scalable architecture" with separate little kitchenettes, delivery vans, etc. — which any sensible person knows adds trouble and work and makes every request take longer from receipt to ultimate delivery. In the world of kitchens (and sensible software people) you might add another mixer or two, install a couple more ovens and hire another chef or two, everyone communicating and working in parallel, and churning out more excellent food quickly, with low overhead.

    If you think this sounds simple, you’re right. If you think it sounds simplistic, perhaps you should think about this: any of the artificial, elaborate, rigid prescriptions for organizing software necessarily involves lots of overhead for people and computers in designing, writing, deploying, testing and running software. Each restriction is like telling a chef that under no circumstances is he allowed to access this shelf of ingredients or use that tool when the need arises.

    Arguments to the contrary are never backed with facts or real-world experiments – just theory and religious fervor. Instead, you should consider the effective methods of optimizing a body of software, which are centered on Occamality (elimination of redundancy) and increasing abstraction (increasingly using metadata instead of imperative code). Both of these things will directly address the evils that elaborate component methods are supposed to cure but don't.

  • Fundamental Concept: Software is not a passive thing, it is actions

    Even lay people have heard that there are computer languages. They've heard that the languages process data — they take data in and put it out. They gather there are commands in the language to do stuff to data. So of course software is actions.

    In spite of this knowledge, something very different is in the forefront of peoples' thinking about software, both in the industry and not. This different way of thinking results from the fact that software is invisible to nearly everyone, and that in order to think and communicate about it, language is used that assumes software is a thing.

    We talk about a body of software, the 2021 version of a piece of software, as though it were a car. Speaking of cars, we talk about how software crashes. We talk about how software has bugs, a term that goes back to an actual insect found inside an early computer that had failed.

    We talk about creating requirements for a new piece of software that someone will architect, and the construction may be put out to bid. The rest of the software building process sounds remarkably like building a house. We think of a time when the building project is done and we can start using the house.

    The trouble is that a house just sits there. It’s a passive physical object. Software is NOT a physical object; it is mostly a set of actions to be taken by a computer, along with some built-in data. A piece of software is like a data factory – it takes data in, makes all sorts of changes, sometimes puts some of it in a storage room, and spits out data in all sorts of volumes and formats. A software program responds to each piece of data it gets, e.g. a mouse click or a data file; software is an actor, while a house is a thing that does nothing. Designing a piece of software is more like writing a play for an improv theater – instead of having a fixed script, the play can go in all sorts of directions depending on the input it gets from the audience. The improv theater script even has to tell the actor that when an audience member uses a swear word, the actor has to respond that those words aren’t permitted and ask whether he’d like to try again. This is NOTHING like designing a house! Here is more detail.

    Suppose you use an advanced architecture program and design a house in great detail. Modern programs go down to the level of the framing, the details of the steps, doors and windows, literally every object that will be part of the eventual house. When you're done with such a design, you can give it to the building department for approval and to a construction company to build. If they follow the instructions correctly, you'll get exactly what you saw in the 3-D renderings and the live-action walk-throughs. Here's the killer: the equivalent of such a detailed design for computer software is … a computer program! Or at least enough detail in pseudo-code, data definitions and screen designs so that coding is fairly mechanical. A detailed design for a computer program is like the results of extensive interactions by the script-writing team for a new episode of a series TV show, with recordings and lots of notes. Chances are a junior script-writer who has been intensely following everything with some participation is assigned the job of producing a clean draft of the script, ready for refinement and editing. That is the program, ready for actors to give it a studio reading, like you would run a test version of the software for the developers.

    Software isn't a thing, regardless of how often we talk about it as though it were. It isn't a thing, even though we nearly always use methods like project management developed for building things for organizing its creation. Software is a collection of actions, like an incredibly detailed script for improvisational theater with loads of conditional branching and the ability to bring things up that were talked about before in creative and useful ways.

    This matters because how you think influences what you do. If you're thinking about running a marathon while you're climbing a mountain, chances are you'll slip and fall. If you're thinking about baseball plays while you're trying to choreograph a ballet, I suspect you won't win any awards. But at least those mistakes involve thinking about actions. If you're thinking about the design of a book while you're writing a comedy, I doubt you'll get many laughs.

    This fundamental concept is a core aspect of all of programming. The fact that people so often have the wrong things in their heads while they are dealing with software has profound effects.

    And … just as bad, we mostly don't "build" software — mostly we change it!

  • The Relationship between Data and Instructions in Software

    The relationship between data and instructions is one of those bedrock concepts in software that is somehow never explicitly stated or discussed. While every computer program has instructions (organized as routines or subroutines) and data, the details of how the data is identified, named and accessed vary among programming languages and software architectures. Those differences of detail have major consequences. That’s why understanding the underlying principles is so important.

    Programmers argue passionately about the supposed virtues or defects of various  software architectural approaches and languages, but do so largely without reference to the underlying concepts. Only by understanding the basic concepts of data/instruction relationships can you understand the consequences of the differences.

    The professional kitchen

    One way to understand the relation between instructions (actions) and data is to compare it to something we can all visualize. An appropriate comparison for a program is a professional chef’s kitchen with working cooks. But in this kitchen, the cooks are a bit odd — only one of them is active at a time. When someone asks them to do something they get active, each performing their own specialty. A cook may ask another cook for help, giving that cook things or directing them to places, and getting the results back. The cooks are like subroutines in that they get called on to do things, often with specific instructions like “medium rare.” The cook processes the “call” by moving around the kitchen to fetch ingredients and tools, brings them to a work area to process the ingredients (data) with the tools, and then delivers the results for further work or to the server who put in the order. The ingredients (data) can be in long-term storage, in working storage available to multiple chefs, or in a workspace undergoing prep or cooking. In addition, food (data) is passed to chefs and returned by them.

    The action starts when a server gives an order to the kitchen for processing.

    • This is like calling a program or a subroutine. Subroutines take data as calling parameters, which are like the items from the menu written on the order to the kitchen.

    The person who receives the order breaks the work into pieces, giving the pieces to different specialists, each of whom does his work and returns the results. Unlike in a real kitchen, only one cook is active at a time. One piece of the order might go to the meat chef and another to whoever handles vegetables.

    • This is like the first subroutine calling other subroutines, giving each one the specifics of the data it is supposed to process. The meat subroutine would be told the kind and cut of meat, the finish, etc.

    In a professional kitchen there is lots of pre-processing done before any order is taken. Chefs go to storage areas and bring back ingredients to their work areas. They may prepare sauces or dough so that everything is mixed in and prepped so that it can be finished in a short amount of time. They put the results of their work in nearby shelves or buckets for easy access later in the shift.

    • This is like getting data from storage, processing it and putting the results in what is called static or working storage, which is accessible by many different subroutines.

    There is a storage area and refrigerator that stores meat and another that stores vegetables. The vegetable area might have shelves and bins. The cook goes to the storage area and brings the required ingredients back to the cook’s work space. Depending on the recipe, the cook may also fetch some of the partly prepared things like sauces, often prepared by others, to include.

    • This is like getting data from long-term storage and from working storage and bringing it to automatic or local variables just for this piece of work.

    The storage area could be nearby. It could be a closet with shelves containing big boxes that have jars and containers in it. A cook is in charge of keeping the pantry full. They go off and get needed ingredients and put them in the appropriate storage area as needed. They could also deliver them as requested right to a chef.

    • This is like having long-term storage and access to it completely integrated with the language, or having it be a separate service that needs to be called in a special way.

    The chef does the work on the ingredients to prepare the result.

    • This is like performing manipulations on the data that is in local variables until the desired result has been produced. In the course of this, a chef may need to reach out and grab some ingredient from a nearby shelf.

    The chef may need extra space for a large project. He grabs some empty shelves from the storage area and uses them to store things that are in progress, like dough that needs time to rise. Later a chef might call out “grab me the next piece of dough” or “I need the dough on the right end of the third shelf.”

    • This is like taking empty space and using it. Pointers are sometimes used to reference the data, or object ID’s in O-O systems.

    The cook delivers the result for plating and delivery.

    • This is like producing a return variable. It may also involve writing data to long-term storage or working storage.

    I’m not a cooking professional, but I gather that the work in professional kitchens and how they’re organized has evolved towards producing the best results in the least amount of elapsed time and total effort. As much prep work as possible is done before orders are received, to minimize the time and work needed to deliver orders quickly and well. The chefs have organized work and storage spaces to handle original ingredients (meat, spices, flour, etc.) and partly done results (for example, a restaurant can’t wait the 45 minutes it might take to cook brown rice from scratch).

    In the next section, this is all described again somewhat more technically. If you’re interested in technology or have a programming background, by all means read it. The main points of this post and the ones that follow can be understood without it.

    Instructions and data in a computer program

    The essence of a computer program is instructions that the computer executes. Most of the instructions reference data in some way – getting data, manipulating it, storing results. See this for more.

    In math, from algebra on up, variables simply appear in equations. In computer software, every variable that appears in a statement must be defined as part of the program. For example, a simple statement like

              X = Y+1

    means “read the value stored in the location whose name is Y, add the number 1 to it, and store the result in the location whose name is X.” Given this meaning, X and Y need to be defined. How and where does this happen? There are several main options:

    • Parameters. These form part of the definition of a subroutine. When calling a subroutine, you include the variables you want the subroutine to process. These are each named.
    • Return value. In many languages, a called routine can return a value, which is defined as part of the subroutine.
    • Automatic or local variables. These are normally defined at the start of a subroutine definition. They are created when the subroutine starts, used by statements of the subroutine and discarded when the subroutine exits.
    • Static or working storage variables. These are normally defined separately (outside of) subroutines. They are assigned storage at the start of the whole program (which may have many subroutines), and discarded at the end.
    • Allocated variables. Memory for these is allocated by a subroutine call in the course of executing a program. Many instances of such allocated variables may be created, each distinguished by an ID or memory pointer.
    • File, database or persisting variables. These are variables that exist independent of any program. They are typically stored in a file system or DBMS. Some software languages support these definitions being included as part of a program, while others do not. See this for more.
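
    As a rough illustration, here is how those kinds of definitions might look in Python (hypothetical names; Python has no formal “static” storage, so module-level variables stand in for working storage):

        import json

        TAX_RATE = 0.05                        # working/static storage: lives for the whole program

        def price_with_tax(amount):            # "amount" is a parameter
            tax = amount * TAX_RATE            # "tax" is an automatic/local variable, gone on return
            return amount + tax                # the return value

        order = {"amount": 100.0}              # allocated at run time; many such instances may exist
        order["total"] = price_with_tax(order["amount"])

        with open("order.json", "w") as f:     # file/persisting data: exists independent of the program
            json.dump(order, f)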

    There are a couple concepts that apply to many of the places and ways variables can be defined.

    • Grouping. Groups of variables can be in an ordered list, sometimes with a nesting hierarchy. This is like the classic Hollerith card: you would have a high-level definition for the whole card and then a list of the variables that would appear on the card.
      • There might be subgroups; for example, start-date could be the name of a group consisting of the variables day, month, year.
      • Referring to such a variable might look like “year IN start-date IN cust-record” in COBOL, while in other languages it might be cust_record.start_date.year, with the outermost group named first (see the sketch after this list).
    • Multiples. Any variable or group can be declared to be an array, for example the variable DAY could be made an array of 365, so there’s one value per day of a year.
    • Types or templates. Many languages let you define a template or type for an individual variable or group. When you define a new variable with a new name like Y, you could say it’s a variable of type X, which then uses the attributes of X to define Y.
    • Definition scope. Parameters, return values and local variables are always tied to the subroutine of which they are a part. They are “invisible” outside the subroutine. The other variables, depending on the language, may be made “visible” to some or all of a program’s subroutines. Exactly how widely visible data definitions are is the subject of huge dispute, and is at the core of things like components, services and layers.
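
    A short sketch of grouping, types and multiples, using hypothetical names and Python dataclasses as the template mechanism:

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class StartDate:                       # a type acting as a template for a sub-group
            day: int
            month: int
            year: int

        @dataclass
        class CustRecord:                      # the enclosing group, like the whole Hollerith card
            name: str
            start_date: StartDate              # a nested group
            daily_totals: List[float] = field(default_factory=lambda: [0.0] * 365)   # a "multiple"

        cust = CustRecord("Ada", StartDate(day=15, month=3, year=2024))
        print(cust.start_date.year)            # outer-to-inner reference, the reverse of COBOL's "IN"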

    When you look at a statement like X = Y+1, exactly how and where X and Y are defined isn’t mentioned. X could be a parameter, a local variable or defined outside of the subroutine in which the statement appears. Part of the job of the programmer is to name and organize the data definitions in a clean and sensible way.

    The variety of data-instruction organization and relationships

    Most of the possibilities for defining variables listed above were provided for by the early languages FORTRAN, COBOL and C, each of which remains in widespread use. Not long after these languages were established, variations were introduced. Software languages and architectures were created that selected and arranged the way instructions related to data definition. Programmers and academics decided that some ways of referencing and organizing data were error-prone and introduced restrictions that were intended to reduce the number of errors that programmers made when creating programs. In software architecture, the idea arose that all of a program's subroutines should be organized into separate groups, usually called "components" or "services," each with its own collection of data definitions. The different components call on each other or send messages to ask for help and get results, but can only directly operate on data that is defined as part of the component.

    The most extreme variation of instruction/data relationship is a complete reversal of point of view. The view I've described here is "procedural," which means everything is centered around the actor, the chef who does things. The reversal of that point of view is "object-oriented," called O-O, which organizes everything around the data, the acted upon, the ingredients and workspaces in a kitchen. Instead of following the chef around as he gets and operates on ingredients (data), we look at the data, called objects, each of which has little actors assigned to it, mini-chefs, that can send messages for help, but can only work on their own little part of the world. It's hard to imagine!

    The basic idea is simple: instead of having a master chef or ones with broad specialties like desserts, there are a host of mini-chefs called "methods," each of  which can only work on a specific small group of ingredients. A master chef has to know so much — he might make a mistake! By having a mini-chef who is 100% dedicated to dough, and never letting anyone else create the dough, we can protect against bad chefs (programmers) and make sure the dough is always perfect! Hooray! Or at least that's the theory…

    Conclusion

    Computers take data in, process it and write data out. Inside the computer there are instructions and data. Software languages have evolved to make it easier for programmers to define the data that is read and created and to make it easier to write the lines of code that refer to the data and manipulate it. As bodies of software have grown, people have created ways to organize the data that a computer works on, for example putting definitions for the in-process data of a subroutine inside the subroutine itself or collecting a group of related subroutines into a self-contained group or component with data that only it can work on.

    Understanding the basic concepts of instruction/data relationships and how those relationships can be organized and controlled is the key to understanding the plethora of approaches to language and architecture that have been created, and making informed decisions about which language and architecture is best for a given problem. The overall trend is clear: Programming self-declared elites decide that this or that restriction should be placed on which variables can be accessed by which instructions in which way, with the goal of reducing the errors made by normal programming riff-raff. Nearly all such restrictions make things worse!

  • Fundamental Concept: Software is rarely Built, mostly it is Changed

    Nearly everything we hear about the construction of software is that it is built. This is a fundamental misconception that pervades academia and practice. This misconception is a major underlying cause of why software has so many problems getting created in a timely way and functioning with acceptable speed and quality.

    The software-is-built misconception is woven into every aspect of software education and practice. You are first taught how to write software programs; all the exercises start with a blank screen or sheet of paper. You’re taught how to test and run them. In software engineering you’re taught how to perform project management, from requirements to deployment. You may learn how to automate software quality testing. Everything about these processes assumes that software is built, like a house or a car.

    The idea that software is built isn’t wrong – it just applies to a tiny fraction of the total effort that goes into a body of software from birth to its final resting place in the great bit-bucket in the sky. The vast majority of the effort goes into changing software – making it do something different than the version you’re working on does. Maybe you’re fixing a bug, maybe you’re adding a feature or adapting to a new environment. Sometimes the changes are extensive, like when you’re changing to a new UI framework. For a typical body of software, however much effort is put into the first production version, a minimum of 10X and often over 100X of that work goes into changing it.

    We don't think of software as ever-changing, ever-growing partly because we think in terms of physical metaphors. We use the same project management techniques for software that are used for houses. While sometimes houses are changed, mostly they're just built and then used for years. It's an extremely rare house that is endlessly extended the way software is. See this for more.

    When it comes to enterprise software, the fraction of effort that goes into change skyrockets even more. A good example is the kind of software that runs hospitals and health systems. The software already exists, and it’s already had 100X the effort put into change compared to the original build. For example, standard software from the leading industry supplier was installed in the NYC hospital system. The original contract — just to buy and install existing production software — was for over $300M and 4 years! A couple years later, the cost of the carefully planned project had more than doubled. All because of changes. Here are details.

    Why does thinking software is built rather than changed matter?

    The idea that software is about building and not changing is responsible for most of the processes, procedures and methods that are overwhelmingly common practice, either by custom or by regulation. The mismatch between the practices and the fact that software is nearly always changed rather than created from scratch results in the never-ending nightmare of software disasters that plague us. The never-ending flow of new tools, languages and methods are futile attempts to solve these problems. No one would find the new things attractive if there weren’t a problem to be fixed – a problem everyone knows and that no one wants to explicitly discuss!

    Example: Project Management, Architecture, Design and Coding

    Mostly what happens when you get a job in software is that you join a group that works with a substantial existing body of code. You may be a project manager, a designer, a customizer, any number of things, all of which revolve around the body of code.

    Yes, you have to start any project from an understanding of what you want to accomplish. Here's the thing: what you want to get done is in the context of the existing code. What this means is that you have to understand as much of the code as possible to do things. If what you want to do is fairly circumscribed in the existing code, you may get away with ignoring the rest — which is where comprehensive, nothing-left-out regression testing comes in: you'd better make darn sure you didn't break anything.

    In this context, lots of skills are relevant. But the overwhelming number one thing is … knowledge of the ins and outs of the existing code base!

    I've worked in a couple places where there were large code bases of this kind. Forget about departmental hierarchy — the go-to person in every case of doing anything important was the person who best knew that particular aspect or section of the code.

    In this context, what's the point of all the fancy project management stuff people think is the cornerstone of effective, predictable work? When you're traveling through an unknown land with lots of mountains, jungles, rivers and swamps, what do you want above all else (besides nice paved roads that don't exist)? You want an experienced GUIDE, someone who's been all over the place and knows the best way to get there from here.

    Kind of turns all the project management stuff on its head, doesn't it?

    Example: Software QA

    Most software QA revolves around writing test scripts of one kind or another that mirror the new code you write: if the code does X, the test should assure that the code does X correctly. This is true whether you're doing test-driven development or whether QA people write scripts. As you build the code, you're supposed to build the test code. When you make a new release, you run integration tests that include running all the individual test scripts.

    Everyone with experience knows the problems: there are never enough tests, they don't cover all the conditions, there are omissions and errors in the tests, with the result that new releases put into production often have problems.

    The root cause of this is the never-spoken assumption that software is about building, when the reality is that it's mostly changed. What this means is that the focus should be on the existing code and what it does — do the changes that you made break anything that used to work? When you make changes, you should be thinking champion/challenger, like in data science: the new code is the challenger. Did it make things better, and did it (probably unintentionally) make anything worse? The best way to find out is to run the same data through the production program and the challenger program in parallel, and have a tool that compares the output. Looking at the output will tell you what you want to know. Even better, you can run the new code in live parallel production, with lots of real customer data going through both versions but only the current champion's output going to customers; that provides a safe way to confirm the existing behavior is unharmed, while checking the new functionality on the challenger code to make sure it does what you wanted.
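
    A bare-bones sketch of the idea, with hypothetical names: run the same records through both versions and report every difference, each of which is either the improvement you intended or a bug you just caught.

        def champion(record):                  # the code currently in production
            return {"id": record["id"], "total": record["qty"] * record["price"]}

        def challenger(record):                # the changed code whose safety we want to prove
            total = record["qty"] * record["price"]
            return {"id": record["id"], "total": round(total, 2)}   # the intended change

        production_feed = [                    # in real use this would be live or recorded production data
            {"id": 1, "qty": 3, "price": 19.955},
            {"id": 2, "qty": 1, "price": 5.00},
        ]

        for record in production_feed:
            old, new = champion(record), challenger(record)
            if old != new:
                print("difference on record", record["id"], old, "->", new)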

    Who needs test scripts, TDD and the rest? They're backwards! See this for more.

    Conclusion

    Software is literally invisible to the vast majority of people. Therefore, we all talk and think about software in terms of metaphors. That would be OK if the metaphors matched reality; unfortunately, the metaphors don’t match reality, and play a major role in software dysfunction. In real life, we build houses, bridges, and other structures — we mostly don't change them.

    So what are the methods that are appropriate when you recognize that the main thing you do with software is change it instead of building it? I've done my best to abstract the methods I've seen the best people use, calling them (for want of a better term) "wartime software." Instead of creating a whole adult human at once, like Frankenstein, it's like starting with a baby that can't do much yet, but is alive. The software effort is one of growing the baby instead of constructing a Frankenstein.

     
