Category: Software Components

  • Software Components and Layers: Problems with Data

    Components and layers are supposed to make software programs easier to create and maintain. They are supposed to make programmers more productive and reduce errors. It’s all lies and propaganda! In fact, the vast majority of software architectures based on components (ranging from tiny objects to large components) and/or layers (UI, server, storage and more) lead to massive duplication and added value-destroying work. The fact that these insane methods continue to be taught in nearly all academic environments and required by nearly all mainstream software development groups is cornerstone evidence that Computer Science not only isn’t a science, it resembles a deranged religious cult.

    Components and Layers

    It has been natural for a long time to describe the different “chunks” of software that work together to deliver results for users in dimensional terms.

    First there’s the vertical dimension, the layers. Software that is “closer” to end users (real people) is thought of as being on “top,” while software that is closer to long-term data storage and farther from end users is thought of as being on the “bottom” of a software system. In software that has a user interface, the different depths are thought of as “layers,” with the user interface as the top layer, the server code as the middle layer and the storage code as the bottom layer. Sometimes a system like this is called a “three tier” system, evolved from a two tier “client/server” system.

    Second there’s the horizontal dimension, the components. These are bodies of code that are organized by some principle to break the code up into pieces for various reasons. I've given a detailed description of components. A component may have multiple layers in it or it may operate as a single layer.

    Layers are often written in different languages and using different tools. JavaScript is prominent in user interface layers, while some stored procedure language is often used in the lowest data access layer. The server-side layer may be written in any of dozens of languages, including Java and Python. Sometimes there are multiple layers in server-resident software, with a scripting language like PHP or JavaScript used for server code on the “top,” communicating with the client software.

    Components are self-contained bodies of software that communicate with other components by some kind of formal means. Formal components always have data that can only be accessed by the code in the component. The smallest kind of component is an object, as in object-oriented languages. The data members of the object can only be accessed by routines attached to the object, known as methods.  Methods can normally be called by methods of other objects. Larger components are often called microservices or services. These are usually designed so that they could run on separate machines, with calling methods that can span machines, like old RPC’s or more modern RESTful API’s. Sometimes components are designed so that instead of being directly called, they take units of work from a queue or service bus, and send results out in the same way. When a component makes a call, it calls a specific component. When a component puts work onto a queue or bus, it has no knowledge or control over which component takes the work from the queue or bus.

    Layers and components are often combined. For example, microservice components often have multiple layers, frequently including a storage layer. With a thorough object-oriented approach, there could be multiple objects inside each layer that makes up a service component.

    Why are there layers and components? Software certainly doesn’t have to use these mechanisms to be effective. In fact, major systems and tools have been built and widely deployed that mostly ignore the concepts of layers and components! See this for historic examples.

    The core arguments for layers and components typically revolve around limiting the amount of code (components) and the variety of technologies (layers) a programmer has to know about to make the job easier, along with minimizing errors and maximizing teamwork and productivity. I have analyzed these claims here and here.

    Data Definitions in Components and Layers

    All software consists of both procedures (instructions, actions) and data. Each of those needs to be defined in a program. When data is defined for a procedure to act upon, the data definitions are nearly always specific to and contained within the component and/or layer they’re part of. For the general concept and variations on how instructions and data relate to each other, see this.

    Consider the data that’s used in a component or layer: some of it will be truly only for use by that component or layer. But nearly any application you can imagine has data that is used by many components and layers. This necessity is painful to object/component purists, who go to great lengths to avoid it. But when a piece of data like “application date” is needed, it will nearly always have to be used in multiple layers: user interface, server and database. To be used it must be defined. So it will be defined in each layer, typically a minimum of three times!

    When data is defined in software, it always has a name used by procedures to read or change it. It nearly always also has a data type, like character or integer. For the way most languages define data, that’s it! But there’s more.

    • When the “application date” data is shown to the user in the UI layer, it typically also needs a label, some formatting information (month, day, year), error checking (was 32 entered into the day part?) and an error message to be displayed.
    • When application date is used in the server layer by some component, some kind of error checking is often needed to protect against disaster if a mistaken value is sent by another component.
    • When a field like social security number is used in the storage layer, sanity checks are often applied to make sure, for example, that the SSN entered matches the one previously stored for the particular customer already stored in the database.
    • There need to be error codes produced if data is presented that is wrong. When the user makes an error, you can’t use a code, you have to use a readable message, which the user layer might need to look up based on the error code it gets from another component or layer.
    • Each language has its own standards for defining data and the attributes associated with it. Someone has to make sure that all the definitions in varying languages match up, and that when changes are made they are made correctly everywhere.
    • If the data is sent from one component to another, more definitions have to be made: the procedure that gets the data, puts it into a message and sends it, possibly on a queue; the procedure that gets the message, maybe from a queue and sends the data to the routine that will actually do the processing.
    • When data is sent between components, various forms of error checking and error return processing must also be implemented for the data to protect against the problems caused by bad data being passed between components that, for example, might have been implemented and maintained by separate groups. Sometimes this is formalized into "contracts" between data-interchanging components/layers.
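    The multiplication the list above describes can be sketched in a few lines. This is a hypothetical illustration (the field names and layer conventions are invented, and the storage "layer" is reduced to a DDL string), not any particular system:

```python
import datetime

# UI layer: the field needs a label, a display format and a user-readable
# error message -- the first definition of "application date".
UI_FIELDS = {
    "application_date": {
        "label": "Application Date",
        "format": "MM/DD/YYYY",
        "error_message": "Please enter a valid date.",
    }
}

# Server layer: the second definition, with its own error checking to
# guard against a mistaken value sent by another component.
def validate_application_date(value):
    try:
        datetime.datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False

# Storage layer: the third definition, this time as DDL text in SQL.
STORAGE_DDL = "ALTER TABLE applications ADD COLUMN application_date DATE NOT NULL"

# Three definitions of one piece of data, in two languages, each with
# different attributes -- all of which must be kept in agreement by hand.
```

Note that no one of these definitions is complete on its own: the label lives only in the UI dict, the validation only in the server routine, the column type only in the DDL string.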

    So what did we gain by breaking everything up into components and layers? A profusion of redundant data definitions, each containing different information, expressed in different ways! The profusion is multiplied by the need for components and layers to pass data among themselves for processing. It can’t be avoided, and can only be reduced by introducing further complexity and overhead into the component and layer definitions.

    See this for another take on this subject.

    I’ve heard the argument that unifying data definitions makes things harder for the specialists that often dominate software organizations. The database specialists are guardians of the database, and make sure everything about it is handled in the right way. The user interface specialists keep the database specialists away from their protected domain, because if they meddled the users wouldn’t enjoy the high quality interfaces they’ve come to expect. There is no doubt you want people to know their stuff. But none of this is really that hard – channeling programmers into narrow specialties is one of the many things that leads to dysfunction. Programmers produce the best results by thoroughly understanding their partners and consumers, which can only be done by spending time working in different roles – for example spending time in sales, customer service and different sub-departments of the programming group.

    Data Access in Components and Layers

    Now we’ve got data defined in many components and layers. In a truly simple system, data would be defined exactly once, be read into memory and be accessed in the single location in which it resides by whatever code needs it for any reason. If the code needs to interact with the user, perform calculations or store it, the code would simply reference the piece of data in its globally accessible named memory location and have at it. Fast and simple.

    If this concept sounds familiar, you may have heard of it in the world of relational DBMS’s. It’s the bedrock concept of having a “normalized” schema definition, in which each unique piece of data is stored in exactly one place. A database that isn’t normalized is asking for trouble, just like the way that customer name and address are frequently stored in different places in various pieces of enterprise software that evolved over time or were jammed together by corporate mergers.
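    A minimal sketch of the update anomaly that normalization prevents, using plain Python dicts as stand-ins for tables (all names invented):

```python
# Denormalized: the customer's address is copied into every order record.
orders = [
    {"order_id": 1, "customer": "Acme", "address": "12 Elm St"},
    {"order_id": 2, "customer": "Acme", "address": "12 Elm St"},
]
# Change the address in one copy only, and the data silently disagrees
# with itself -- the classic trouble the text describes.
orders[0]["address"] = "99 Oak Ave"

# Normalized: the address lives in exactly one place; every order
# refers to it by key instead of carrying its own copy.
customers = {"C1": {"name": "Acme", "address": "12 Elm St"}}
orders_n = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C1"},
]
customers["C1"]["address"] = "99 Oak Ave"  # one change, visible everywhere
```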

    Components and layers performing operations on global data is neither fast nor simple. Suppose for example that you’ve got a new financial transaction that you want to process against a customer’s account. In an object system, the customer account and financial transaction would be different objects. That’s OK, except that the customer account master probably has a field that is the sum of all the transactions that have taken place in the recent time period.

    In a sensible system, adding the transaction amount to a customer transaction total in the customer record would probably be a single statement that referenced each piece of data (the transaction amount and the transactions total) directly by name. Simple, fast and understandable.

    In a component/object system that’s properly separated, transaction processing might be handled in one component and account master maintenance in another. In that case, highly expensive and non-obvious remote component references would have to be made, or a copy of the transaction placed on an external queue. In an object system, a method of the transaction object would have to be called and then a method of the account master object.
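    The contrast can be sketched directly. The class and field names below are hypothetical, and the "component" version is shown as local objects; in a real service-per-component design the method calls become remote calls:

```python
# The "sensible system": both pieces of data are directly addressable
# by name, so posting the transaction is one statement.
account = {"customer_id": "C1", "transactions_total": 100.0}
transaction = {"customer_id": "C1", "amount": 25.0}
account["transactions_total"] += transaction["amount"]  # done

# The object version: each datum is sealed inside its own object and
# can only be reached through method calls.
class Transaction:
    def __init__(self, amount):
        self._amount = amount
    def get_amount(self):           # accessor required by encapsulation
        return self._amount

class AccountMaster:
    def __init__(self, total):
        self._total = total
    def apply(self, txn):           # a second object's method is needed
        self._total += txn.get_amount()
    def get_total(self):
        return self._total

master = AccountMaster(100.0)
master.apply(Transaction(25.0))
# Same result, reached through two classes and three method calls.
```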

    It’s a good thing we’ve got smart, educated Computer Scientists arranging things so that programmers do things the right way and don’t make mistakes, isn’t it?

    Conclusion

    With the minor exception of temporary local variables, nearly every layer or component you break a body of software into leads to a multiplication of redundant, partly overlapping data definitions expressed in different languages — each of which has to be 100% in agreement to avoid error. The communication by call or message between layers and components to send and receive data increases the multiplication of data definitions and references. Adding the discipline of error checking is a further multiplication.

    Not only does each layer and component multiply the work and the chances of error, each simple-seeming change to a data definition results in a nightmare of redundancy. You have to find each and every place the data is defined, used or error-checked and make the right change in the right language. Components and layers are the enemy of Occamality. Software spends the VAST majority of its life being changed! Increasingly scattered data definitions make the thing that is done 99% of the time to software vastly harder and riskier to do.

  • The Dangerous Drive Towards the Goal of Software Components

    Starting fairly early in the years of building software programs, some programs grew large. People found the size to be unwieldy. They looked for ways to organize the software and break it up into pieces to make it more manageable. This effort applied to the lines of code in the program and to the definitions of the data referred to by the code, and to how they were related. How do you handle blocks of data that many routines work with? How about ones whose use is very limited? The background of this issue is explained here.

    These questions led to a variety of efforts that continue to this day to break a big body of software code and data definitions into pieces. The obvious approach, which continues to be used today, was simply to use files in directories. But for many people, this simple, practical approach wasn't enough. They wanted to make the walls between subsets of the code and data higher, broader and more rigid, creating what were generally called components. The various efforts to create components vary greatly. They usually create a terminology of their own, terms like “services” and “objects.” Such specialized, formulaic approaches to components are claimed to make software easier to organize, write, modify and debug. In this post I’ll take a stab at explaining the general concept of software components.

    Components and kitchens

    When you’re young and in your first apartment, you’ve got a kitchen that starts out empty. You’re not flush with cash, so you buy the minimum amount of cooking tools (pots, knives, etc.) and food that you need to get by. You may learn to make dishes that need additional tools, and you may move into a house with a larger kitchen that starts getting filled with the tools and ingredients you need to make your growing repertoire of dishes. You may have been haphazard about your things at the start, but as your collection grows you probably start to organize things. You may put the flour, sugar and rice on the same shelf, and probably put all the pots together. You may put ingredients that are frequently used together in the same place. You probably store all the spices and herbs together. It makes sense – if everything were scattered, you’d have a tough time remembering where each item was. The same logic applies to a home work bench and to clothes.

    The same logic applies for the same reason to software! It’s a natural tendency to want to organize things in a way that makes sense – for remembering where they are, and for convenience of access. This natural urge was recognized in the early days of software programming. Given that there aren’t shelves or drawers in that invisible world, most people settled on the term “component” as the word for a related body of software. The idea was always that instead of one giant disorganized block of code, the thing would be divided into a set of components, each of which had software and data that was related.

    A software program consists of a set of procedures (the actions) and a set of data definitions (what the procedures act on). Breaking up a large amount of code into related blocks was helped by the early emergence of the subroutine – a block of code that is called on to do something and then returns with a result; kind of like a mixer in a kitchen. The problems that emerged were how to break a large number of subroutines into components (each of which had multiple subroutines), and how to relate the various data definitions to the routine components. This problem resembles the one in the kitchen of organizing tools and ingredients. Do you keep all the tools separate from the ingredients, or do you store ingredients with the tools that are used to process them?

    In the world of cooking, this has a simple answer. If all you do is make pancakes, you might store the flour and milk together with the mixing bowls and frying pans you use with them. But no one ever just makes pancakes – duh! And even if you did, you’d better put the milk and butter in the fridge! Ditto with spices. Nearly everyone has a spice drawer or shelf, and spices used for everything from baking to making curries are stored there.  Similarly, you store all the pans together, all the knives, etc.

    In software it’s a little tougher, but not a lot. One tough subject is always whether you group routines (and data) together by subject/business area or by technology area. The same choice applies to organizing programmers in a department. It makes sense for everyone who deals primarily with user interfaces to work together so they can create uniform results with minimal code. Same thing with back-end or database processing. But over time, the programmers become less responsive to the business and end users; the problem is often solved by having everyone who contributes to a business or kind of user work as a team. Responsiveness skyrockets! Before long, things begin to deteriorate on the tech side, with redundant, inconsistent data put into the database, UI elements that vary between subject areas, confusing users, etc. There’s pressure to go back to tech-centric organization. And so it goes, round and round. Answer: there is no perfect way to organize a group of programmers! Just as painfully, there is no perfect way to organize a group of procedures and their relationship to the data they work on!!

    The drive towards components

    There are two major dimensions of making software into components. One dimension is putting routines into groups that are separate from each other. The second dimension is controlling which routines can access which data definitions.

    Separating routines into groups can be done in a light-weight way based on convenience, similar to having different workspaces in a kitchen. In most applications there are routines that are pretty isolated from the rest and others that are more widely accessed. Enabling programmers to access any routine at any time and having access to all the source code makes things efficient.

    Component-makers decided that programmers couldn't be trusted with such broad powers. They invented restrictions to keep routines strictly separate. While there were earlier versions of this idea, a couple decades ago "services" were created to hold strictly separate groups of routines. An evolved version of that concept is "micro-services." See this for an analysis.

    The second major dimension of components is controlling and limiting the relationship between routines and data. The first step was taken very early. It was sensible and remains in near-universal use today: local variables. These are data definitions declared inside a routine for the private use of that routine during its operation. They are like items on a temporary worksheet, discarded when the results are created.

    Later steps concerning "global" variables were less innocent. The idea was to strictly separate which routines can access which data definitions. Early implementations of this were light-weight and easily changed. Later versions built the separation into the architecture and code details. For example, each micro-service is ideally supposed to have its own private database and schema, inaccessible to the other services. This impacts how the code is designed and written, increasing the total amount of code, overhead and elapsed time.

    Languages that are "object oriented" are an extreme version of the component idea. In O-O languages, each data definition ("class") can be accessed only by an associated set of routines ("methods"). This bizarre reversal of the natural relationship between code and data results in a wide variety of problems.
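    A small sketch of that reversal, using Python's name mangling as a stand-in for O-O access restriction (all names hypothetical):

```python
# O-O style: the data can be reached only through the class's methods,
# so every new use of the data requires adding a method to the class.
class Customer:
    def __init__(self, name, balance):
        self.__name = name        # double underscore: blocked from outside code
        self.__balance = balance
    def report_line(self):        # each new consumer needs a new method here
        return f"{self.__name}: {self.__balance}"

c = Customer("Acme", 100.0)
blocked = False
try:
    c.__balance                   # direct access to the data is refused
except AttributeError:
    blocked = True

# The "natural relationship": data is plain, and any routine that needs
# it can act on it -- new consumers are just new routines.
customer = {"name": "Acme", "balance": 100.0}
def report_line(rec):
    return f"{rec['name']}: {rec['balance']}"
```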

    These ideas of components can be combined, making the overhead and restrictions even worse. People who build micro-services, for example, are likely to use O-O languages and methods to do the work. Obstacles on top of problems.

    Components in the kitchen

    All this software terminology can sound abstract, but the meaning and consequences can be understood using the kitchen comparison.

    What components amount to is having the kitchen be broken up into separate little kitchenettes, each surrounded by windowless walls. There are two openings in each kitchenette's walls, one for inputs and one for outputs. No chef can see or hear any other chef. The only interaction permitted is by sending and receiving packages. Each package has an address of some kind, along with things the sending chef wants the receiving chef to work on. For example, the sending chef may mix together some dry ingredients and send them to another chef who adds liquid and then mixes or kneads the dough as requested. The mixing chef might then put the dough into another package and return it to the sending chef, who might put together another package and send it to the baking chef who controls the oven.

    If the components are built as services, the little kitchenettes are built as free-standing buildings. In one version of components (direct RESTful calls), there is a messenger who stands waiting at each chef's output window. When the messenger receives a package, the messenger reads the address, jumps into his delivery van with the package and drives off to the local depot and drops off the package; another driver grabs the package, loads it into his van and delivers it to the receiver's input window. If the kitchenettes are all in the same location the vans and depot are still used — the idea is that a genius administrator can change the location of the kitchenettes at any time and things will remain the same.

    Another version of components is based around an Enterprise Service Bus (ESB). This is similar to the vans and the central office depot except that the packages are all queued up at the central location, which has lots of little storage areas. Instead of a package going right to a recipient, it's sent to one of these little central storage areas. Then, when the chef in a kitchenette is ready for more work he sends a request to the central office, asking for the next package from a given little storage area. Then a worker grabs the oldest package, gives it to a driver who puts it in his van and delivers the package to the input window of the requesting chef.
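    The queue-based pattern just described can be reduced to a few lines using Python's standard `queue` module (the "chef" payloads are invented):

```python
import queue

# The central depot's little storage areas: one queue for work going
# to the mixing chef, one for results coming back.
work_queue = queue.Queue()
result_queue = queue.Queue()

# Sending chef: packages up the dry ingredients and drops them off.
# The sender has no idea which chef will pick the package up.
work_queue.put({"task": "knead", "dough": "flour+water"})

# Mixing chef: takes the next package, does the work, sends a package back.
package = work_queue.get()
result_queue.put({"task": package["task"],
                  "dough": package["dough"] + " (kneaded)"})

# Sending chef picks up the result: two queue hops and two packages
# for what a direct call would have done in one line.
result = result_queue.get()
```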

    If this sounds bizarre and lots of extra work, it's because … it's bizarre and requires lots of extra work.

    The ideal way to organize software

    The ideal way to break software into components is actually pretty similar to the way good kitchens are organized. Generally speaking, you start by having the same kinds of things together. You probably store all the pots and pans together, sorted in a way that makes sense depending on how you use them. You probably pick a place that’s near the stove where they’ll probably be used – maybe even hanging on hooks from the ceiling. You probably have a main store of widely used ingredients like salt, but you may periodically put containers of it near where it is most often used. An important principle is that most chefs work at or near their stations – but (it goes without saying) can move anywhere to get anything they need. There aren't walls stopping you, and you don’t bother someone else to get something for you when you can more easily do it yourself.

    Exactly the same principle applies whether you are creating an appetizer, an entree, a side dish or a dessert — you do the same gathering and assembly from the ingredients in the kitchen, but deliver the results on different plates at different times.

    In software this means that while routines may be stored in files in different directories for convenience, and usually your work is confined to a single directory, you go wherever you need to in order to do your job. Same thing with data definitions; you can break them up if it seems to make things more organized, but any routine can access any data definition it needs to. When you’re done, you make a build of the system and try it out. That's what champion/challenger QA is for.

    Are you deploying on a system that has separate UI, server and storage? No problem! That's what build scripts (which you would have anyway) are for! This approach makes it easier to migrate from the decades-old DBMS-centric systems design and move towards a document-centered database with auto-replication to a DBMS for reporting, which typically improves both performance and simplicity by many whole-number factors.

    Are the chefs in your kitchen having trouble handling all the work in a timely way? In the software world you might think of making a "scalable architecture" with separate little kitchenettes, delivery vans, etc. — which any sensible person knows adds trouble and work and makes every request take longer from receipt to ultimate delivery. In the world of kitchens (and sensible software people) you might add another mixer or two, install a couple more ovens and hire another chef or two, everyone communicating and working in parallel, and churning out more excellent food quickly, with low overhead.

    If you think this sounds simple, you’re right. If you think it sounds simplistic, perhaps you should think about this: any of the artificial, elaborate, rigid prescriptions for organizing software necessarily involves lots of overhead for people and computers in designing, writing, deploying, testing and running software. Each restriction is like telling a chef that under no circumstances is he allowed to access this shelf of ingredients or use that tool when the need arises.

    Arguments to the contrary are never backed with facts or real-world experiments – just theory and religious fervor. Instead, you should consider the effective methods of optimizing a body of software, which are centered on Occamality (elimination of redundancy) and increasing abstraction (increasingly using metadata instead of imperative code). Both of these things will directly address the evils that elaborate component methods are supposed to cure but don't.

  • How Micro-Services Boost Programmer Productivity

    There's a simple way to understand the impact of micro-services on programmer productivity: they make it worse. Much worse. How can that be?? Aren't monolithic architectures awful nightmares making applications unable to scale and causing a drain on programmer productivity? No. No. NO! Does this mean that every body of monolithic code is wonderful, supporting endless scaling and optimal programmer productivity? Of course not. Most of them have problems on many dimensions. But the always-wrenching transition to micro-services makes things worse in nearly all cases. Including reducing programmer productivity.

    Micro-services for Productivity

    Micro-services are often one of the first buzzwords appearing on the resumes of fashion-forward CTO's. They are one of today's leading software fashions; see this for more on software fashions. I have treated in depth the false claims of micro-services to make applications "scalable."

    I recently discussed the issue with a couple CTO's using micro-services who admitted that their applications will never need to get up to hundreds of transactions a second, a tiny fraction of the capacity of modern cloud DBMS's. In each case, they have fallen back on the assertion that micro-services are great for programmer productivity, enabling their teams to move in fast, independent groups with minimal cross-team disruption.

    The logic behind this assertion has a couple major aspects. The basic assertion that small teams, each concentrating on a single subject area, are more productive than large, amorphous teams is obviously correct. This is an old idea.  It has nothing to do with micro-services. It takes a fair amount of effort for members of a team to develop low-friction ways of working together, and it also takes time to understand a set of requirements and a body of existing code. Why not leverage that investment, keeping the team intact and working on the same or similar subject areas? Of course you should! No-brainer!

    Here's the fulcrum point: given that it's good to have a small team "owning" a given set of functionality and code, what's the best way to accomplish this? The assertion of those supporting micro-services is that the best way is to break the code into separate, completely distinct pieces, and to make each piece a separate, independent executable, deployed in its own container, and interacting with other services using some kind of service bus, queuing mechanism or web API interface shared by the other services. The theory is that you can deploy as many copies of the executables as you want and change each one independently of the others, resulting in great scalability and team independence. In most cases, each micro-service even has its own data store for maximum independence.

    I covered the bogus argument about scalability here. You can deploy all the copies of a monolith that you want to. Having separate DBMS's introduces an insane amount of extra work, since there's no way (with rare exceptions) each service database would be truly independent of the others, and sending around the data controlled by the other services multiplies the work: the original service has to get the data and not only store it locally, but send it to at least one other service, which then has to receive it and store it. That's 4X the work to build in the first place and 4X again every time you need to make changes. And sending things from one service to another is thousands of times slower than simply making a local call.
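    A rough way to see the overhead gap is to compare a direct local call with the same work pushed through serialization and a queue. This is only a hypothetical micro-benchmark sketch (in-process, so it greatly understates real network cost); exact ratios vary by machine, but the direct call reliably wins:

```python
import json
import queue
import time

def process(record):
    return record["amount"] * 2

record = {"amount": 21}
N = 100_000

# Path 1: a direct local call.
start = time.perf_counter()
for _ in range(N):
    local_result = process(record)
local_elapsed = time.perf_counter() - start

# Path 2: the same work, but the sender serializes the data and puts it
# on a queue, and the receiver takes it off and deserializes it -- the
# minimum extra machinery of inter-component messaging.
q = queue.Queue()
start = time.perf_counter()
for _ in range(N):
    q.put(json.dumps(record))
    queued_result = process(json.loads(q.get()))
queued_elapsed = time.perf_counter() - start
# Both paths compute the same answer; the queued path just pays extra.
```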

    Now we're down to productivity. Surely having the group concentrate on its own body of code is a plus, isn't it? YES! It is! But how does having a separation of concerns in a large body of code somehow require that the code devoted to a particular subject be built and deployed as its own executable???

    Let's look at a "monolithic" body of code. Getting into detail, it amounts to a hierarchy of directories containing files. Some of the files will have shared routines (classes or whatever) and others won't be shared. For most groups, going to micro-services means taking those directories and converting them to separate code bases, each built and deployed by the team that owns it. Anyone who's tried this and/or knows a large body of code well knows that there's a range of independence, with some routines being clearly separable, some being clearly shared and others some of each.

    A sensible person would look at this set of directories as a large group of routines organized into files and directories as it was built. They would see that as changes were made and things evolved, the code and the organization got messy. There were bits of logic that were scattered all over the place that should be put in one place. There were things that were variations on a theme that should be coded once, with the variations handled in parameters or metadata. There were good routines that ended up in the wrong place. The sensible person realizes that not only is this messy, but it makes things harder to find and change, and makes it more likely that something will break when you change it.
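    The "variations on a theme" clean-up can be sketched as follows; the routines and labels are invented examples of the similar-but-different copies that accumulate:

```python
# Before: near-duplicate routines scattered through the code, the
# typical result of years of drift. Each must be found and changed
# separately whenever the format changes.
def format_invoice_total(amount):
    return f"Invoice total: ${amount:.2f}"

def format_refund_total(amount):
    return f"Refund total: ${amount:.2f}"

# After: one routine, with the variation moved into metadata. Adding a
# new kind of total is a table entry, not another copy of the code.
LABELS = {"invoice": "Invoice total",
          "refund": "Refund total",
          "credit": "Credit total"}

def format_total(kind, amount):
    return f"{LABELS[kind]}: ${amount:.2f}"
```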

    The sensible person who sees redundancy and sloppy organization wants to fix it. One long-used way to organize code to avoid the problem is to create what are called "components," which are sort of junior versions of microservices. Here is a detailed description of the issues with components, and how sensible people respond to the hell-bent drive towards components. The metaphor of a kitchen with multiple cooks is an apt one.

    Then there's the generic approach of technical debt. While you can go crazy about this, the phrase "paying down technical debt" is a reasonable one. For my take on this tricky subject, see this. Here's a simple way to understand the process and value, and here's a more far-reaching explanation of the general principle of Occamality.

    The sensible person now has things organized well, with most of the redundancy squeezed out. Why is this important? Simple. What do you mostly do to code? Change it. When you look for the thing that needs changing, where would you like it to be? In ONE PLACE. Not many places in different variations. Concerns about basically the same thing should be in a single set of code. You can build it, deploy it and test it easily.

    What's the additional value of taking related code and putting it in its own directory tree, with its own build and deployment? None! First, it's extra work to do it. Second, there are always relationships between the "separate" bodies of code — that's why, when they're separate services, you've got enterprise service buses, cross-service calls, etc. Extra work! And HUGE amounts of performance overhead. Even worse, if you start out with a micro-services approach instead of converting to one, your separate teams will certainly confront similar problems and code solutions to them independently, creating exactly the kind of similar-but-different code cancer that sensible people try to avoid and/or eliminate!

    Separate teams with separate executables also have extra trouble testing. No extra testing trouble you say? Because you do test-driven development and have nice automated tests for everything? I'm sorry to have to be the one to tell you, but if you really do all this obsolete stuff, your productivity is worse by at least a factor of two than if you used modern comparison-based testing. Not to mention that your quality as delivered is worse. See this and this.

    Bottom line: separation of concerns in the code is a good thing. Among other things, it enables small groups to mostly work on just part of the code without being an expert in everything. Each group will largely be working on separate stuff, except when there are overlaps. All of this has nothing to do with separate deployment of the code blocks as micro-services.  Adding micro-services to the code clean-up is a LOT of extra work that furthermore requires MORE code changes, an added burden to testing and ZERO productivity benefit.

    Conclusion

    The claim that small teams working closely together on a sensible subset of a larger code base is a productivity-enhancing way to organize things is true. Old news. The claim that code related to a subject should be in one place also makes sense, kind of the way that pots and pans are kept in the kitchen where they're used instead of in bedroom closets. Genius! Going to the extreme of making believe that a single program should be broken into separate little independent programs that communicate with each other and that the teams and programs share nothing but burdensome communications methods is a productivity killer. It's as though instead of being rooms in a house, places for a family to cook, eat, sleep and relax each had its own building, requiring travel outside to get from one to the other. Anyone want to go back to the days of outhouses? That's what micro-services are, applied to all the rooms of a house.
