Humanity’s best approach to learning is science, so we need to adopt the techniques and strategies of science and apply them to our problems.


The scientific method:

  • Characterize: make an observation of the current state.
  • Hypothesize: create a description, a theory that may explain your observation.
  • Predict: make a prediction based on your hypothesis.
  • Experiment: test your prediction.

To become experts at learning, we need the following:

  • Iteration
  • Feedback
  • Incrementalism
  • Experimentation
  • Empiricism

To become experts at managing complexity, we need:

  • Modularity
  • Cohesion
  • Separation of concerns
  • Abstraction
  • Loose coupling

Engineering simply means the “stuff that works.” It is the process and practice that you apply to increase your chances of doing a good job.


For most human endeavor, the production of “things” is the hard part. It may take effort and ingenuity to design a car, an airplane, or a mobile phone, but taking that initial prototype design and idea into mass production is immensely more expensive and complicated.


Waterfall processes are production lines for software. They are the tools of mass production. They are not the tools of discovery, learning, and experimentation.


There was a fascination on my part with errors; a never-ending pastime of mine was working out what made a particular error, or class of errors, happen and how to prevent it in the future.


Most work in language design seems to concentrate on the wrong things, such as syntactic advances rather than structural advances.


The art of programming is the art of organizing complexity.


In continuous delivery, we work so that every small change, multiple times per day, is releasable. It should be finished to the degree that we can safely and reliably release our software into production at any point.

Each change is finished because it is releasable, so the only sensible measure of “finished” is that it delivers some value to its users. That is a very subjective thing. How do we predict how many changes are needed to represent “value” to our users? What most organizations do is to guess at a collection of features that, in combination, represent “value,” but if I can release at any point in the life of my software, this is a somewhat blurry concept.


Human beings are remarkable, but being as smart as we are takes an enormous amount of processing. Our perception of reality is not “reality,” and we have a series of biological tricks to make our perception of reality appear to be seamless. For example, our visual sampling rate is surprisingly slow. The smoothness of your perception of reality is an illusion created by your brain.


Most of what you see is a guess made up by your brain.


We create a model of the problem before us, and we check to see if everything that we currently know fits the model.


As soon as we introduce the code to synchronize things, we start to see some costs that we hadn’t anticipated. Locks are extremely expensive. If we decide to divide up the work between only 2 threads and synchronize their results, it is 746 times slower than doing the work on a single thread.
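The shape of that trade-off can be seen in a small sketch. This is illustrative only (Python here; the 746× figure is the source's own measurement, and real numbers vary enormously by language and hardware): both versions compute the same answer, but the 2-thread version pays for a lock on every step.

```python
import threading

N = 100_000

def single_threaded() -> int:
    # One thread, a plain local variable: no coordination cost at all.
    total = 0
    for _ in range(N):
        total += 1
    return total

def two_threads_with_lock() -> int:
    # Two threads share one counter behind a lock; every increment pays
    # for acquiring and releasing the lock.
    lock = threading.Lock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(N // 2):
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

# Both produce the same answer; timing them (e.g. with time.perf_counter)
# shows the synchronized version losing badly, despite having two workers.
assert single_threaded() == two_threads_with_lock() == N
```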


Science is much broader than only physics, but outside the realms of the simplified abstractions that we use at the heart of physics, other sciences are often messier and less precise.


We can create millions of experiments, at low cost, allowing us to use the power of statistics to our advantage. Simplistically, this is what modern ML really is.


Whatever the time pressure, writing bad code is never a time-saver.


Technically, one way to accomplish this independence is to take the modularity of the system so seriously that each module is, in terms of build, test, and deployment, independent from every other module. This is what microservices are. They are so modular that we don’t need to test them with other services prior to release. If you are testing your microservices together, they aren’t really microservices.


Nearly everyone would like some ideal middle ground between these 2 extremes, but in reality, it doesn’t exist. The middle ground is a fudge and is often slower and more complex than the monolithic approach that everyone strives, so assiduously, to avoid. The more organizationally distributed approach that is microservices is the best way that we know to scale up software development, but it is not a simple approach and comes at a cost.


Some people criticize this style of design. This criticism generally takes the form that it is harder to understand code that has a bigger surface area in this way. It is harder to follow the flow of control through the system. This criticism misses the point. The problem here is that if it is necessary to expose that surface area in order to test that code, then that is the surface area of the code. How much harder is it to understand if it is obscured by poor interface design and a lack of tests?


Testing, when done well, exposes something important and true about the nature of our code, the nature of our designs, and the nature of the problem that we are solving that is not otherwise easily accessible. As a result, it is one of the most important tools in our arsenal to create better, more modular systems and code.


The problem is about coupling. As long as the pieces are truly independent of one another, truly decoupled, then we can parallelize all we want. As soon as there is coupling, there are constraints on the degree to which we can parallelize. The cost of integration is the killer.


The best way to parallelize things is to do it in a way where there is no need to re-integrate. Essentially, this is the microservice approach.


The primary goal of code is to communicate ideas to humans. We write code to express ideas as clearly and simply as we can. We should never choose brevity at the cost of obscurity. Making our code readable is both a professional duty of care and one of the most important guiding principles in managing complexity. Optimize to reduce thinking rather than to reduce typing.


DDD also introduced the concept of the “bounded context.” This is a part of a system that shares common concepts. For example, an order-management system probably has a different concept of “order” from a billing system, so there are 2 distinct bounded contexts.

This is a useful concept for helping to identify sensible modules or subsystems when designing our systems.
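A minimal sketch of the order-management example (all names hypothetical): the same word, “order,” gets its own model in each bounded context, and the contexts share only an identifier at the boundary.

```python
from dataclasses import dataclass

@dataclass
class FulfilmentOrder:
    # "Order" as the order-management context sees it: items to ship.
    order_id: str
    items: list[str]
    shipping_address: str

@dataclass
class BillableOrder:
    # "Order" as the billing context sees it: money owed.
    order_id: str
    amount_due: float
    currency: str

# Neither context depends on the other's internal shape; only the
# identifier crosses the boundary between them.
o = FulfilmentOrder("o-1", ["book"], "1 Main St")
b = BillableOrder(o.order_id, 12.99, "GBP")
assert b.order_id == o.order_id
```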


It is ridiculous to imagine a system that has no coupling. If we want 2 pieces of our system to communicate, they must be coupled to some degree. So like cohesion, coupling is a matter of degree rather than any kind of absolute measure.

Coupling is in some ways the cost of cohesion. In the areas of your system that are cohesive, they are likely to also be more tightly coupled.


Dependency injection is where the dependencies of a piece of code are supplied to it as parameters, rather than created by it.
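A tiny illustration (names are hypothetical): the service never constructs its mailer, so production code can pass in a real one and a test can pass in a fake.

```python
class FakeMailer:
    # A test double: records what would have been sent.
    def __init__(self):
        self.sent = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))

class WelcomeService:
    # The dependency arrives as a constructor parameter; the service
    # does not know or care which implementation it is given.
    def __init__(self, mailer):
        self.mailer = mailer

    def welcome(self, user: str) -> None:
        self.mailer.send(user, "Welcome!")

# Injecting the fake lets us test the service with no real mail system.
mailer = FakeMailer()
WelcomeService(mailer).welcome("alice@example.com")
assert mailer.sent == [("alice@example.com", "Welcome!")]
```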


Just because it is “accidental” does not mean that it is unimportant; our software is running on a computer, so dealing with the constraints and the realities of that is important.


One of the keys here is to try to maintain a very low tolerance for complexity. Code should be simple and readable, and as soon as it begins to feel like hard work, you should pause and start looking for ways to simplify and clarify the part in front of you.


It is almost never the case that your code cares about every single detail of an API that it consumes. You nearly always deal with a subset of such APIs. The port that you create only needs to expose the minimal subset that you choose to use, so it will be a simpler version of the API that you are interacting with.
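For instance (a hypothetical sketch, not a real library): a payment API may expose dozens of operations, but if our code only ever charges in one currency, the port can shrink to that single operation.

```python
class ThirdPartyPaymentClient:
    # Stand-in for a large external API surface (hypothetical).
    def authenticate(self, key: str) -> None: ...
    def refund(self, charge_id: str) -> None: ...
    def list_disputes(self) -> list: ...

    def charge(self, card: str, amount: int, currency: str) -> dict:
        return {"status": "ok", "amount": amount, "currency": currency}

class PaymentPort:
    # Our port exposes only the minimal subset we actually use,
    # so it is simpler than the API it wraps.
    def __init__(self, client: ThirdPartyPaymentClient):
        self._client = client

    def charge_gbp(self, card: str, amount: int) -> str:
        return self._client.charge(card, amount, "GBP")["status"]

port = PaymentPort(ThirdPartyPaymentClient())
assert port.charge_gbp("4242", 10) == "ok"
```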


In the context of separation of concerns, our tests become more difficult to write the more that concerns are conflated within the scope of a test. If we organize our development around testing and drive our development through testing, then we are confronted much earlier in the process by the costs and benefits of our design decisions.
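A small sketch of that cost (illustrative names): when calculation and persistence are conflated in one function, testing it drags in the filesystem; separating the concerns makes the interesting part trivially testable.

```python
def save_total(prices: list[int], path: str) -> int:
    # Conflated: calculation and I/O in one place. A test of the
    # arithmetic is forced to touch the filesystem too.
    total = sum(prices)
    with open(path, "w") as f:
        f.write(str(total))
    return total

def calculate_total(prices: list[int]) -> int:
    # Separated: the concern we care about, testable on its own.
    return sum(prices)

assert calculate_total([1, 2, 3]) == 6
```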


There is a cost to doing work. In cooking, part of that cost is the time it takes to clean up and maintain your tools as you go. In software, those costs are to refactor, to test, to take the time to create good designs, to fix bugs when they are found, to collaborate, to communicate, and to learn.


If you can’t, or won’t, change the code, then the code is effectively dead. As soon as one freezes a design, it becomes obsolete.


An authorization service that reports functional failures as HTML errors and a business logic module that returns NullPointerExceptions are both breaking business-level abstractions with technical failures.
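One way to avoid that (a hypothetical sketch): translate the technical condition at the boundary into a failure expressed in the service's own business-level terms.

```python
class AuthorizationError(Exception):
    """A business-level failure, not an HTML page or a stray
    NullPointerException leaking a technical detail."""

class AuthorizationService:
    def __init__(self, known_users: set[str]):
        self._known_users = known_users

    def authorize(self, user: str) -> bool:
        # The technical condition (user missing from the store) is
        # reported in the vocabulary of the abstraction.
        if user not in self._known_users:
            raise AuthorizationError(f"user {user!r} is not authorized")
        return True

svc = AuthorizationService({"alice"})
assert svc.authorize("alice")
```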


All models are wrong; some models are useful.


One idea that is more promising in raising the level of abstraction is the idea of the DSL. A DSL is not general purpose. It is, intentionally, more narrowly focused and can be more abstract, hiding detail.


As soon as we allow 3rd-party code into our code, we are coupled to it. In general, my preference is to always insulate your code from 3rd-party code with your own abstractions.


Our ability to identify a sweet spot for our abstractions is enhanced by designing for testability. Attempting to write a test and simulating the use of the interface that we are creating gives us an opportunity to experience and exercise our understanding of that interface to the code under test.


We often talk about the value of more loosely coupled systems, but let’s be clear: if the components of your software system are perfectly decoupled, then they can’t communicate with one another. This may, or may not, be helpful.

Coupling is not something that we can, or should, aim to always wholly eliminate.


In general, by far the most common way for developers and teams to make a big mistake is in the direction of overly tight coupling. There are costs to “too loose coupling,” but they are generally much lower costs than the costs of “too tight coupling.”


There are ways we can minimize this overhead and make this coordination as efficient as possible. The best way to do this is through continuous integration. We will keep all our code in a shared space, a repository, and each time any of us changes anything, we will check that everything is still working.


Google and FB do this for nearly all of their code. The downside of scaling up in this way is that you have to invest heavily in the engineering around repositories, builds, CI, and automated testing to get feedback on changes quickly enough to steer development activities.


The problem is that the cost of having one canonical representation of any given idea across a whole system increases coupling, and the cost of coupling can exceed the cost of duplication.

This is a balancing act.

Dependency management is an insidious form of developmental coupling. If your service and my service share the use of a library of some kind and you are forced to update your service when I update mine, then our services and our teams are developmentally coupled.


The advantage of DRY is that when something changes, we need to change it in only 1 place; the disadvantage is that every place that uses that code is coupled in some way.
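Both halves of that trade are visible even in a toy example (hypothetical names): one canonical rule means one place to change it, and it also means every caller is now coupled to that one function's behaviour and signature.

```python
def format_money(pence: int) -> str:
    # The single canonical representation of "format an amount."
    return f"£{pence / 100:.2f}"

# DRY: change format_money once and both lines change together, but
# both callers are now coupled to this function.
invoice_line = f"Total: {format_money(1250)}"
receipt_line = f"Paid: {format_money(1250)}"
assert invoice_line == "Total: £12.50"
assert receipt_line == "Paid: £12.50"
```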


As soon as we establish such a boundary, whatever its nature, any idea of synchrony is an illusion, and that illusion comes at a cost.

The leakiness of this abstraction is most dramatic when thinking about distributed computing.


We can achieve this in 3 ways. We can work with more coupled code and systems but through continuous integration and continuous delivery get fast enough feedback to identify problems quickly. We can design more decoupled systems that we can safely, with confidence, change without forcing change on others. Or we can work with interfaces that have been agreed on and fixed so that we never change them.


Humans are error-prone. We are particularly bad at checking things over, because we often tend to see what we expect to see, rather than what is really there. This is not a criticism of our laziness as much as a recognition of the limitations of our biology. We are built to jump to conclusions, a very good trait for wild humans in hostile environments.


If we wait until we think we are finished, we are clearly not getting high-quality, timely feedback. We will probably forget all the little nuances of what we did, so our testing will be somewhat cursory. It is also going to be quite the chore.


The difference is that TDD encourages the design of testable code, and unit testing does not. Unit testing, after the code is written, encourages us to cut corners, break encapsulation, and tightly couple our code to the code that we already wrote.
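The test-first ordering can be sketched in miniature (names hypothetical): the test is written before the code exists, so it defines the interface we want rather than conforming to whatever we happened to write.

```python
# Step 1 (red): write the test first. It specifies the interface we
# want: a small, self-contained function with no hidden dependencies.
def test_add_vat():
    assert add_vat(100.0) == 120.0
    assert add_vat(100.0, rate=0.0) == 100.0

# Step 2 (green): write the simplest code that makes the test pass.
def add_vat(net: float, rate: float = 0.2) -> float:
    return round(net * (1 + rate), 2)

# Step 3: with the test passing, we can refactor under its safety net.
test_add_vat()
```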