5 ways to make your codebase withstand the test of time

  • web-dev
  • architecture
  • design-patterns
  • legacy

This is the first in a series of articles where @hecrj and I share what we have learned after working on a large, fast-changing codebase for the past 3 years, and being perfectly happy with the result!


If you are a web developer, you are probably used to having new frameworks, libraries and technologies come out every other week.

We are on a never-ending quest to find better tools and patterns, but does that mean our code is doomed to become old and wrinkly?

Your decisions will always impact many other people

How do you anchor your project so that it resists the Winds of Trend? Here are 5 tips that have worked out pretty well for us.

1. Split your code based on domain concepts, not tech concepts

One of the first questions you may have when starting a new project is how you should structure it. There are two popular schools of thought here: either we split our files by tech concepts, or by domain concepts.

# Split by tech concepts        # Split by domain concepts

|- src                          |- auth
|  |- controllers               |  |- controllers
|  |  |- auth                   |  |- models
|  |  |- profile                |  |- views
|  |  |- article                |  |- tests
|  |- models                    |- profile
|  |- views                     |- article
|- test                         (...)
|  |- controllers
|  |  |- auth
(...)

If you’ve read the header you might have an idea of what we’ll recommend, but let’s back that up with a few thoughts.

Say you arrive at the root of a project with a specific goal (hunting down a bug, adding a feature, removing it, etc.). You need to find the appropriate code, navigate through related files, take a look at the tests, and when you feel confident enough, make those changes to the codebase.

As developers, this process is our bread and butter, so we had better make it efficient.

What is easier to maintain, a codebase with 10 files or one with 100 files?

Splitting code by domain concepts allows you to focus on a small part of your codebase, whereas doing it by tech concept forces you to jump around.

2. Provide a public contract (API) for all your domain concepts

Imagine your project has a payments directory where you keep all 💰-related code. We have a series of components to store our payments in a database or connect to 3rd-party services like Stripe.

All those components are there to fulfill a contract, that is, to make sure payments behave the way they should.

Just to be clear, we are not talking about the HTTP API your mobile app will call to charge users. We are talking about an internal API that turns your payments directory into its own “microservice” (using the term freely).

Why, you ask?

Because having an explicit API provides:

  • A clear picture of the expected behavior.
  • A minimum test coverage everyone can agree upon and commit to.
  • The freedom to change anything from the underlying implementation.

Furthermore, it is important for this API to know as little as possible of external concepts such as users, permissions or environments. These are not part of the domain. They are the way we solve a problem with the communication layer (a public HTTP endpoint is inherently insecure) or our development workflow.

For instance, we can imagine having:

  • A public-facing API that exposes some of the domain behavior and controls authentication and authorization.
  • A private admin API + panel to provide easy customer support and look into bugs without ever touching any database or console.
  • A really easy way to write fixtures, examples and migrations.
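
To make the idea of an internal contract more tangible, here is a minimal sketch of what the entry point of a payments directory could look like. Everything in it (PaymentsAPI, its charge and find methods, the in-memory store and the always-approving charger) is a hypothetical name made up for illustration, not the code we actually run. The point is only that callers see one small surface that takes no user, permission or environment arguments.

```ruby
# Hypothetical public contract for the payments directory.
# Other parts of the codebase would only ever talk to this class.
class PaymentsAPI
  # Plain value handed back to callers; no database or Stripe objects leak out.
  Payment = Struct.new(:id, :amount_cents, :currency, :status, keyword_init: true)

  # `store` and `charger` are internal collaborators (assumed names).
  def initialize(store:, charger:)
    @store = store
    @charger = charger
  end

  # Charges a card and records the outcome.
  # Authentication and authorization belong to the layers built on top.
  def charge(amount_cents:, currency:, card_token:)
    raise ArgumentError, "amount must be positive" unless amount_cents.positive?

    succeeded = @charger.charge(amount_cents, currency, card_token)
    payment = Payment.new(
      id: @store.next_id,
      amount_cents: amount_cents,
      currency: currency,
      status: succeeded ? :succeeded : :failed
    )
    @store.save(payment)
    payment
  end

  def find(id)
    @store.find(id)
  end

  # Throwaway collaborators so the sketch runs on its own.
  class InMemoryStore
    def initialize
      @payments = {}
    end

    def next_id
      @payments.size + 1
    end

    def save(payment)
      @payments[payment.id] = payment
    end

    def find(id)
      @payments[id]
    end
  end

  class AlwaysApprovingCharger
    def charge(_amount_cents, _currency, _card_token)
      true
    end
  end
end

payments = PaymentsAPI.new(
  store: PaymentsAPI::InMemoryStore.new,
  charger: PaymentsAPI::AlwaysApprovingCharger.new
)
payment = payments.charge(amount_cents: 1_500, currency: "EUR", card_token: "tok_test")
puts payment.status # prints "succeeded"
```

A public HTTP endpoint, an admin panel or a fixture script could each wrap an object like this one and add their own concerns on top, while the tests for the contract itself stay the same.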

3. Rely on small interfaces

This one is pretty popular. As developers, we are constantly reminded to rely on abstractions instead of concrete implementations, segregate our interfaces and invert our dependencies.

You can easily find plenty of material covering the theory, so let’s focus on some practical examples. Our Payments app might need to talk to these interfaces:

  • An event publisher
  • An event subscriber
  • A credit card charger
  • An email sender

All these interfaces have a small and clearly defined role. Later on, we will inject the particular implementations:

production = Payments.new(
  event_publisher: rabbitmq,
  event_subscriber: rabbitmq_replicas,
  credit_card_charger: stripe,
  email_sender: mailgun,
)

development = Payments.new(
  event_publisher: in_memory_bus,
  event_subscriber: in_memory_bus,
  credit_card_charger: stripe_test_mode,
  email_sender: muted_mailer,
)

As you can see, small interfaces allow us to create well-defined tests and choose the best strategy for each action depending on the environment. The implementations, on the other hand, we usually write around particular technologies, so that all the knowledge and helper functions for each one live in a single place.
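
As a rough sketch of what sits behind a constructor like the one above, the following shows a Payments class that only relies on four tiny interfaces, plus in-memory stand-ins for each. The method names (publish, subscribe, charge, deliver), the event names and the fake classes are assumptions made for this example. Ruby has no explicit interfaces, so each contract is written down as a comment.

```ruby
# In-memory stand-in that happens to implement two small interfaces.
class InMemoryBus
  def initialize
    @handlers = Hash.new { |hash, name| hash[name] = [] }
  end

  # Event subscriber interface: subscribe(event_name) { |payload| ... }
  def subscribe(event_name, &handler)
    @handlers[event_name] << handler
  end

  # Event publisher interface: publish(event_name, payload)
  def publish(event_name, payload)
    @handlers[event_name].each { |handler| handler.call(payload) }
  end
end

class FakeCharger
  # Credit card charger interface: charge(amount_cents, card_token) -> true/false
  def charge(amount_cents, card_token)
    amount_cents.positive? && !card_token.empty?
  end
end

class RecordingMailer
  attr_reader :sent

  def initialize
    @sent = []
  end

  # Email sender interface: deliver(to:, subject:)
  def deliver(to:, subject:)
    @sent << { to: to, subject: subject }
  end
end

class Payments
  def initialize(event_publisher:, event_subscriber:, credit_card_charger:, email_sender:)
    @event_publisher = event_publisher
    @credit_card_charger = credit_card_charger
    @email_sender = email_sender

    # React to our own domain events through the subscriber interface.
    event_subscriber.subscribe(:payment_succeeded) do |payload|
      @email_sender.deliver(to: payload[:email], subject: "Your receipt")
    end
  end

  def charge(email:, amount_cents:, card_token:)
    succeeded = @credit_card_charger.charge(amount_cents, card_token)
    event_name = succeeded ? :payment_succeeded : :payment_failed
    @event_publisher.publish(event_name, { email: email, amount_cents: amount_cents })
    succeeded
  end
end

bus = InMemoryBus.new
mailer = RecordingMailer.new
payments = Payments.new(
  event_publisher: bus,
  event_subscriber: bus,
  credit_card_charger: FakeCharger.new,
  email_sender: mailer
)
payments.charge(email: "reader@example.com", amount_cents: 500, card_token: "tok_test")
puts mailer.sent.size # prints 1
```

Because Payments never names RabbitMQ, Stripe or Mailgun, swapping any of them only means passing a different object that answers to the same one or two methods.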

4. Decouple your data from your storage strategy

Let’s get it out of the way: We think ORMs are wrong (or maybe it’s people who are doing them wrong). Take a look at this Ruby on Rails code:

class Article < ActiveRecord::Base
  belongs_to :user
  has_many :comments, dependent: :destroy

  scope :authored_by, ->(username) { where(user: User.where(username: username)) }

  validates :title, presence: true, allow_blank: false
  validates :body, presence: true, allow_blank: false

  before_validation do
    self.slug ||= "#{title.to_s.parameterize}-#{rand(36**6).to_s(36)}"
  end
end

There’s a lot to unpack here.

First, we notice this object is describing relationships, cascade deletion and nullable attributes. Exactly what you would expect from an Object-Relational Mapper. Quite transparent!

Next, let’s consider for a moment what is important for us when representing an Article:

  • We should be able to harness the full power of the language we are using. When we are using Java, we want to be able to use OO patterns and inheritance freely. When we are using Haskell, we want to use union types and records.
  • We should be able to store our data in different formats and databases. This allows us to use ElasticSearch for performant searches, PostgreSQL for a consistent state and Redis to keep our autosave feature fast enough.

ORM models offer neither, because they are just a way to interface with a SQL database. We still need to represent and manipulate our data somewhere else. The problem is, once you accept this statement, using an ORM seems awkward or overkill. This is what we mean:

# Let's say we have a series of entities in our domain that we use to represent an article.
class Article; end # The big picture
class Tag; end
class RichText; end # Headings, bold, cross-references, …


# Now we need an interface to store the article's content in our SQL database.
class ArticleStore
  def store(title:, body:, tags:, author:)
    # Ruby doesn't have explicit interfaces, but you get the point
    raise NotImplementedError
  end
end


# Using an ORM creates an additional level of indirection that looks pointless
class ArticleORMStore < ArticleStore
  def store(title:, body:, tags:, author:)
    ArticleModel.create(title: title, body: body, tags: tags, author: UserModel.get(author.id))
  end
end


# A low-level SQL library feels simpler in comparison.
class ArticleSimpleStore < ArticleStore
  def store(title:, body:, tags:, author:)
    article_table.insert(title: title, body: body, tags: tags, author: author.id)
  end
end

The bottom line here is: You can use ORMs, but don’t use them as the only way to represent and manipulate your data. That’s far from their purpose.
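
One way to picture this separation is the sketch below, where the article is a plain Ruby object that owns its validation and slug generation, and two interchangeable stores only know how to persist it. The class names, the store/find methods and the JSON-backed store are all assumptions for the sake of the example. In particular, the JSON store merely stands in for a document database or cache and does not use any real ElasticSearch or Redis client.

```ruby
require "json"
require "securerandom"

# Plain domain object: no base class, no knowledge of any database.
class Article
  attr_reader :title, :body, :tags, :slug

  def initialize(title:, body:, tags: [], slug: nil)
    raise ArgumentError, "title can't be blank" if title.to_s.strip.empty?
    raise ArgumentError, "body can't be blank" if body.to_s.strip.empty?

    @title = title
    @body = body
    @tags = tags
    # Slug generation lives in the domain instead of an ORM callback.
    @slug = slug || "#{slugify(title)}-#{SecureRandom.hex(3)}"
  end

  def to_h
    { title: title, body: body, tags: tags, slug: slug }
  end

  private

  def slugify(text)
    text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
  end
end

# Storage strategy 1: keeps rows in a Hash (handy for tests and development).
class InMemoryArticleStore
  def initialize
    @rows = {}
  end

  def store(article)
    @rows[article.slug] = article.to_h
  end

  def find(slug)
    row = @rows[slug]
    row && Article.new(**row)
  end
end

# Storage strategy 2: serializes to JSON, as a document store or cache might.
class JsonArticleStore
  def initialize
    @documents = {}
  end

  def store(article)
    @documents[article.slug] = JSON.generate(article.to_h)
  end

  def find(slug)
    document = @documents[slug]
    document && Article.new(**JSON.parse(document, symbolize_names: true))
  end
end

article = Article.new(title: "Hello, World!", body: "Some text", tags: ["ruby"])
[InMemoryArticleStore.new, JsonArticleStore.new].each do |store|
  store.store(article)
  puts store.find(article.slug).title # prints "Hello, World!" once per store
end
```

The Article class never changes when a new storage strategy is added, which is the property we are after.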

5. Use events to keep your application connected and your code decoupled

If two parts of an application are connected, the code must be connected somehow, right?

Event-driven programming does a wonderful job of keeping your app interconnected, but your code easy to write and maintain. In fact, it does such a good job that similar ideas have become pervasive in mobile and frontend development under the name of Reactive Programming, and in the operations world, with cloud providers and companies betting hard on it.

The basic idea is that every change to your domain is represented as an atomic event.

article_published(...) 1 minute ago
article_draft_created(...) 5 minutes ago
user_signed_in(...) 25 minutes ago

All events are published through some kind of event bus, and any observer can subscribe and react to the events it finds interesting without bothering the other components too much.
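
To give a feel for the moving parts, here is a deliberately tiny, synchronous, in-process sketch of such a bus. The event classes, their fields and the EventBus API (subscribe, publish, log) are assumptions for illustration. A production setup would more likely sit on top of a message broker.

```ruby
# Each event is a small value describing one change to the domain.
ArticlePublished = Struct.new(:article_id, :author_id, :occurred_at, keyword_init: true)
UserSignedIn = Struct.new(:user_id, :occurred_at, keyword_init: true)

class EventBus
  # Every published event, oldest first.
  attr_reader :log

  def initialize
    @observers = Hash.new { |hash, event_class| hash[event_class] = [] }
    @log = []
  end

  # Observers register for one kind of event and ignore the rest.
  def subscribe(event_class, &observer)
    @observers[event_class] << observer
  end

  # The publisher does not know who is listening, or whether anyone is.
  def publish(event)
    event.freeze # published events should not be modified afterwards
    @log << event
    @observers[event.class].each { |observer| observer.call(event) }
  end
end

bus = EventBus.new
bus.subscribe(ArticlePublished) do |event|
  puts "Article #{event.article_id} was published"
end

bus.publish(UserSignedIn.new(user_id: 1, occurred_at: Time.now))
bus.publish(ArticlePublished.new(article_id: 42, author_id: 1, occurred_at: Time.now))
# prints "Article 42 was published"
```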

It takes a bit of extra effort at first, as you need to lay the foundation for the event bus and think about the properties and atomicity of each event, but in the long run it’s definitely worth it.

Here are some examples of features that are very easy to implement with event-driven architectures, and hard to think about and maintain otherwise:

  • Listen for comments on an article and increase a counter (purpose: faster comment counts).
  • Send a welcome email to a new user.
  • Notify the author of an article that it has new comments.

Try to imagine how you would do each of these tasks in an imperative way vs. a reactive way.

Event-driven programming avoids long functions with many different side effects, and makes your tests nicer and more isolated.
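
As an illustration of that isolation, two of the features listed above could be written as small handlers that each do one thing. The event names (CommentAdded, UserSignedUp), the handler classes and the deliver signature are hypothetical. Note that exercising a handler needs neither an event bus nor the code that creates comments or users: we just hand it an event.

```ruby
CommentAdded = Struct.new(:article_id, :comment_id, keyword_init: true)
UserSignedUp = Struct.new(:email, keyword_init: true)

# Keeps a denormalized comment count per article.
class CommentCounter
  def initialize
    @counts = Hash.new(0)
  end

  def call(event)
    @counts[event.article_id] += 1
  end

  def count_for(article_id)
    @counts[article_id]
  end
end

# Sends a welcome email through whatever email sender it is given.
class WelcomeEmailer
  def initialize(email_sender)
    @email_sender = email_sender
  end

  def call(event)
    @email_sender.deliver(to: event.email, subject: "Welcome!")
  end
end

# Test double that records deliveries instead of sending anything.
class SpyMailer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(to:, subject:)
    @deliveries << { to: to, subject: subject }
  end
end

counter = CommentCounter.new
counter.call(CommentAdded.new(article_id: 7, comment_id: 1))
counter.call(CommentAdded.new(article_id: 7, comment_id: 2))
puts counter.count_for(7) # prints 2

mailer = SpyMailer.new
WelcomeEmailer.new(mailer).call(UserSignedUp.new(email: "new@example.com"))
puts mailer.deliveries.first[:to] # prints "new@example.com"
```

In an imperative version, both side effects would typically live inside the functions that create a comment or a user, and their tests would have to set up all of that first.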


In the next article we’ll explain how we put all these pieces together to create our own architecture.