September 21, 2017

Domain Primitives - the key to creating secure code

This article is an excerpt of topics discussed in the book Secure by Design that I'm currently writing together with Dan Bergh-Johnsson and Daniel Deogun.


Domain primitives and invariants

Some of the key properties of a value object in Domain-Driven Design are that it’s immutable, it forms a conceptual whole, and it can uphold invariants and check constraints. We have found that if you take the concept of the value object and slightly tweak it, while having security in mind, you get something called a domain primitive.

If you use domain primitives as the smallest building blocks in your domain model, you will be able to create code that by design will eliminate a wide variety of security issues commonly found in software. In this article you'll learn what domain primitives are, how to define them, and how they can be used to create secure software.

Domain primitives as the smallest building blocks

When modeling a value object you make sure it represents an important concept in your domain model. You decide how to represent the value object and what name it should have. If you take this further and also put effort into determining what it is and what it’s not, you will gain significantly deeper insight into that concept. You can then use that insight to introduce invariants that must be upheld in order for the value object to be considered valid.

You can continue and say that the value object not only should or can but must uphold these invariants, and that they must be enforced at the time of creation. What you end up with is a value object so strict in its definition that if it exists, it will also be valid. If it’s not valid then it cannot exist. This type of value object is what we refer to as a domain primitive.

A value object so precise in its definition that it, by its mere existence, manifests its validity is called a domain primitive.

Domain primitives are very similar to value objects in Domain-Driven Design. The key differences are that we require invariants to exist and that they be enforced at the point of creation. We also prohibit the use of simple language primitives, or generic types (including null), as representations of concepts in the domain model.

NOTE: Nothing in a domain model should be represented by a language primitive or a generic type. Each concept should be modeled as a domain primitive so that it carries meaning when passed around, and so it can uphold its invariants.

Let’s say you have the concept of quantity in your domain model. The quantity is the amount a customer wants to buy of a certain item in the webshop you are building. The quantity itself is a number, but instead of just representing it as an integer you create a domain primitive called Quantity. You discuss with the domain experts what is considered to be a valid quantity in the context of the current domain. This discussion reveals that a valid quantity is an integer value between 1 and 200. A zero quantity is not valid because if the customer wants to buy zero items, then the order should not exist at all. A negative value is not valid either because you can’t un-buy products, and returns are handled separately. Orders for more than 200 items are not handled by the system at all. Such large orders are extremely rare and if they do occur they need special handling, so they are dealt with via direct contact with a sales representative instead of through the online store.

You also encapsulate important behavior of the domain primitive. By having the domain primitive own and control domain operations, you reduce the risk of bugs caused by a lack of detailed knowledge of the concepts involved in the operation. The further away a piece of code is from a concept, the less detailed knowledge of that concept it can be expected to reflect, so it makes sense to keep all domain operations within the domain primitive itself.

When you’re done, the code for the Quantity domain primitive looks like the following listing.

Listing 1. The Quantity domain primitive
import static org.apache.commons.lang3.Validate.inclusiveBetween;
import static org.apache.commons.lang3.Validate.notNull;

public final class Quantity {

   private final int value;                                       ①

   public Quantity(final int value) {
      inclusiveBetween(1, 200, value);                            ②
      this.value = value;
   }

   public int value() {
      return value;
   }

   public Quantity add(final Quantity addend) {                   ③
      notNull(addend);
      return new Quantity(value + addend.value);
   }
   
   // equals() hashCode() etc...

}
1. The actual integer value
2. Enforcing invariants at time of creation
3. Providing domain operations to encapsulate behavior

This is a precise and strict code representation of the concept of quantity. A domain primitive like the Quantity created here removes the possibility of a dishonest user sending in an invalid number, such as a negative quantity, and tricking the system into unintended behavior. Using domain primitives removes a security vulnerability without the use of an explicit countermeasure.
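If you want to experiment with the listing without pulling in Apache Commons Lang, the following is a minimal sketch of the same idea using plain Java checks instead of Validate, with the equals() and hashCode() that the listing leaves out filled in. The plain-Java checks, the error message, and the toString() format are assumptions made only to keep the example self-contained. Notice how add() gets validation for free: the sum goes through the constructor, so adding two valid quantities whose sum exceeds 200 is rejected as well.

```java
import java.util.Objects;

// Self-contained variant of Listing 1 (plain Java checks instead of commons-lang3)
final class Quantity {

    private static final int MIN = 1;
    private static final int MAX = 200;

    private final int value;

    Quantity(final int value) {
        // Enforce the invariant at creation time: an invalid Quantity can never exist
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                "Quantity must be between " + MIN + " and " + MAX + ", was " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }

    Quantity add(final Quantity addend) {
        Objects.requireNonNull(addend, "addend");
        // The result is created through the constructor, so the sum is validated too
        return new Quantity(value + addend.value);
    }

    // Value semantics: two Quantity objects with the same value are interchangeable
    @Override
    public boolean equals(final Object other) {
        return other instanceof Quantity && ((Quantity) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    @Override
    public String toString() {
        return "Quantity(" + value + ")";
    }
}

class QuantityDemo {
    public static void main(final String[] args) {
        System.out.println(new Quantity(3).add(new Quantity(4))); // prints Quantity(7)
        try {
            new Quantity(150).add(new Quantity(100)); // 250 is outside 1..200
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```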

As this example has shown, the quantity is not just an integer. It should be modeled as a domain primitive so that it carries meaning when passed around and so it can uphold its invariants.

When you start using domain primitives as the smallest building blocks in your domain model you will be able to create code that significantly reduces the possibility of security issues in your software simply by the way you are designing it. You are designing code that is precise and leaves little or no room for ambiguity. This type of code tends to contain fewer bugs and, as a consequence, fewer security vulnerabilities.

Context boundaries define meaning

Domain primitives, just like value objects, are defined by their value rather than an identity. This means that two domain primitives of the same type, and with the same value, are interchangeable with each other. Domain primitives are perfect for representing a wide variety of domain concepts that do not fit into the categories of entities or aggregates. One important aspect to keep in mind when modeling a concept using a domain primitive is that it should be defined to mean exactly what the concept is in the current domain.

Say you are building a system that will allow users to choose and create their own email addresses. A user can choose the local part of the email address (the part to the left of the @), and once the address is created they can start sending and receiving messages using it. If a user entered jane.doe, the email address jane.doe@example.com would be created (assuming your domain name is example.com). When modeling, you realize that an email address is a perfect example of a domain primitive. It’s defined by its value, and you can come up with some constraints that you could use to assert that it’s a valid email address. At first, you might be inclined to use the official definition of an email address to figure out what constitutes a valid email address[1]. While this would technically be correct in terms of fulfilling an RFC, it might not be what’s considered a valid email address in the context of the current domain (figure 1). As an engineer this might come as a surprise to you. But remember, we are focusing on the meaning of a concept in a specific domain, not what it might mean in some other context, like in the context of a global standard. For example, your domain might define an email address to be case insensitive, so anything the user enters will be transformed to lowercase. You could go even further and say that the only characters allowed are lowercase ASCII letters, digits, and dots ([a-z0-9.]). This is a deviation from the technical specification, but it’s a valid choice in the context of the current domain[2].

Figure 1. The meaning of a term is defined within the boundaries of a given context
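A minimal sketch of such an EmailAddress domain primitive could look like the following. It assumes the rules from the example above: the user supplies only the local part, the input is lowercased, only [a-z0-9.] is allowed, and the domain is example.com. Any further rules, such as length limits or restrictions on leading or consecutive dots, would come out of your own discussions with the domain experts and are deliberately left out.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Pattern;

// Sketch: an email address as defined by *this* domain, not by the RFCs
final class EmailAddress {

    // Assumption: the domain name from the example in the text
    private static final String DOMAIN = "example.com";
    // Domain rule from the text: the local part may only contain [a-z0-9.]
    private static final Pattern LOCAL_PART = Pattern.compile("[a-z0-9.]+");

    private final String value;

    EmailAddress(final String localPart) {
        Objects.requireNonNull(localPart, "localPart");
        // Case insensitive in this domain: normalize before validating.
        // Locale.ROOT keeps lowercasing independent of the machine's default locale.
        final String normalized = localPart.toLowerCase(Locale.ROOT);
        if (!LOCAL_PART.matcher(normalized).matches()) {
            throw new IllegalArgumentException("Invalid local part: " + localPart);
        }
        this.value = normalized + "@" + DOMAIN;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(final Object other) {
        return other instanceof EmailAddress && ((EmailAddress) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Using Locale.ROOT is a deliberate choice: String.toLowerCase() without an argument uses the default locale, and under a Turkish locale "I" becomes a dotless "ı", which would make the same input validate differently on different machines. Note also that an input like jane+news is rejected here even though the RFCs permit it, which is exactly the kind of domain-specific narrowing the text describes.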

Sometimes you will encounter situations where the name of the concept you are trying to model is also used outside of the current context, and where its external definition is so prevalent that it would be confusing to redefine it in your domain model. An email address might be such a term, but as you just learned, it can make sense to redefine the term "email address" in your current domain. Another example of a well-defined term is an ISBN. The ISBN is defined by an International Organization for Standardization (ISO) standard, and redefining it could cause confusion, misinterpretation, and bugs. These subtle differences in meaning are a common cause of security issues, so you want to avoid them, especially when interacting with other systems or other domain contexts (figure 2).

Figure 2. Using an externally defined term without changing its meaning
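Keeping the external definition intact means your ISBN domain primitive enforces the rules of the standard rather than rules you made up. As a sketch, assuming you only need to support 13-digit ISBNs (ISBN-10 is left out), such a type could check the 978/979 prefix and the check digit, which is computed by weighting the first twelve digits alternately by 1 and 3. The class name Isbn and the choice to strip hyphens are assumptions for illustration.

```java
import java.util.Objects;

// Sketch: an ISBN-13 domain primitive that follows the ISO definition
// instead of redefining it (ISBN-10 support deliberately left out)
final class Isbn {

    private final String value; // canonical form: 13 digits, no hyphens

    Isbn(final String isbn) {
        Objects.requireNonNull(isbn, "isbn");
        // Hyphens are presentation only, so they are stripped before validation
        final String digits = isbn.replace("-", "");
        // An ISBN-13 starts with 978 or 979, followed by ten more digits
        if (!digits.matches("97[89]\\d{10}")) {
            throw new IllegalArgumentException("Not an ISBN-13: " + isbn);
        }
        if (checkDigit(digits) != digits.charAt(12) - '0') {
            throw new IllegalArgumentException("Invalid ISBN-13 check digit: " + isbn);
        }
        this.value = digits;
    }

    // Weights alternate 1, 3, 1, 3, ... over the first twelve digits
    private static int checkDigit(final String digits) {
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            final int digit = digits.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return (10 - sum % 10) % 10;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(final Object other) {
        return other instanceof Isbn && ((Isbn) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

For example, 978-0-306-40615-7 is accepted (the weighted sum of the first twelve digits is 93, giving check digit 7), while the same number ending in 8 is rejected.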

Often, when you find yourself redefining a well-known term, the need for the redefinition arises because the term is being used to describe more than one thing in your current context. In those cases, try to either split the term into two distinct terms or come up with an entirely new term. The new term is unique to your current context, so you avoid any misinterpretation. It also makes it clear why certain specific invariants are used instead of those associated with the externally defined term. Another benefit of introducing a new term is that the original term can keep its crisp definition and remain a domain primitive. You maintain full freedom to model important concepts in your domain without losing any of the model’s exactness.

Imagine you are building book-managing software that uses ISBNs to identify books. After a while you realize you need a way to identify and handle books that haven’t received an ISBN yet. One approach would be to redefine the term "ISBN" to represent not only real ISBNs but also internally assigned identifiers, perhaps using a magic prefix or something similar to distinguish them from the real ones. To avoid the possible confusion that comes with redefining an ISO standard, you could instead introduce a new term, BookId, that contains either an ISBN or an UnpublishedBookNumber (figure 3). BookId is what identifies a book, and UnpublishedBookNumber is the internally assigned identifier.

Figure 3. Introducing new terms instead of redefining existing ones

By introducing two new terms, BookId and UnpublishedBookNumber, you are able to keep the exact and well-known definition of "ISBN" while at the same time meeting the needs of your business domain.
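One way to express figure 3 in Java is a sealed type that can only ever be one of the two alternatives. The sketch below assumes Java 17 or later (sealed interfaces and records). The ISBN check is simplified to the shape of a 13-digit ISBN without verifying the check digit, and the format of UnpublishedBookNumber, a positive sequence number, is a hypothetical choice for illustration.

```java
import java.util.Objects;

// A book is identified either by a real ISBN or by an internally assigned number.
// Sealed: permitted subtypes are inferred from this compilation unit,
// so there can be no third kind of BookId.
sealed interface BookId {

    // Simplified: checks the ISBN-13 shape only, not the check digit
    record Isbn(String value) implements BookId {
        Isbn {
            Objects.requireNonNull(value, "value");
            if (!value.matches("97[89]\\d{10}")) {
                throw new IllegalArgumentException("Not an ISBN-13: " + value);
            }
        }
    }

    // Hypothetical format: a positive, internally assigned sequence number
    record UnpublishedBookNumber(long value) implements BookId {
        UnpublishedBookNumber {
            if (value <= 0) {
                throw new IllegalArgumentException("Must be positive, was " + value);
            }
        }
    }
}
```

Code that needs to identify any book takes a BookId, while code that only makes sense for published books can ask for BookId.Isbn specifically, so the distinction stays visible in the type system instead of hiding behind a magic prefix.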

Building your domain primitive library

Now that you have expanded your toolbox with the versatility of domain primitives, you should strive to use them as much as you can in your code. They are the smallest building blocks and form the basis of your domain model. As such, almost every concept in your model, regardless of how small it is, will be based on one or more domain primitives. When you are done modeling you will have a collection of domain primitives that you can view as your domain primitive library. This library is not a collection of generic utility classes and methods, but rather a well-defined, ubiquitous set of domain concepts. And because they are domain primitives they are safe to pass around as arguments in your code just like regular value objects.

Domain primitives lower the cognitive load on developers because there’s no need to understand their inner workings in order to use them. You can safely use them with the confidence that they always represent valid values and well-defined concepts. If they’re not valid they won’t exist. This also removes the need to constantly revalidate data in order to make sure it’s safe to use. If it’s defined in your domain, you can trust it and use it freely.

Hardening your APIs by using your domain primitive library

You should always strive to use domain primitives in your programmatic APIs. If every argument and return value of a method is valid by definition, you get input and output validation in every single method in your code base without any extra effort. Using domain primitives this way enables you to create code that is extremely resilient and robust. A positive side effect is that the number of security vulnerabilities caused by invalid input data will drastically decrease.

Let’s examine this more closely with a code example. Say you are given the task of sending the audit logs of your system to a central audit log repository. Audit logs contain sensitive data, and it’s important they are sent to a designated place to be stored and protected properly. Sending the data to the wrong place can have a significant negative business impact.
If you create a method in your API that takes the current audit logs and sends them to a log repository located at a given server address, it could end up looking like this:

void sendAuditLogsToServerAt(java.net.InetAddress internalIpAddress);

The issue here is that a method signature like this allows for any IP address to be the destination for the logs. If you fail to properly validate the address before sending the logs you can potentially send them to an insecure location and reveal sensitive data. If you instead define a domain primitive InternalIPAddress that strictly defines what an internal IP address is, you can have that be the type of the input parameter in your method. Applying this to the sendAuditLogsToServerAt method leads to the code in the following listing instead.

Listing 2. Hardening the API with domain primitives

void sendAuditLogsToServerAt(InternalIPAddress serverAddress) {   ①
   notNull(serverAddress);
   // Retrieve logs and send them to server
}
1. The only input validation left to perform is a null check

Now you have designed your method so it’s impossible to pass invalid input to it. The only form of validation left to do, in terms of verifying that the IP address is internal, is to make sure it’s not null.
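The InternalIPAddress type itself is not shown in the listing. A minimal sketch follows, under the assumption that "internal" means a site-local address as reported by java.net.InetAddress.isSiteLocalAddress() (for IPv4, the private ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). Your own definition of internal may well be narrower, for example a specific set of approved log servers.

```java
import java.net.InetAddress;
import java.util.Objects;

// Sketch of the InternalIPAddress domain primitive used by sendAuditLogsToServerAt.
// Assumption: "internal" means site-local, as reported by InetAddress.isSiteLocalAddress()
final class InternalIPAddress {

    private final InetAddress address;

    InternalIPAddress(final InetAddress address) {
        Objects.requireNonNull(address, "address");
        // Invariant enforced at creation: a public address can never become an InternalIPAddress
        if (!address.isSiteLocalAddress()) {
            throw new IllegalArgumentException(
                "Not an internal address: " + address.getHostAddress());
        }
        this.address = address;
    }

    InetAddress address() {
        return address;
    }

    @Override
    public boolean equals(final Object other) {
        return other instanceof InternalIPAddress
            && ((InternalIPAddress) other).address.equals(address);
    }

    @Override
    public int hashCode() {
        return address.hashCode();
    }

    @Override
    public String toString() {
        return address.getHostAddress();
    }
}
```

The constructor takes an already parsed InetAddress, so it never triggers a DNS lookup itself. Passing a literal such as "10.1.2.3" to InetAddress.getByName() only checks the format of the address, so creating the domain primitive from configuration stays cheap and predictable.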

Avoid exposing your domain publicly

One thing to remember when hardening your API is that if you have an API that acts as an interface to a different domain, you should avoid exposing your domain model’s objects in that API. If you do, you instantly make your domain model part of your public API[3]. As soon as other domains start using your API, it quickly becomes very hard to change and evolve your domain independently.

An example of a public API facing a different domain is a REST API exposed on the internet for others to consume via client software. If you expose your internal domain in the REST API then you cannot evolve your domain without forcing the clients to evolve with you. If your business depends on those clients then you can’t ignore them and have no other option but to evolve at the same pace as your consumers can adapt their clients. To make things even worse, if you have multiple consumers then you’re not only tying yourself together with each consumer, but you’re tying the consumers together with each other. This is a less than ideal situation to be in, and you can avoid it by not exposing your domain publicly.

What you want to do instead is to use a different representation of each of your domain objects. This can be viewed as a type of data transfer object (DTO) used to communicate with other domains. You can place invariants in those DTOs, but those will not be the same constraints that exist in your domain. Rather, they can for example be constraints relevant to the communication protocol defined by the API. The first thing you do in an API method like this is to convert the DTO into the corresponding domain primitive(s) in order to ensure its data is valid.

By using this layer of translation between the concepts in your public API and your domain, you are able to decouple the two. This allows you to evolve the API and your domain independently.
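Here is a sketch of that translation step for the webshop example. The DTO OrderLineDto, the API class OrderApi, and its addOrderLine method are hypothetical names, and the Quantity record is a condensed version of the domain primitive from Listing 1 (records assume Java 16 or later). The DTO only enforces a protocol-level rule (the field must be present), while the domain rule (1 to 200) is enforced when the DTO is converted into a Quantity.

```java
import java.util.Objects;

// Condensed domain primitive: valid quantities are 1..200
record Quantity(int value) {
    Quantity {
        if (value < 1 || value > 200) {
            throw new IllegalArgumentException(
                "Quantity must be between 1 and 200, was " + value);
        }
    }
}

// Hypothetical DTO as received by the public API.
// Its only constraint is protocol-level: the field must be present.
record OrderLineDto(Integer quantity) {
    OrderLineDto {
        Objects.requireNonNull(quantity, "quantity is required");
    }
}

// Hypothetical API entry point: translate the DTO first, then use domain types only
final class OrderApi {

    String addOrderLine(final OrderLineDto dto) {
        final Quantity quantity = toDomain(dto);
        // ... hand the validated Quantity to the domain model
        return "accepted " + quantity.value();
    }

    static Quantity toDomain(final OrderLineDto dto) {
        Objects.requireNonNull(dto, "dto");
        return new Quantity(dto.quantity()); // domain invariants enforced here
    }
}
```

Because the DTO and the domain primitive are separate types, the API contract (field names, optionality, wire format) can change without touching Quantity, and vice versa.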

Let's finish up by reviewing the key points for domain primitives:
  • Their invariants are checked at the time of creation.
  • They can only exist if they are valid.
  • They should always be used instead of language primitives or generic types.
  • Their meaning is defined within the boundaries of the current domain, even if the same term exists outside of the current domain.
  • You should use your domain primitive library to create secure code.

Domain primitives are the foundation for creating secure code. If you get familiar with them and learn how to use them, your code will become more robust, more concise, and have fewer bugs. It will become Secure by Design.

--------
[1] Explaining the definition of an email address is beyond the scope of this post, but a good place to start in trying to learn it is RFC 3696.
[2] It might be of interest to know that even RFC 5321 discourages the use of case-sensitive email addresses, although the specification defines email addresses as case sensitive.
[3] In Domain-Driven Design this type of shared domain is referred to as a shared kernel.
