Create Data Models

2021.08.08

Design Input And Output Data

We usually design the overall data model of the application, i.e. the database entities, which we call the domain model. But since we implement independent features and components, we should do more:

  • Carefully design the data flow within every feature.

  • Clearly define the input and output data for each component.

Data Transfer Objects (DTOs)

For this, we should create data transfer objects. These are data classes that only carry data and do not contain procedures implementing business logic.

Here we benefit from separating procedural classes from data classes. See more in the article Separate Data And Procedures.

Data Model

Every data structure we create is a data model. Its strength is that it models the data: the more precisely we describe the data, the more clearly it tells what has to be processed.

Imagine that you have to create complex Excel files programmatically. The files may contain not only rows and columns but also multiple sheets, sub-tables, row groups, column groups, and other complex structures. You should model the Excel content with data structures as precisely as possible.
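Below is a minimal sketch of such a model, assuming Java 16+ records. The type names (Workbook, Sheet, RowGroup, Row, Cell) are hypothetical and are not the API of any Excel library. Column groups and sub-tables are left out for brevity. The model only carries data. Writing the actual file would be the job of a separate component.

```java
import java.util.List;

// Hypothetical data model of an Excel file's content (not an Excel library API).
class ExcelModel {

    // A single cell value, kept as text for simplicity.
    record Cell(String value) {}

    // A row is an ordered list of cells.
    record Row(List<Cell> cells) {
        Row {
            cells = List.copyOf(cells); // unmodifiable copy
        }
    }

    // A labeled group of rows, e.g. a collapsible block in the sheet.
    record RowGroup(String label, List<Row> rows) {
        RowGroup {
            rows = List.copyOf(rows);
        }
    }

    // A sheet has a name and its row groups.
    record Sheet(String name, List<RowGroup> rowGroups) {
        Sheet {
            rowGroups = List.copyOf(rowGroups);
        }
    }

    // The whole workbook consists of multiple sheets.
    record Workbook(List<Sheet> sheets) {
        Workbook {
            sheets = List.copyOf(sheets);
        }
    }

    public static void main(String[] args) {
        Row january = new Row(List.of(new Cell("January"), new Cell("100")));
        Row february = new Row(List.of(new Cell("February"), new Cell("120")));
        Sheet summary = new Sheet("Summary", List.of(new RowGroup("Q1", List.of(january, february))));
        Workbook workbook = new Workbook(List.of(summary));

        System.out.println(workbook.sheets().get(0).rowGroups().get(0).rows().size()); // prints 2
    }
}
```

Each level of the file's structure gets its own type, so the code that writes the file can read the structure directly. It does not have to reconstruct the structure from flat lists.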

Use Composition

Of course, data structures are made up of other data structures and data classes, down to single attributes. How should we build them efficiently?

Do Not Copy

If we need attributes of data structures that have already been collected, we should not create new classes with those attributes and copy their values into them. Instead, we should simply add the already collected data as a whole to our new data objects (of course, only if nothing speaks against it, such as memory consumption).

I would simply call it embedding a data structure into another one.

With embedding, we can reach, let's say, attribute9 with the following expression:

data4.getData1().getData3().getAttribute9()

We should avoid copying attributes because it requires more procedural code and creates more opportunities for mistakes. Composition has the following benefits:

  • It is simpler to put together the new data structures.

  • There is no need to change the existing data. (It should be treated as immutable, see below.)
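The sketch below contrasts embedding with copying, using records (Java 16+). Data1, Data3, Data4 and attribute9 follow the names in the expression above. The other attributes and the CopiedData4 variant are hypothetical. Record accessors have no get prefix, so with records the expression becomes data4.data1().data3().attribute9().

```java
// Embedding vs. copying, sketched with records (Java 16+); partly hypothetical names.
class EmbeddingExample {

    // Already collected data.
    record Data3(String attribute9) {}
    record Data1(String attribute1, Data3 data3) {}

    // Embedding: the new structure holds the collected object as a whole.
    record Data4(Data1 data1, int attribute10) {}

    // Copying (to avoid): a new type repeats the needed attributes...
    record CopiedData4(String attribute1, String attribute9, int attribute10) {}

    // ...and extra procedural code has to transfer the values one by one.
    static CopiedData4 copy(Data1 data1, int attribute10) {
        return new CopiedData4(data1.attribute1(), data1.data3().attribute9(), attribute10);
    }

    public static void main(String[] args) {
        Data1 data1 = new Data1("a1", new Data3("a9"));

        Data4 data4 = new Data4(data1, 10);
        System.out.println(data4.data1().data3().attribute9()); // prints a9

        CopiedData4 copied = copy(data1, 10);
        System.out.println(copied.attribute9()); // prints a9
    }
}
```

Both variants give access to attribute9. The embedding variant needs no copy method, though, and it does not touch the existing Data1 object.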

Do Not Inherit

For the same reason as copying, inheritance does not help either. It inherits the attributes only at the type level. At the object level (at runtime), we still have to copy their values.

Alternatively, we could create the first data structure as an instance of the new, extended type right away. Then we have to fill in the extra attributes later, in a second step. This contradicts our goal of treating the data as immutable (see Treat Data As Immutable below).
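The first point can be illustrated with plain classes, since records cannot extend other classes. UserData and UserWithRoleData are hypothetical names. The subclass inherits the field declaration, but an existing UserData object cannot turn into a UserWithRoleData. Its values still have to be copied at runtime.

```java
// Why inheritance does not remove the copying (hypothetical classes).
class InheritanceExample {

    static class UserData {
        final String name;

        UserData(String name) {
            this.name = name;
        }
    }

    // The extended type inherits the *declaration* of the name field...
    static class UserWithRoleData extends UserData {
        final String role;

        // ...but an existing UserData object cannot become a UserWithRoleData,
        // so its values still have to be copied into the new object.
        UserWithRoleData(UserData existing, String role) {
            super(existing.name); // the copy happens here
            this.role = role;
        }
    }

    public static void main(String[] args) {
        UserData user = new UserData("alice");
        UserWithRoleData withRole = new UserWithRoleData(user, "admin");
        System.out.println(withRole.name + " " + withRole.role); // prints alice admin
    }
}
```

With composition, the new type would hold the existing UserData object as a component instead, and no values would be copied.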

We should not use inheritance in data objects anyway, as described in the chapter Rules For Data Classes, section Do Not Use Inheritance.

Embed Entities

You can even embed database entities if memory consumption does not speak against it.

It is recommended that they are detached, in Hibernate terms. This simply means that we use the entities outside of the original transaction that read them. In this way, they become plain data transfer objects, and we can add them to other DTOs.

Treat Data As Immutable

This is key to solving the problem described in Separate Data Collection And Processing. We want to clearly separate the writing and the reading of all data to keep our code clean.

Ideally, the data should actually be immutable, so that it cannot be modified during processing. When the processing produces new data, it should be stored in other data objects designed for the output.

Create all data once, via constructors or factory methods, and do not change it afterwards.
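Here is a minimal sketch of such a class written without records. The class OrderData and its members are hypothetical. All fields are final and set once in the constructor. The list is copied, so later changes to the caller's list cannot leak in. A named factory method delegates to the single constructor.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal immutable data class without records (hypothetical names).
final class OrderData {
    private final String customer;
    private final List<String> items;

    // All data is set once, in the constructor.
    OrderData(String customer, List<String> items) {
        this.customer = customer;
        this.items = List.copyOf(items); // unmodifiable snapshot of the argument
    }

    // A named factory method that delegates to the single constructor.
    static OrderData emptyOrder(String customer) {
        return new OrderData(customer, List.of());
    }

    String getCustomer() {
        return customer;
    }

    List<String> getItems() {
        return items; // already unmodifiable, safe to return
    }

    // No setters.

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("book"));
        OrderData order = new OrderData("alice", items);
        items.add("pen"); // does not affect order
        System.out.println(order.getItems()); // prints [book]
    }
}
```

Once created, an OrderData object cannot change, neither through its own methods nor through the list it was built from.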

This also comes in handy when we use our components in a functional programming style:

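// requires: import static java.util.stream.Collectors.toList;
inputs.stream()
    .map(Processor1::process)   // each step returns a new object
    .map(Processor2::process)   // instead of modifying its input
    .map(Processor3::process)
    .collect(toList());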
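Below is a runnable version of this pipeline. It is a sketch under assumptions. Processor1, Processor2 and Processor3 come from the snippet above. The record types (Input, Parsed, Validated, Output) and the processing logic are hypothetical. Each processor is a pure static method that reads an immutable object and returns a new one.

```java
import java.util.List;
import java.util.stream.Collectors;

// Runnable sketch of the pipeline above; all types and logic are hypothetical.
class PipelineExample {

    record Input(String text) {}
    record Parsed(int number) {}
    record Validated(int number, boolean positive) {}
    record Output(String summary) {}

    // Each processor reads immutable input and returns a new object.
    static class Processor1 {
        static Parsed process(Input input) {
            return new Parsed(Integer.parseInt(input.text().trim()));
        }
    }

    static class Processor2 {
        static Validated process(Parsed parsed) {
            return new Validated(parsed.number(), parsed.number() > 0);
        }
    }

    static class Processor3 {
        static Output process(Validated validated) {
            String suffix = validated.positive() ? " is positive" : " is not positive";
            return new Output(validated.number() + suffix);
        }
    }

    static List<Output> run(List<Input> inputs) {
        return inputs.stream()
                .map(Processor1::process)
                .map(Processor2::process)
                .map(Processor3::process)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Output> outputs = run(List.of(new Input("5"), new Input(" -2")));
        outputs.forEach(o -> System.out.println(o.summary()));
        // prints:
        // 5 is positive
        // -2 is not positive
    }
}
```

No step changes the object it receives, so the steps can be tested one by one and reordered or reused safely.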

Use Records

Java 16 introduced the record keyword (it was a preview feature in Java 14 and 15). Records implement everything described above, and we don't even have to declare a class for them in the old way!

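  • Records can be composed simply of instances of data objects or other records. A one-line record declaration replaces the traditional class declaration: fields, constructor, accessors, equals(), hashCode() and toString() are generated.

  • Accessors ('getters') are generated automatically, and by default they do nothing but return the component value. They are named after the components, without a get prefix (e.g. name() instead of getName()). Since records are final, subclasses cannot override them.

  • Records have no mutators ('setters') at all; their fields are final. This immutability is shallow, though. A record composed of other records is read-only, but a mutable component, such as an ArrayList, can still be modified.

  • A record class is implicitly final, so it cannot be extended. It cannot extend another class either, because it implicitly extends java.lang.Record.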
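The following sketch shows these properties in practice. The record names and components are hypothetical. A compact constructor validates the input and copies the list, which keeps the record read-only even though the list type itself is mutable in general.

```java
import java.util.List;

// What a record provides (Java 16+); names are hypothetical.
class RecordExample {

    // One line: fields, constructor, accessors, equals(), hashCode(), toString().
    record Address(String city, String street) {}

    record Customer(String name, Address address, List<String> tags) {
        // Compact constructor: validation and a defensive copy, still no setters.
        Customer {
            if (name == null || name.isBlank()) {
                throw new IllegalArgumentException("name is required");
            }
            tags = List.copyOf(tags); // otherwise a mutable list would stay modifiable
        }
    }

    public static void main(String[] args) {
        Customer a = new Customer("Alice", new Address("Berlin", "Main St 1"), List.of("vip"));
        Customer b = new Customer("Alice", new Address("Berlin", "Main St 1"), List.of("vip"));

        System.out.println(a.address().city()); // accessor without "get" prefix, prints Berlin
        System.out.println(a.equals(b));        // equality based on content, prints true
    }
}
```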

Read more about records in the official Java documentation (JEP 395), or find tutorials on the internet.

Put Together What Belongs Together

When collecting the data to be processed, do not simply create separate collections. If those collections contain data related to each other, create a DTO that holds them together.

Let's say every A has a B and multiple Cs.

// Avoid: the relationship between A, B and C is lost
Collection<A> as;
Collection<B> bs;
Collection<C> cs;
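Instead, a DTO can hold every A together with its B and its Cs. This is a minimal sketch. A, B and C stand for arbitrary domain types with made-up components. ABC is a hypothetical name for the grouping record. The processing method receives finished input and does no matching of its own.

```java
import java.util.List;

// Related data held together instead of in separate collections (hypothetical types).
class GroupingExample {

    record A(String id) {}
    record B(String info) {}
    record C(int value) {}

    // Every A has a B and multiple Cs: the relationship is explicit in the type.
    record ABC(A a, B b, List<C> cs) {
        ABC {
            cs = List.copyOf(cs);
        }
    }

    // Processing works on finished input; no lookup or matching is needed.
    static int sumOfCs(ABC abc) {
        return abc.cs().stream().mapToInt(C::value).sum();
    }

    public static void main(String[] args) {
        List<ABC> abcs = List.of(
                new ABC(new A("a1"), new B("b1"), List.of(new C(1), new C(2))),
                new ABC(new A("a2"), new B("b2"), List.of(new C(5))));

        for (ABC abc : abcs) {
            System.out.println(abc.a().id() + ": " + sumOfCs(abc));
        }
        // prints:
        // a1: 3
        // a2: 5
    }
}
```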

In other words, you should finish the preparation of the input data before the processing of the data.

Avoid Maps

The same goes for Maps.

Maps are inherently unfinished data structures. Although all objects are mapped to their keys, the processing code still has to complete the mapping by looking up each object by its key. And if we need the mapped object in several parts of the code, the same lookup has to be done again and again, which is code repetition.

Collection<A> as;
Map<A, B> bs;

void process1(A a, Map<A, B> bs) {
    B b = bs.get(a);
} 

void process2(A a, Map<A, B> bs) {
    B b = bs.get(a); // code repetition
} 
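The mapping can be completed once, while the input is being prepared, and its result stored in a DTO. The sketch below does this. A and B come from the snippet above with made-up components. AB and prepare are hypothetical names. Failing fast on a missing key is one possible design choice.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Completing the mapping once, during input preparation (hypothetical types).
class MapExample {

    record A(String key) {}
    record B(String value) {}

    // The finished pair: each A together with its B.
    record AB(A a, B b) {}

    // The only place where map lookups happen.
    static List<AB> prepare(List<A> as, Map<A, B> bs) {
        return as.stream()
                .map(a -> {
                    B b = bs.get(a);
                    if (b == null) {
                        throw new IllegalArgumentException("no B for " + a);
                    }
                    return new AB(a, b);
                })
                .collect(Collectors.toList());
    }

    // Processing code simply reads the already mapped object.
    static String process1(AB ab) {
        return ab.a().key() + "=" + ab.b().value();
    }

    static int process2(AB ab) {
        return ab.b().value().length(); // no repeated bs.get(a)
    }

    public static void main(String[] args) {
        A a1 = new A("x");
        List<AB> abs = prepare(List.of(a1), Map.of(a1, new B("hello")));
        System.out.println(process1(abs.get(0)) + " " + process2(abs.get(0))); // prints x=hello 5
    }
}
```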

Use Factories

If certain data can be created in different ways, use factory classes and methods to create it instead of multiple constructors. This is important for several reasons:

Unlike constructors, factory methods have names. There is no clean code without names: names describe the business logic that the program implements.

The other reason is obvious from the article Separate Data And Procedures. To create the data, we may need specific procedures and other components, including the database. These procedures and dependencies cannot be added to the DTO classes, because data classes should only carry data and should not contain procedures.

A simple way is to create a factory class for every DTO. If the data creation involves several components, these classes should be placed in a separate package, according to the Single Responsibility Principle.

package data;

class User { ... }
class UserFactory { ... }

class Contract { ... }
class ContractFactory { ... }

If more classes are needed to create certain DTOs, place them together with the factory classes and give them package-private (default) visibility. (See the example above.)

When using factory methods, give the DTO only one constructor, which requires all attributes.

When using records as DTOs (see above), we don't need to declare the data classes in the traditional way, but we can still keep the factory classes.
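The following is a minimal sketch of a record DTO with a single canonical constructor and a factory class with named methods. User and UserFactory follow the package example above. The Role enum and the method names are hypothetical. In a real application, the factory would also be the place for injected dependencies such as a repository. They are omitted here.

```java
// A record DTO plus a factory class with named creation methods (hypothetical names).
class FactoryExample {

    enum Role { GUEST, MEMBER, ADMIN }

    // The DTO: one constructor that requires all attributes.
    record User(String name, String email, Role role) {}

    // The factory: named methods describe the business cases.
    static class UserFactory {

        User createGuest() {
            return new User("guest", "", Role.GUEST);
        }

        User createMember(String name, String email) {
            return new User(name, email, Role.MEMBER);
        }

        User promoteToAdmin(User user) {
            // Creates a new object; the original record stays unchanged.
            return new User(user.name(), user.email(), Role.ADMIN);
        }
    }

    public static void main(String[] args) {
        UserFactory factory = new UserFactory();
        User member = factory.createMember("alice", "alice@example.com");
        User admin = factory.promoteToAdmin(member);
        System.out.println(member.role() + " " + admin.role()); // prints MEMBER ADMIN
    }
}
```

The method names tell which business case creates the object. Overloaded constructors could only tell this through their parameter lists.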

Advantage In Testing

Defining clear input and output data for the components completely changes unit testing, making it easier and clearer.

Instead of mocking the components being used, we can simply create the data objects that serve as input. The same goes for the expected output, if it is a well-defined data structure too: we create the expected output and compare it with the actual one.

We can also implement the equals() method of every data class, so that we can simply test the objects for equality. Alternatively, we can provide comparators if the tests need a different comparison than the runtime equality. Records already get equals() based on their data content, so the runtime comparison won't differ from the one used in testing.

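// assuming JUnit 4 (org.junit.Test, org.junit.Assert); "..." stands for elided arguments
@Test
public void testSomething() {
    // Input: plain data objects, no mocks
    Offer offer = new Offer(
            new User(...),
            new Template(...));

    Contract contractActual = contractService.createContract(offer);

    // Expected output, compared via equals()
    Contract contractExpected = new Contract(
            new Contractor(...),
            new ContactPerson(...));

    Assert.assertEquals(contractExpected, contractActual);
}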
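When the runtime equality does not suit a test, a comparator can focus on the attributes under test. This is a minimal sketch. It assumes a hypothetical Contract record with a creation timestamp that differs on every run. Only the names Contract and Contractor come from the example above; the components and IGNORING_TIMESTAMP are made up. In a JUnit test, the last check would become assertEquals(0, IGNORING_TIMESTAMP.compare(expected, actual)).

```java
import java.time.Instant;
import java.util.Comparator;

// Test-specific comparison with a Comparator (hypothetical types).
class ComparatorExample {

    record Contractor(String name) {}

    // createdAt differs on every run, so equals() alone would make the test fail.
    record Contract(Contractor contractor, String contactPerson, Instant createdAt) {}

    // Compares only the attributes the test cares about.
    static final Comparator<Contract> IGNORING_TIMESTAMP =
            Comparator.comparing((Contract c) -> c.contractor().name())
                      .thenComparing(Contract::contactPerson);

    public static void main(String[] args) {
        Contract actual = new Contract(new Contractor("ACME"), "Bob", Instant.now());
        Contract expected = new Contract(new Contractor("ACME"), "Bob", Instant.EPOCH);

        System.out.println(expected.equals(actual));                           // prints false
        System.out.println(IGNORING_TIMESTAMP.compare(expected, actual) == 0); // prints true
    }
}
```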

Too many data classes?

Won't we have too many DTOs if we follow the approach above? In other words, won't we end up in a "DTO hell"?

Yes, we may have many new classes, and factories increase their number further. The good news is that records can reduce it.

But there is one important point, since we want to write clean code: we don't have to consider all DTOs together as one "big bunch" of classes. We should organize the code by business logic and separate the features.

Do not create a global dto or model package to collect all data classes from different features.

So for a given code change, we need to focus only on the classes of one feature, which are independent of the others.
