Clear Code
Create Data Models

2021.08.08


Design Input And Output Data

We usually design the overall data model of the application, i.e. the database entities, which we refer to as the domain model. But since we have to implement independent features and components, we should do more:

  • Carefully design the data flow within every feature.

  • Clearly define the input and output data for each component.

Data Transfer Objects (DTOs)

For this, we should create data transfer objects. These are data classes that only carry data and do not contain procedures implementing business logic.

Here we benefit from the principle that we separate procedural and data classes. See more in that chapter.

Data Model

Every data structure we create is a data model. Its strength is that it models the data. The more precisely it describes the data, the more clearly it tells what should be processed.

Imagine that you have to create complex Excel files programmatically. The files may contain not only rows and columns but also multiple sheets, sub-tables, row groups, column groups, and other complex structures. You should model the Excel content with data structures as precisely as possible.
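To make this concrete, here is a minimal sketch of such a model using Java records (Java 16+, see below). All type names (Workbook, Sheet, SubTable, RowGroup, Row, Cell) are hypothetical, and column groups and formatting are left out. The point is that each structure in the file gets its own type instead of being hidden in nested lists or maps.

```java
import java.util.List;

// Hypothetical, simplified model of an Excel workbook: each structure of the
// file (sheet, sub-table, row group, row, cell) is an explicit immutable type.
class ExcelModelDemo {
    public static void main(String[] args) {
        Row germany = new Row(List.of(new Cell("Germany"), new Cell("1200")));
        Row france = new Row(List.of(new Cell("France"), new Cell("950")));
        RowGroup europe = new RowGroup("Europe", List.of(germany, france));
        SubTable totals = new SubTable("Totals", List.of("Country", "Revenue"), List.of(europe));
        Workbook workbook = new Workbook(List.of(new Sheet("Summary", List.of(totals))));

        System.out.println(workbook.sheets().get(0).name()); // Summary
    }
}

record Workbook(List<Sheet> sheets) {}
record Sheet(String name, List<SubTable> subTables) {}
record SubTable(String title, List<String> columnHeaders, List<RowGroup> rowGroups) {}
record RowGroup(String label, List<Row> rows) {}
record Row(List<Cell> cells) {}
record Cell(String value) {}
```

The code that writes the actual file can then walk this structure without having to guess what a nested list means.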

Use Composition

Of course, data structures are made up of other data structures and data classes, down to single attributes. How should we build them efficiently?

Do Not Copy

If we need attributes of data structures that have already been collected, we should not create new classes with the required attributes and copy their values into them. Instead, we should simply add the already collected data as a whole to our new data objects. (Of course, only if nothing speaks against it, such as memory consumption.)

I would simply call this embedding one data structure into another.

With embedding, we can reach, let's say, attribute9 with the following expression:

data4.getData1().getData3().getAttribute9()
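A minimal sketch of embedding, using hypothetical classes named after the expression above. Data4 holds a reference to an already collected Data1 object instead of copying its attributes, and it adds only the data that is new.

```java
// Hypothetical data classes named after the expression above.
class EmbeddingDemo {
    public static void main(String[] args) {
        Data3 data3 = new Data3("value9");
        Data1 data1 = new Data1(data3);
        Data4 data4 = new Data4(data1, "extra");
        System.out.println(data4.getData1().getData3().getAttribute9()); // value9
    }
}

final class Data3 {
    private final String attribute9;
    Data3(String attribute9) { this.attribute9 = attribute9; }
    String getAttribute9() { return attribute9; }
}

final class Data1 {
    private final Data3 data3;
    Data1(Data3 data3) { this.data3 = data3; }
    Data3 getData3() { return data3; }
}

final class Data4 {
    private final Data1 data1;         // embedded as a whole, nothing is copied
    private final String ownAttribute; // the only data Data4 adds itself

    Data4(Data1 data1, String ownAttribute) {
        this.data1 = data1;
        this.ownAttribute = ownAttribute;
    }

    Data1 getData1() { return data1; }
    String getOwnAttribute() { return ownAttribute; }
}
```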

We should avoid copying attributes because it requires more procedural code and creates more opportunities for mistakes. With composition we have the following benefits:

  • It is simpler to put together the new data structures.

  • There is no need to change the existing data. (It should be treated as immutable, see below.)

Do Not Inherit

For the same reason as with copying, inheritance is not useful either. With inheritance, the attributes are inherited only at the type level. At the object level (at runtime), we still need to copy them.

Alternatively, if we instantiate the new, extended data type right when we create the first data structure, then we have to fill in the extra attributes later, in a second step. This contradicts our goal of treating the data as immutable. (See more in the next chapter.)
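The following sketch contrasts the two options. An already collected, hypothetical Customer needs a credit limit. The subclass reuses the fields only as a type, so the creating code still copies every attribute. The embedding version takes the existing object as a whole. CustomerWithCredit is written as a record (Java 16+, see Use Records below) only for brevity.

```java
// Hypothetical example: extend an already collected Customer with a credit limit.
class InheritVsEmbedDemo {
    public static void main(String[] args) {
        Customer customer = new Customer("Alice", "alice@example.com");

        // Inheritance: the type reuses the fields, but the object does not,
        // so every attribute has to be copied one by one.
        CreditCustomer inherited = new CreditCustomer(
                customer.getName(), customer.getEmail(), 5000);

        // Composition: the existing object is embedded as a whole.
        CustomerWithCredit embedded = new CustomerWithCredit(customer, 5000);

        System.out.println(inherited.getName() + " " + embedded.customer().getName());
    }
}

class Customer {
    private final String name;
    private final String email;

    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }
}

// Whenever Customer gets a new attribute, the copying code above must change too.
class CreditCustomer extends Customer {
    private final int creditLimit;

    CreditCustomer(String name, String email, int creditLimit) {
        super(name, email);
        this.creditLimit = creditLimit;
    }

    int getCreditLimit() { return creditLimit; }
}

record CustomerWithCredit(Customer customer, int creditLimit) {}
```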

Embed Entities

You can even embed database entities if the memory consumption does not speak against it.

It is recommended that they be detached from the database, in Hibernate's terms. This simply means that we use the entities outside of the transaction that originally read them. In this way, they effectively become data transfer objects, and we can add them to other DTOs.

Treat Data As Immutable

Ideally, the data should be immutable so that it cannot be modified during processing. When processing generates new data, that data should be stored in other data objects designed for the output.

Create all data once, via constructors or factory methods, and don't change it after that.

This also comes in handy when we use our components in a functional programming style.

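// needs: import static java.util.stream.Collectors.toList;
// (Output stands for whatever type Processor3 returns)
List<Output> outputs = inputs.stream()
    .map(Processor1::process) // each processor returns a new object
    .map(Processor2::process) // and leaves its input unchanged
    .map(Processor3::process)
    .collect(toList());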
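A runnable version of the pipeline above, as a sketch under stated assumptions. The record types (Order, PricedOrder, TaxedOrder, Invoice) and the processing steps, including the flat 20% tax, are hypothetical. Each processor receives one immutable record and returns a new one, and TaxedOrder embeds its input instead of copying it.

```java
import java.util.List;
import static java.util.stream.Collectors.toList;

// Hypothetical three-step processing on immutable records.
class PipelineDemo {
    public static void main(String[] args) {
        List<Order> inputs = List.of(new Order("A-1", 2, 500), new Order("B-7", 1, 1999));
        List<Invoice> invoices = inputs.stream()
                .map(Processor1::process)
                .map(Processor2::process)
                .map(Processor3::process)
                .collect(toList());
        invoices.forEach(System.out::println);
    }
}

record Order(String id, int quantity, int unitPriceCents) {}
record PricedOrder(Order order, int netCents) {}
record TaxedOrder(PricedOrder pricedOrder, int taxCents) {}
record Invoice(String orderId, int totalCents) {}

class Processor1 {
    static PricedOrder process(Order order) {
        return new PricedOrder(order, order.quantity() * order.unitPriceCents());
    }
}

class Processor2 {
    // Assumes a flat 20% tax rate, purely for illustration (integer cents).
    static TaxedOrder process(PricedOrder priced) {
        return new TaxedOrder(priced, priced.netCents() * 20 / 100);
    }
}

class Processor3 {
    static Invoice process(TaxedOrder taxed) {
        PricedOrder priced = taxed.pricedOrder();
        return new Invoice(priced.order().id(), priced.netCents() + taxed.taxCents());
    }
}
```

Because no step mutates its input, the steps can be reordered, tested in isolation or reused in other pipelines without side effects.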

Use Records

The record keyword was introduced in Java 16 (it was available as a preview feature in Java 14 and 15). Records implement everything written above, and we don't even have to declare a class for them in the old way!

  • Records can be composed simply of instances of data objects or other records. A one-line record declaration replaces the whole class declaration.

  • Accessors ('getters') are generated automatically, and by default they do nothing but return the component value. Since records are final, their accessors cannot be overridden in subclasses.

  • Records have no mutators ('setters') at all, and their fields are final. If records are composed of other records, the result will be close to read-only. Only mutable components, such as collections, can still be changed.

  • A record cannot extend another class (it implicitly extends java.lang.Record), and it is final, so it cannot be extended either.
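A short sketch of these properties, with hypothetical User and Contract records. The compiler generates the fields, the canonical constructor, the accessors, and equals(), hashCode() and toString(). The compact constructor is an optional addition that copies the list defensively, which closes the "mutable component" gap mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class RecordDemo {
    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("laptop"));
        Contract contract = new Contract(new User("Alice"), items);

        items.add("phone");                    // does not affect the record
        System.out.println(contract.items()); // [laptop]
        System.out.println(contract.equals(
                new Contract(new User("Alice"), List.of("laptop")))); // true
    }
}

// Hypothetical records. The header declares the components; the compiler
// generates private final fields, the canonical constructor, the accessors
// user() and items(), and equals(), hashCode() and toString().
record User(String name) {}

record Contract(User user, List<String> items) {
    // Compact constructor: validates the input and stores an unmodifiable copy
    // of the list, so later changes to the caller's list cannot leak in.
    Contract {
        if (user == null) {
            throw new IllegalArgumentException("user is required");
        }
        items = List.copyOf(items);
    }
}
```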

Read more here, or find tutorials on the internet:

Put Together What Belongs Together

When collecting the data to be processed, do not simply create separate collections. If those collections contain data related to each other, create a DTO that holds them together.

Let's say every A has a B and multiple Cs.

// Instead of separate, unrelated collections...
Collection<A> as;
Collection<B> bs;
Collection<C> cs;

// ...create a DTO that holds the related data together:
class ADto {
    A a;
    B b;              // b  belonging to a
    Collection<C> cs; // cs belonging to a
}

Collection<ADto> aDtos;

In other words, finish preparing the input data before you start processing it.
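One way to finish the preparation is a small factory that matches the related objects once, during data collection. This is a sketch under assumptions. The records are hypothetical, B and C refer to their A through an aId field, and every A is assumed to have at most one B (Collectors.toMap throws on duplicate keys). Maps appear here only inside data creation, so the processing code never sees them.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ADtoFactoryDemo {
    public static void main(String[] args) {
        List<A> as = List.of(new A(1), new A(2));
        List<B> bs = List.of(new B(1, "b-for-1"), new B(2, "b-for-2"));
        List<C> cs = List.of(new C(1, "c1"), new C(1, "c2"), new C(2, "c3"));

        List<ADto> aDtos = ADtoFactory.create(as, bs, cs);
        aDtos.forEach(System.out::println);
    }
}

// Hypothetical types: B and C reference their A by aId.
record A(int id) {}
record B(int aId, String value) {}
record C(int aId, String value) {}
record ADto(A a, B b, List<C> cs) {}

class ADtoFactory {
    // Groups the related data once; an A without a B gets b == null.
    static List<ADto> create(Collection<A> as, Collection<B> bs, Collection<C> cs) {
        Map<Integer, B> bByAId = bs.stream()
                .collect(Collectors.toMap(B::aId, Function.identity()));
        Map<Integer, List<C>> csByAId = cs.stream()
                .collect(Collectors.groupingBy(C::aId));
        return as.stream()
                .map(a -> new ADto(a,
                        bByAId.get(a.id()),
                        List.copyOf(csByAId.getOrDefault(a.id(), List.of()))))
                .collect(Collectors.toList());
    }
}
```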

Avoid Maps

The same goes for Maps.

Maps are inherently unfinished data structures. Even though all objects are mapped to their keys, the processing code must still complete the mapping by looking up each object by its key. If we need the mapped object in multiple places, the code must repeat the same lookup again and again, which is code repetition.

// Before: the processing code has to complete the mapping

Collection<A> as;
Map<A, B> bs;

void process1(A a, Map<A, B> bs) {
    B b = bs.get(a);
}

void process2(A a, Map<A, B> bs) {
    B b = bs.get(a); // code repetition
}

// After:
// Data model

class ADto {
    final A a;
    final B b; // b belonging to a

    ADto(A a, B b) {
        this.a = a;
        this.b = b;
    }

    A getA() { return a; }
    B getB() { return b; }
}

Collection<ADto> aDtos;

// Data creation

ADto createADto(A a, Map<A, B> bs) {
    return new ADto(a, bs.get(a)); // do the lookup only once
}

// Data processing

void process1(ADto aDto) {
    B b = aDto.getB();
}

void process2(ADto aDto) {
    B b = aDto.getB();
}

Use Factories

If certain data can be created in different ways, use factory classes and methods to create it instead of multiple constructors. This is important for several reasons:

Unlike constructors, factory methods have names. There is no clean code without names: names describe the business logic that the program implements.

A simple approach is to create a factory class for every DTO. If data creation involves more components, these classes should be placed in a separate package, according to the Single Responsibility Principle.

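// Simple case: DTOs and their factories in one package

package data;

class User { ... }
class UserFactory { ... }

class Contract { ... }
class ContractFactory { ... }

// When creating contracts needs more components, move them to their own package
// (each class lives in its own file; package declarations shown for orientation)

package data;

public class User { ... }
public class UserFactory { ... }

package data.contract;

public class Contract { ... }
public class ContractFactory { ... }
class ContractIdGenerator { ... }    // note the visibility: package-private
class ContractValidator { ... }      // note the visibility: package-private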

If more classes are needed to create certain DTOs, place them together with the factory classes and give them package-private (default) visibility. (See the example above.)

When using factory methods, give the DTO only one constructor, which requires all attributes.

When using records as DTOs (see above), we don't need to declare full data classes, but we can still keep the factory classes.
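A minimal sketch of this combination, assuming a hypothetical Contract record and a one-year contract term. The record has only its canonical constructor, which requires all attributes. The factory's method names state which business case creates the data, which two bare constructor calls could not express.

```java
import java.time.LocalDate;

class ContractFactoryDemo {
    public static void main(String[] args) {
        Contract fresh = ContractFactory.createNewContract("Alice", LocalDate.of(2021, 8, 8));
        Contract renewed = ContractFactory.createRenewal(fresh);
        System.out.println(renewed);
    }
}

// Hypothetical DTO: a record with a single (canonical) constructor
// that requires every attribute.
record Contract(String customerName, LocalDate startDate, LocalDate endDate, int version) {}

// The method names describe the business cases that create a contract.
class ContractFactory {
    private static final int TERM_YEARS = 1; // assumed contract term

    static Contract createNewContract(String customerName, LocalDate startDate) {
        return new Contract(customerName, startDate, startDate.plusYears(TERM_YEARS), 1);
    }

    static Contract createRenewal(Contract previous) {
        LocalDate start = previous.endDate();
        return new Contract(previous.customerName(), start,
                start.plusYears(TERM_YEARS), previous.version() + 1);
    }
}
```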

Advantage In Testing

Defining clear input and output data for the components completely changes unit testing, making it easier and clearer.

Instead of mocking the components being used, we can simply create the data objects that serve as input. The same goes for the expected output, if it is a well-defined data structure too: we create the expected output and compare it with the actual one.

We can also implement the equals() method of every data class so that we can simply test the objects for equality. Alternatively, we can provide comparators if testing needs a different comparison than runtime equality. Records provide equality based on their data content out of the box, so the runtime comparison won't differ from the one used in testing.

@Test
public void testSomething() {
    Offer offer = new Offer(
            new User(...),
            new Template(...));

    Contract contractActual = contractService.createContract(offer);

    Contract contractExpected = new Contract(
            new Contractor(...),
            new ContactPerson(...));

    Assert.assertEquals(contractExpected, contractActual); // JUnit 4: expected first, then actual
}
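A self-contained sketch of the same test idea, with hypothetical records and a hypothetical ContractService whose mapping logic is invented for illustration. Because records compare by content, the expected Contract is simply constructed and compared with the actual one, and nothing is mocked. A plain check replaces JUnit's Assert.assertEquals so that the example runs without dependencies.

```java
// Hypothetical records and service, named after the test above.
class ContractServiceTestDemo {
    public static void main(String[] args) {
        ContractService contractService = new ContractService();
        Offer offer = new Offer(new User("Alice", "alice@example.com"), new Template("Standard"));

        Contract contractActual = contractService.createContract(offer);

        Contract contractExpected = new Contract(
                new Contractor("Alice"),
                new ContactPerson("alice@example.com"));

        if (!contractExpected.equals(contractActual)) {
            throw new AssertionError("expected " + contractExpected + " but was " + contractActual);
        }
        System.out.println("ok");
    }
}

record User(String name, String email) {}
record Template(String name) {}
record Offer(User user, Template template) {}
record Contractor(String name) {}
record ContactPerson(String email) {}
record Contract(Contractor contractor, ContactPerson contactPerson) {}

class ContractService {
    // Data in, data out: no hidden dependencies that a test would have to mock.
    Contract createContract(Offer offer) {
        User user = offer.user();
        return new Contract(new Contractor(user.name()), new ContactPerson(user.email()));
    }
}
```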

Too many data classes?

Won't we end up with too many DTOs if we follow the approach described above? In other words, won't we have a "DTO hell"?

Yes, we may have many new classes, and factories can increase their number further. The good news is that records can reduce it.

But there is one important point if we want to write clean code: we don't have to consider all DTOs together as one "big bunch" of classes. We should organize the code by business logic and separate the features.

Do not create a global dto or model package to collect all data classes from different features.

This way, for a given code change, we need to focus only on one feature's classes, which are independent of the others.

We should not use inheritance in data objects anyway, as described in this chapter: Do Not Use Inheritance, Rules For Data Classes.

This is key to solving the problem described in Separate Data Collection And Processing. We want to clearly separate the writing and the reading of all data to make our code clean.

The other reason is obvious from the article Separate Data And Procedures. For data creation, we may need specific procedures and other components, including the database. These procedures and dependencies cannot be added to the DTO classes, because data classes should only carry data and should not contain procedures.

  • Separate procedural and data classes
  • Separate Data Collection And Processing
  • Java 14 – Record data class
  • Java 14 – JEP 359: Records (Preview)
  • Java 15 – JEP 384: Records (Second Preview)
  • Java 16 – JEP 395: Records
  • Separate Data And Procedures
  • Do Not Use Inheritance, Rules For Data Classes