Fetching

Fetching, essentially, is the process of grabbing data from the database and making it available to the application. Tuning how an application does fetching is one of the biggest factors in determining how an application will perform. Fetching too much data, in terms of width (values/columns) and/or depth (results/rows), adds unnecessary overhead in terms of both JDBC communication and ResultSet processing. Fetching too little data might cause additional fetching to be needed. Tuning how an application fetches data presents a great opportunity to influence the application overall performance.

The basics

The concept of fetching breaks down into two different questions.

  • When should the data be fetched? Now? Later?

  • How should the data be fetched?

"now" is generally termed eager or immediate. "later" is generally termed lazy or delayed.

There are a number of scopes for defining fetching:

static

Static definition of fetching strategies is done in the mappings. The statically-defined fetch strategies is used in the absence of any dynamically defined strategies

SELECT

Performs a separate SQL select to load the data. This can either be EAGER (the second select is issued immediately) or LAZY (the second select is delayed until the data is needed). This is the strategy generally termed N+1.

JOIN

Inherently an EAGER style of fetching. The data to be fetched is obtained through the use of an SQL outer join.

BATCH

Performs a separate SQL select to load a number of related data items using an IN-restriction as part of the SQL WHERE-clause based on a batch size. Again, this can either be EAGER (the second select is issued immediately) or LAZY (the second select is delayed until the data is needed).

SUBSELECT

Performs a separate SQL select to load associated data based on the SQL restriction used to load the owner. Again, this can either be EAGER (the second select is issued immediately) or LAZY (the second select is delayed until the data is needed).

dynamic (sometimes referred to as runtime)

Dynamic definition is really use-case centric. There are multiple ways to define dynamic fetching:

fetch profiles

defined in mappings, but can be enabled/disabled on the Session.

HQL/JPQL

and both Hibernate and JPA Criteria queries have the ability to specify fetching, specific to said query.

entity graphs

Starting in Hibernate 4.2 (JPA 2.1) this is also an option.

Applying fetch strategies

Let’s consider these topics as it relates to an simple domain model and a few use cases.

Example 1. Sample domain model
@Entity(name = "Department")
public static class Department {

    @Id
    private Long id;

    @OneToMany(mappedBy = "department")
    private List<Employee> employees = new ArrayList<>();

    //Getters and setters omitted for brevity
}

@Entity(name = "Employee")
public static class Employee {

    @Id
    private Long id;

    @NaturalId
    private String username;

    @Column(name = "pswd")
    @ColumnTransformer(
        read = "decrypt( 'AES', '00', pswd  )",
        write = "encrypt('AES', '00', ?)"
    )
    private String password;

    private int accessLevel;

    @ManyToOne(fetch = FetchType.LAZY)
    private Department department;

    @ManyToMany(mappedBy = "employees")
    private List<Project> projects = new ArrayList<>();

    //Getters and setters omitted for brevity
}

@Entity(name = "Project")
public class Project {

    @Id
    private Long id;

    @ManyToMany
    private List<Employee> employees = new ArrayList<>();

    //Getters and setters omitted for brevity
}

The Hibernate recommendation is to statically mark all associations lazy and to use dynamic fetching strategies for eagerness. This is unfortunately at odds with the JPA specification which defines that all one-to-one and many-to-one associations should be eagerly fetched by default. Hibernate, as a JPA provider, honors that default.

No fetching

For the first use case, consider the application login process for an Employee. Let’s assume that login only requires access to the Employee information, not Project nor Department information.

Example 2. No fetching example
Employee employee = entityManager.createQuery(
    "select e " +
    "from Employee e " +
    "where " +
    "    e.username = :username and " +
    "    e.password = :password",
    Employee.class)
.setParameter( "username", username)
.setParameter( "password", password)
.getSingleResult();

In this example, the application gets the Employee data. However, because all associations from `Employee `are declared as LAZY (JPA defines the default for collections as LAZY) no other data is fetched.

If the login process does not need access to the Employee information specifically, another fetching optimization here would be to limit the width of the query results.

Example 3. No fetching (scalar) example
Integer accessLevel = entityManager.createQuery(
    "select e.accessLevel " +
    "from Employee e " +
    "where " +
    "    e.username = :username and " +
    "    e.password = :password",
    Integer.class)
.setParameter( "username", username)
.setParameter( "password", password)
.getSingleResult();

Dynamic fetching via queries

For the second use case, consider a screen displaying the Projects for an Employee. Certainly access to the Employee `is needed, as is the collection of `Projects for that Employee. Information about Departments, other Employees or other Projects is not needed.

Example 4. Dynamic JPQL fetching example
Employee employee = entityManager.createQuery(
    "select e " +
    "from Employee e " +
    "left join fetch e.projects " +
    "where " +
    "    e.username = :username and " +
    "    e.password = :password",
    Employee.class)
.setParameter( "username", username)
.setParameter( "password", password)
.getSingleResult();
Example 5. Dynamic query fetching example
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Employee> query = builder.createQuery( Employee.class );
Root<Employee> root = query.from( Employee.class );
root.fetch( "projects", JoinType.LEFT);
query.select(root).where(
    builder.and(
        builder.equal(root.get("username"), username),
        builder.equal(root.get("password"), password)
    )
);
Employee employee = entityManager.createQuery( query ).getSingleResult();

In this example we have an Employee `and their `Projects loaded in a single query shown both as an HQL query and a JPA Criteria query. In both cases, this resolves to exactly one database query to get all that information.

Dynamic fetching via JPA entity graph

JPA 2.1 introduced entity graphs so the application developer has more control over fetch plans.

Example 6. Fetch graph example
@Entity(name = "Employee")
@NamedEntityGraph(name = "employee.projects",
    attributeNodes = @NamedAttributeNode("projects")
)
Employee employee = entityManager.find(
    Employee.class,
    userId,
    Collections.singletonMap(
        "javax.persistence.fetchgraph",
        entityManager.getEntityGraph( "employee.projects" )
    )
);

Entity graphs are the way to override the EAGER fetching associations at runtime. With JPQL, if an EAGER association is omitted, Hibernate will issue a secondary select for every association needed to be fetched eagerly.

Dynamic fetching via Hibernate profiles

Suppose we wanted to leverage loading by natural-id to obtain the Employee information in the "projects for and employee" use-case. Loading by natural-id uses the statically defined fetching strategies, but does not expose a means to define load-specific fetching. So we would leverage a fetch profile.

Example 7. Fetch profile example
@Entity(name = "Employee")
@FetchProfile(
    name = "employee.projects",
    fetchOverrides = {
        @FetchProfile.FetchOverride(
            entity = Employee.class,
            association = "projects",
            mode = FetchMode.JOIN
        )
    }
)
session.enableFetchProfile( "employee.projects" );
Employee employee = session.bySimpleNaturalId( Employee.class ).load( username );

Here the Employee is obtained by natural-id lookup and the Employee’s Project data is fetched eagerly. If the Employee data is resolved from cache, the Project data is resolved on its own. However, if the Employee data is not resolved in cache, the Employee and Project data is resolved in one SQL query via join as we saw above.