Chapter 8. Audit table partitioning

8.1. Benefits of audit table partitioning

Because audit tables tend to grow indefinitely they can quickly become really large. When the audit tables have grown to a certain limit (varying per RDBMS and/or operating system) it makes sense to start using table partitioning. SQL table partitioning offers a lot of advantages including, but certainly not limited to:

Improved query performance by selectively moving rows to various partitions (or even purging old rows)
Faster data loads, index creation, etc.

8.2. Suitable columns for audit table partitioning

Generally SQL tables must be partitioned on a column that exists within the table. As a rule it makes sense to use either the end revision or the end revision timestamp column for partioning of audit tables.

Note

End revision information is not available for the default AuditStrategy.

Therefore the following Envers configuration options are required:

org.hibernate.envers.audit_strategy = org.hibernate.envers.strategy.ValidityAuditStrategy

org.hibernate.envers.audit_strategy_validity_store_revend_timestamp = true

Optionally, you can also override the default values following properties:

org.hibernate.envers.audit_strategy_validity_end_rev_field_name

org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name

For more information, see Chapter 3, Configuration.

The reason why the end revision information should be used for audit table partioning is based on the assumption that audit tables should be partionioned on an 'increasing level of interestingness', like so:

A couple of partitions with audit data that is not very (or no longer) interesting. This can be stored on slow media, and perhaps even be purged eventually.
Some partitions for audit data that is potentially interesting.
One partition for audit data that is most likely to be interesting. This should be stored on the fastest media, both for reading and writing.

8.3. Audit table partitioning example

In order to determine a suitable column for the 'increasing level of interestingness', consider a simplified example of a salary registration for an unnamed agency.

Currently, the salary table contains the following rows for a certain person X:

Table 8.1. Salaries table

Year	Salary (USD)
2006	3300
2007	3500
2008	4000
2009	4500

The salary for the current fiscal year (2010) is unknown. The agency requires that all changes in registered salaries for a fiscal year are recorded (i.e. an audit trail). The rationale behind this is that decisions made at a certain date are based on the registered salary at that time. And at any time it must be possible reproduce the reason why a certain decision was made at a certain date.

The following audit information is available, sorted on in order of occurrence:

Table 8.2. Salaries - audit table

Year	Revision type	Revision timestamp	Salary (USD)	End revision timestamp
2006	ADD	2007-04-01	3300	null
2007	ADD	2008-04-01	35	2008-04-02
2007	MOD	2008-04-02	3500	null
2008	ADD	2009-04-01	3700	2009-07-01
2008	MOD	2009-07-01	4100	2010-02-01
2008	MOD	2010-02-01	4000	null
2009	ADD	2010-04-01	4500	null

8.3.1. Determining a suitable partitioning column

To partition this data, the 'level of interestingness' must be defined. Consider the following:

For fiscal year 2006 there is only one revision. It has the oldest revision timestamp of all audit rows, but should still be regarded as interesting because it is the latest modification for this fiscal year in the salary table; its end revision timestamp is null.
Also note that it would be very unfortunate if in 2011 there would be an update of the salary for fiscal year 2006 (which is possible in until at least 10 years after the fiscal year) and the audit information would have been moved to a slow disk (based on the age of the revision timestamp). Remember that in this case Envers will have to update the end revision timestamp of the most recent audit row.
There are two revisions in the salary of fiscal year 2007 which both have nearly the same revision timestamp and a different end revision timestamp. On first sight it is evident that the first revision was a mistake and probably uninteresting. The only interesting revision for 2007 is the one with end revision timestamp null.

Based on the above, it is evident that only the end revision timestamp is suitable for audit table partitioning. The revision timestamp is not suitable.

8.3.2. Determining a suitable partitioning scheme

A possible partitioning scheme for the salary table would be as follows:

end revision timestamp year = 2008
This partition contains audit data that is not very (or no longer) interesting.
end revision timestamp year = 2009
This partition contains audit data that is potentially interesting.
end revision timestamp year >= 2010 or null
This partition contains the most interesting audit data.

This partitioning scheme also covers the potential problem of the update of the end revision timestamp, which occurs if a row in the audited table is modified. Even though Envers will update the end revision timestamp of the audit row to the system date at the instant of modification, the audit row will remain in the same partition (the 'extension bucket').

And sometime in 2011, the last partition (or 'extension bucket') is split into two new partitions:

end revision timestamp year = 2010
This partition contains audit data that is potentially interesting (in 2011).
end revision timestamp year >= 2011 or null
This partition contains the most interesting audit data and is the new 'extension bucket'.