JBoss Community Archive (Read Only)


Entity Driven Database Setup

In order to be able to perform reliable DB-driven tests (be it performance related or not) we need to be able to reconstruct a database to the same state each time a test is run.

This is precisely the goal of the dbUnit project. If life were ideal, we could use it to export the data from one database and import it into another in a (more or less) DB-agnostic manner. It can use the foreign key relationships between tables in the database and create referentially complete subsets of data, which is exactly what we'd need for database testing.

But life is not ideal and this approach doesn't yield optimal results. Because of various performance tweaks, or perhaps even oversights in the database design, the foreign key relationships aren't the ideal instrument for tracking the relationships of your application's business entities, if for nothing else than for their complete lack of business logic awareness.

In the case of RHQ, for example, we'd almost always want to include the full configuration objects with the resources, subjects, alert senders, etc. But if we only instructed dbUnit to export the tables necessary for a resource to be successfully created in the database, we wouldn't get the full config objects - just the entries in the RHQ_CONFIG table that is referenced from the RHQ_RESOURCE table. The individual properties are also tied to RHQ_CONFIG by a foreign key, but RHQ_RESOURCE only references RHQ_CONFIG, and thus dbUnit would (correctly) not consider it essential to include the properties with the config. We could instruct dbUnit to follow both directions of the foreign key relationships, but because of the performance tweaks and other reasons mentioned above, such an export would almost always include the whole database, including RHQ_PLUGIN and the binary data within it.

So where else can we go for a data model that we could work with, that could be translated to table terms, and that has at least a hint of business logic awareness? Well, there's nothing else than the JPA entity model. Although this model obviously still doesn't contain much of the business logic (and therefore we'd still need some configuration to tell it what we want to include and what we don't), it contains much more information about which relationships are considered "important". In the above example, we can see that the Configuration entity directly references Property entities in a @OneToMany relationship, and we could argue that if such an explicit mapping exists then the properties are probably important for the configuration object.
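To illustrate why the entity model is a better guide than the raw foreign keys, here is a minimal sketch of the Configuration-to-Property relationship and of how an analyzer can discover it by reflection. The @OneToMany below is a stand-in annotation declared locally so the snippet is self-contained; real RHQ entities use javax.persistence.OneToMany, and the field names here are illustrative.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.List;

public class Main {

    // Stand-in for javax.persistence.OneToMany, declared locally so the
    // sketch is self-contained; real entities use the JPA annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface OneToMany {}

    static class Property {
        String name;
        String value;
    }

    static class Configuration {
        // The explicit mapping is the hint the analyzer relies on: the entity
        // that declares the relationship considers its target "important".
        @OneToMany
        List<Property> properties;
    }

    public static void main(String[] args) throws Exception {
        // An analyzer can discover the relationship by reflection:
        boolean mapped = Configuration.class.getDeclaredField("properties")
                .isAnnotationPresent(OneToMany.class);
        System.out.println(mapped); // prints "true"
    }
}
```

The same reflective walk over the entity classes is what lets a tool decide that Property rows belong with their Configuration even though no foreign key points from RHQ_CONFIG to the property tables.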


In the perftest-support helper module (for now in the perftest branch) there is code that analyzes the JPA data model and then translates the information from that analysis, together with some additional configuration, into terms understood by dbUnit to produce a meaningful export.

The main class is org.rhq.helpers.perftest.support.Main (I haven't gotten around to wrapping its invocation in a nice script) and it supports the following command-line arguments:

  • url - the JDBC url to the database

  • user - the user name to connect to a database

  • password - the password

  • driver-class - the java class of the JDBC driver to use (must be on the classpath)

  • config-file - the configuration file to drive the export (optional)

  • export - when specified, the tool will export the data

  • import - when specified, the tool will import the data

  • replicate - the tool will try to replicate the data in the db using the provided config-file.

  • help - guess what...

  • file - the output file if exporting or input file when importing (optional, if not specified, standard output/input is used)

  • format - the format of the "file". "xml" uses the xml format and stores all the data in a single file. "csv" considers the "file" a directory and outputs/reads a series of csv files (one per table) from that directory.
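An invocation might look like the following. This is a sketch only: the exact option syntax (assumed here to be --name value pairs), the jar names on the classpath, and the connection details are all assumptions - run the tool with its help argument for the authoritative usage.

```shell
# Export resources from a PostgreSQL database into a single XML file,
# driven by an export configuration (all names and paths are examples).
java -cp perftest-support.jar:postgresql.jar \
    org.rhq.helpers.perftest.support.Main \
    --url jdbc:postgresql://localhost:5432/rhq \
    --user rhqadmin \
    --password rhqadmin \
    --driver-class org.postgresql.Driver \
    --config-file export-config.xml \
    --export \
    --file export.xml \
    --format xml
```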

The format of the configuration file is as follows:

<graph packagePrefix="org.rhq.core.domain" includeExplicitDependentsImplicitly="true|false">
    <entity name="resource.Resource" includeAllFields="false|true" root="false|true">
        <rel field="resourceConfiguration" exclude="false|true"/>
    </entity>
</graph>
  • graph

    • packagePrefix is an optional attribute for shortening the names of the entities (which, when prepended with packagePrefix, form the full class names of the entities)

    • includeExplicitDependentsImplicitly - defaults to true and specifies whether to implicitly follow the relationships defined from the "source" to the "target" entity, i.e. where the source entity declares a field with a relationship to the target entity and is considered the "source" of that relationship (the "one" side in @OneToMany, for example).

  • entity

    • name - the class name of the entity to include (either the full class name or one that forms the full class name when prepended with packagePrefix)

    • includeAllFields - whether to include all fields of the entity (defaults to false)

    • root - the entity is the "root" of the export - i.e. only root entities are passed down to the exporter as the ones to be exported. All other entities mentioned in the configuration are there to further restrict the export for entities that are determined as either dependent on one of the root entities or that the root entities depend on (recursively).

    • filter - optional SQL statement that will return a set of primary keys to export

    • rel - the specification of the relationship to include

      • field - the field specifying the relationship on the entity

      • exclude - this enables a "negative" inclusion configuration. If you specify includeAllFields as true, you can exclude individual relationships using this attribute.
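Putting the elements together, a configuration might look like the following. The field and entity names below are illustrative guesses at the RHQ domain model, not verified against it:

```xml
<graph packagePrefix="org.rhq.core.domain" includeExplicitDependentsImplicitly="true">
    <!-- Resources are the roots: only they seed the export. -->
    <entity name="resource.Resource" includeAllFields="true" root="true">
        <!-- "Negative" configuration: take all fields except this relationship. -->
        <rel field="parentResource" exclude="true"/>
    </entity>
    <!-- A non-root entity only restricts what gets exported for the
         configurations that the roots drag in. -->
    <entity name="configuration.Configuration" includeAllFields="true"/>
</graph>
```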

The tool analyzes the dependency graph of the entities and, based on that graph and the configuration, outputs all the data from the database that is a) configured to be included by the configuration file and b) otherwise needed for referential integrity.


In the code of the above tool, there are a couple of classes that integrate the functionality with TestNG.

Here's what you need to do in your tests to add support for database setup:

  1. add @Listeners({org.rhq.helpers.perftest.support.testng.DatabaseSetupInterceptor.class }) to your test class

  2. the DatabaseSetupInterceptor is currently hardcoded to look up the database connection from the java:/RHQDS datasource.

  3. annotate your test method/class with @DatabaseState annotation with these attributes

    1. url - the url specifying the location of the export file from the tool above (assumes the xml format)

    2. storage - CLASSLOADER or FILESYSTEM - where the url is locatable from
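Putting the steps together, a test class would look roughly like this. The annotations here are simplified stand-ins declared inline so the sketch compiles on its own (the real @DatabaseState takes a FileStorage enum for storage, and the export file path is a made-up example):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class Main {

    // Simplified stand-ins for the perftest-support types, declared locally
    // so the sketch is self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Listeners { Class<?>[] value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface DatabaseState { String url(); String storage(); }

    static class DatabaseSetupInterceptor {}

    // The shape of a test class wired up for database setup.
    @Listeners({ DatabaseSetupInterceptor.class })
    @DatabaseState(url = "data/resource-export.xml", storage = "CLASSLOADER")
    static class ResourcePerfTest {
        void testSomethingAgainstTheImportedState() {
            // runs against the database state imported from the export file
        }
    }

    public static void main(String[] args) {
        DatabaseState state = ResourcePerfTest.class.getAnnotation(DatabaseState.class);
        System.out.println(state.url()); // prints "data/resource-export.xml"
    }
}
```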



This is a work in progress. Parts of the replication workflow are not implemented yet and the whole system as described is still subject to change.

To be able to test multi-agent setups, we need to have data ready for each of the agents in the database. That is of course possible without any kind of replication of the data by simply having the exported data set contain all the data for all the agents. This approach, though sufficient, has two main drawbacks:

  • difficulties in preparing such a dataset - the dataset preparation is a manual process and the developer is going to face a very boring and error-prone job when preparing the data for 50 agents.

  • the dataset remains "static" - once the dataset is created, there is no way of changing the number of "replicas". What if the developer under- or overestimates the performance of the subsystem s/he's testing and would like a different number of agents to test with? What if the performance improves and the tests no longer stress the system enough? The developer would have to go through the boring exercise of creating the dataset manually again with each such change.

The chosen approach is therefore to define the dataset in a single-agent environment and then have a tool that is able to replicate parts of the dataset as needed.
To understand the requirements on the replication capabilities, let's start describing them from the top, i.e. from the TestNG annotations.

/**
 * Specifies how to replicate the test data.
 * This annotation is only processed as a part of the {@link DatabaseState}.
 *
 * @author Lukas Krejci
 */
public @interface DataReplication {

    /** The path to the replication configuration. */
    String url();

    /** Where does the {@link #url()} point to. */
    FileStorage storage() default FileStorage.CLASSLOADER;

    /**
     * How many replicas should be prepared and how they should be distributed
     * among the test invocations. The default is a replica per invocation.
     */
    ReplicaCreationStrategy replicaCreationStrategy() default ReplicaCreationStrategy.PER_INVOCATION;

    /**
     * This callback can be used to modify the replica before it is persisted to the database.
     * This can be used to modify the "names", "descriptions" and other data that is not significant
     * to the referential integrity but that can help identifying the entities.
     * <p>
     * The method must have the following signature:<br/>
     * <code>
     * void &lt;method-name&gt;(int replicaNumber, Object original, Object replica, Class&lt;?&gt; entityType)
     * </code>
     */
    String replicaModifier() default "";
}

The @DataReplication annotation can be specified as part of the @DatabaseState annotation on the test.
The annotation has the following properties:

  • url - this is where to find the replication configuration. The replication configuration is in fact identical in format to the export configuration file described above and basically even means the same thing. The only difference between export and replication (from the point of view of the configuration file) is where the data to be exported/replicated ends up. In the case of export, the data is exported into a file; in the case of replication, a copy of the data is stored to the same database the data was read from.

  • storage - where to find the configuration file. This is either in the classloader or on filesystem.

  • replicaModifier - this is an optional callback method that is called before a replica is stored into the database so that it can be modified by the caller in some fashion (think updating the name so that it's easier for the developer to identify the replicas...)

  • replicaCreationStrategy - TestNG can run a test several times, either concurrently in a number of threads or sequentially. This can be specified in TestNG's @Test annotation. The replica creation strategy then defines how many replicas are going to be created and how the replicas are going to be distributed to the individual test invocations. There are two possible strategies:

    • PER_INVOCATION - each invocation of the test gets its own replica. This is the default strategy.

    • PER_THREAD - all the tests that happen to be executed in a given thread obtain the same replica.
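As a sketch of what a replicaModifier callback can do - the Resource stand-in and its name field below are hypothetical, but the method signature is the one mandated above:

```java
public class Main {

    // Hypothetical entity stand-in; the real callback receives actual
    // JPA entities from the replicated data set.
    static class Resource {
        String name;
        Resource(String name) { this.name = name; }
    }

    // A callback matching the signature required by replicaModifier():
    // void <method-name>(int replicaNumber, Object original, Object replica, Class<?> entityType)
    static void renameReplica(int replicaNumber, Object original, Object replica, Class<?> entityType) {
        if (entityType == Resource.class) {
            // Tweak insignificant data (the name) so the replicas are easy to
            // tell apart; referential integrity is untouched.
            ((Resource) replica).name = ((Resource) original).name + " (replica " + replicaNumber + ")";
        }
    }

    public static void main(String[] args) {
        Resource original = new Resource("agent");
        Resource replica = new Resource(original.name);
        renameReplica(2, original, replica, Resource.class);
        System.out.println(replica.name); // prints "agent (replica 2)"
    }
}
```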

But how do the tests obtain the replicas, you ask? It would be ideal if the tests could have a method parameter that would contain the replica data. Unfortunately, I found no way of achieving that in TestNG (I found no way of persuading TestNG that such a method is a test method if it has a parameter but doesn't have a data provider). The alternative approach is therefore to use a kind of singleton that is able to identify the test being executed and supply it with the correct replica.
Such a singleton is (unsurprisingly) called ReplicaProvider and has a static get() method that, when called from within the test, will automagically return an object describing the replicas intended for that test.
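This is not the actual ReplicaProvider implementation, but the bookkeeping such a provider needs for the two strategies can be sketched like this (all names here are made up for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {

    private static final AtomicInteger nextReplica = new AtomicInteger();
    private static final Map<Long, Integer> threadReplicas = new ConcurrentHashMap<>();

    // PER_INVOCATION: every call hands out a fresh replica number.
    static int perInvocation() {
        return nextReplica.getAndIncrement();
    }

    // PER_THREAD: all calls made from the same thread share one replica number.
    static int perThread() {
        return threadReplicas.computeIfAbsent(Thread.currentThread().getId(),
                id -> nextReplica.getAndIncrement());
    }

    public static void main(String[] args) {
        System.out.println(perInvocation()); // 0
        System.out.println(perInvocation()); // 1
        System.out.println(perThread());     // 2 (first call on this thread)
        System.out.println(perThread());     // 2 (same thread, same replica)
    }
}
```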

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-11 13:02:09 UTC, last content change 2011-01-06 11:03:07 UTC.