Chapter 7. Core Engine: Persistence and transactions

JBoss.org Community Documentation


7.1. Runtime State

7.1.1. Binary Persistence
7.1.2. Safe Points
7.1.3. Configuring Persistence
7.1.4. Transactions

7.2. Process Definitions

7.3. History Log

7.3.1. Storing Process Events in a Database

jBPM can persistently store several kinds of information, such as the process runtime state (so that execution of a process instance can continue at any point, for example if something goes wrong), the process definitions themselves, and history information (a log of the current state and of what has already happened). This chapter describes these different types of persistence and how to configure them.

7.1. Runtime State

Whenever a process is started, a process instance is created, which represents the execution of the process in that specific context. For example, when executing a process that specifies how to process a sales order, one process instance is created for each sales request. The process instance represents the current execution state in that specific context and contains all the information related to that process instance. Note that it only contains the (minimal) runtime state needed to continue the execution of that process instance at some later time; it does not include history information about that process instance once that information is no longer needed to execute it.

The runtime state of an executing process can be made persistent, for example, in a database. This makes it possible to restore the execution state of all running processes after an unexpected failure, or to temporarily remove running instances from memory and restore them at some later time. jBPM allows you to plug in different persistence strategies. By default, if you do not configure the process engine otherwise, process instances are not made persistent.
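The idea of a pluggable persistence strategy can be illustrated with a small, self-contained sketch. The ProcessInstanceStore interface and both implementations below are hypothetical and are not part of the jBPM API: the no-op store mirrors the default behaviour in which nothing is persisted, and the map-backed store merely stands in for a real database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical strategy interface (not the jBPM API): decides where process instance state goes
interface ProcessInstanceStore {
    void save(long processInstanceId, byte[] state);
    Optional<byte[]> load(long processInstanceId);
}

// Mirrors the default: state only lives in memory, so nothing can be restored later
class NoPersistenceStore implements ProcessInstanceStore {
    @Override
    public void save(long processInstanceId, byte[] state) {
        // intentionally does nothing
    }

    @Override
    public Optional<byte[]> load(long processInstanceId) {
        return Optional.empty();
    }
}

// Stand-in for a database-backed strategy; a real one would write rows to tables
class MapBackedStore implements ProcessInstanceStore {
    private final Map<Long, byte[]> rows = new HashMap<>();

    @Override
    public void save(long processInstanceId, byte[] state) {
        rows.put(processInstanceId, state.clone()); // defensive copy, like writing to a table
    }

    @Override
    public Optional<byte[]> load(long processInstanceId) {
        return Optional.ofNullable(rows.get(processInstanceId)).map(byte[]::clone);
    }
}
```

Swapping the strategy only changes which store object the engine is given; the code that runs processes stays the same.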

If you configure the engine to use persistence, it will automatically store the runtime state in the database. You do not have to trigger persistence yourself; the engine takes care of this when persistence is enabled. Whenever you invoke the engine, it makes sure that any changes are stored at the end of that invocation, at so-called safe points. If something goes wrong and you restore the engine from the database, you should not reload the process instances and trigger them manually to resume execution. Process instances automatically resume execution when they are triggered, for example by a timer expiring, by the completion of a task that was requested by that process instance, or by a signal being sent to the process instance. The engine automatically reloads process instances on demand.

The runtime persistence data should in general be considered internal: you probably should not access these database tables directly, and especially should not modify them directly (changing the runtime state of process instances without the engine knowing might have unexpected side effects). In most cases where information about the current execution state of process instances is required, using a history log is recommended instead (see below). In some cases it might still be useful to query the internal database tables directly, for example, but you should only do this if you know what you are doing.

7.1.1. Binary Persistence

jBPM provides a binary persistence mechanism that allows you to save the state of a process instance as a binary dataset. This way, the state of all running process instances can always be stored in a persistent location. Note that these binary datasets are usually relatively small, as they only contain the minimal execution state of the process instance. For a simple process instance, this usually consists of one or a few node instances (i.e., any node that is currently executing) and, possibly, some variable values. The process instance is transformed into a binary blob (for performance reasons, using a custom serialization mechanism rather than normal Java serialization). This blob is then stored alongside other metadata about the process instance, such as the process instance id, the process id and the start date.
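To make the idea of a compact, custom binary format concrete, the following sketch encodes a hypothetical minimal state (the ids of the active node instances plus some string variables) with DataOutputStream instead of Java serialization, and keeps the resulting blob next to a few metadata fields. MinimalState, ProcessInstanceRow and their fields are illustrative assumptions; jBPM's actual marshalling format is different and more complete.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal runtime state: active node instance ids plus string variables
final class MinimalState {
    final List<Long> activeNodeIds;
    final Map<String, String> variables;

    MinimalState(List<Long> activeNodeIds, Map<String, String> variables) {
        this.activeNodeIds = activeNodeIds;
        this.variables = variables;
    }

    // Hand-written format: a count followed by the values, with no class descriptors
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(activeNodeIds.size());
            for (long nodeId : activeNodeIds) {
                out.writeLong(nodeId);
            }
            out.writeInt(variables.size());
            for (Map.Entry<String, String> e : variables.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static MinimalState fromBytes(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int nodeCount = in.readInt();
            List<Long> nodes = new ArrayList<>();
            for (int i = 0; i < nodeCount; i++) {
                nodes.add(in.readLong());
            }
            int variableCount = in.readInt();
            Map<String, String> vars = new LinkedHashMap<>();
            for (int i = 0; i < variableCount; i++) {
                String key = in.readUTF();
                vars.put(key, in.readUTF());
            }
            return new MinimalState(nodes, vars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// The blob is stored alongside metadata that can be queried without decoding it
final class ProcessInstanceRow {
    final long processInstanceId;
    final String processId;
    final long startDateMillis;
    final byte[] stateBlob;

    ProcessInstanceRow(long processInstanceId, String processId, long startDateMillis, byte[] stateBlob) {
        this.processInstanceId = processInstanceId;
        this.processId = processId;
        this.startDateMillis = startDateMillis;
        this.stateBlob = stateBlob;
    }
}
```

For one active node and one short variable, this encoding takes 29 bytes (4 + 8 + 4 for the counts and node id, 9 + 4 for the two strings), which illustrates why such datasets stay small.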

If you store this information in a database, the database schema looks like this:

Apart from the process instance state, the session itself can also store some state, such as timers or, if you are using business rules, the data that the rules are evaluated over. This session state is stored separately as a binary blob, along with the id of the session and some metadata. You always restore session state by reloading the session with the given id. The session id can be retrieved using ksession.getId().

7.1.2. Safe Points

The state of a process instance is stored at so-called "safe points" during the execution of the process engine. Whenever a process instance is executing (for example, when it is started or when it continues from a previous wait state), the engine executes the process instance until no more actions can be performed, meaning that the process instance has either completed (or was aborted) or reached a wait state in all of its parallel paths. At that point, the engine has reached the next safe point, and the state of the process instance (and of all other process instances that might have been affected) is stored persistently.
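The following toy model (not jBPM code) illustrates safe points for a single execution path: each invocation runs the instance forward until it completes or reaches a wait node, and only then writes its state, here just the current node position, to a map that stands in for the database. The signal method also shows the on-demand reloading described earlier: the instance is read back from the store when it is triggered. ToyEngine, the node names and the string-based process model are assumptions made for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy engine: a process is a list of node types; "wait" nodes pause execution
final class ToyEngine {
    private final List<String> nodes;
    private final Map<Long, Integer> store = new HashMap<>(); // stand-in for the database
    private int saveCount = 0;

    ToyEngine(List<String> nodes) {
        this.nodes = nodes;
    }

    void start(long instanceId) {
        runUntilSafePoint(instanceId, 0);
    }

    // A trigger (timer, completed task, signal) reloads the instance and continues after its wait node
    void signal(long instanceId) {
        Integer position = store.get(instanceId);
        if (position == null || position >= nodes.size()) {
            throw new IllegalStateException("instance " + instanceId + " is not waiting");
        }
        runUntilSafePoint(instanceId, position + 1);
    }

    private void runUntilSafePoint(long instanceId, int position) {
        // execute nodes until the process completes or reaches a wait node
        while (position < nodes.size() && !nodes.get(position).equals("wait")) {
            position++;
        }
        store.put(instanceId, position); // safe point: persist once per invocation
        saveCount++;
    }

    boolean isCompleted(long instanceId) {
        Integer position = store.get(instanceId);
        return position != null && position >= nodes.size();
    }

    int saveCount() {
        return saveCount;
    }
}
```

With the nodes script, script, wait, script, starting an instance executes two scripts but stores the state only once; signalling it afterwards runs the last script and stores the completed state a second time.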

7.1.3. Configuring Persistence

By default, the engine does not save runtime data persistently. This means you can use the engine completely without persistence (not even requiring an in-memory database) if necessary, for example for performance reasons or when you would like to manage persistence yourself. It is, however, possible to configure the engine to use persistence. This usually requires adding the necessary dependencies, configuring a datasource, and creating the engine with persistence configured.

7.1.3.1. Adding dependencies

You need to make sure the necessary dependencies are available in the classpath of your application if you want to use persistence. By default, persistence is based on the Java Persistence API (JPA) and can thus work with several persistence mechanisms. Hibernate is used by default.

If you're using the Eclipse IDE and the jBPM Eclipse plugin, you should make sure the necessary jars are added to your jBPM runtime directory. You don't really need to do anything (as the necessary dependencies should already be there) if you are using the jBPM runtime that is configured by default when using the jBPM installer, or if you downloaded and unzipped the jBPM runtime artefact (from the downloads) and pointed the jBPM plugin to that directory.

If you would like to manually add the necessary dependencies to your project, first of all, you need the jar file jbpm-persistence-jpa.jar, as that contains code for saving the runtime state whenever necessary. Next, you also need various other dependencies, depending on the persistence solution and database you are using. For the default combination with Hibernate as the JPA persistence provider and using an H2 in-memory database and Bitronix for JTA-based transaction management, the following list of additional dependencies is needed:

jbpm-test (org.jbpm)
jbpm-persistence-jpa (org.jbpm)
drools-persistence-jpa (org.drools)
persistence-api (javax.persistence)
hibernate-entitymanager (org.hibernate)
hibernate-annotations (org.hibernate)
hibernate-commons-annotations (org.hibernate)
hibernate-core (org.hibernate)
commons-collections (commons-collections)
dom4j (dom4j)
jta (javax.transaction)
btm (org.codehaus.btm)
javassist (javassist)
slf4j-api (org.slf4j)
slf4j-jdk14 (org.slf4j)
h2 (com.h2database)

7.1.3.2. Configuring the engine to use persistence using JBPMHelper

You need to configure the jBPM engine to use persistence, usually simply by using the appropriate method when creating your session. There are various ways to create a session: several utility classes are available to make this as easy as possible, and which one fits best depends, for example, on whether you are writing a JUnit test for a process.

The easiest way to do this is to use the jbpm-test module that allows you to easily create and test your processes. The JBPMHelper class has a method to create a session, and uses a configuration file to configure this session, like whether you want to use persistence, the datasource to use, etc. The helper class will then do all the setup and configuration for you.

To configure persistence, create a jBPM.properties file and configure the following properties. The example below shows the default properties, which use an H2 in-memory database with persistence enabled; if you are fine with all of these values, you don't need to add a properties file, as these properties are then used by default:

# for creating a datasource
persistence.datasource.name=jdbc/jbpm-ds
persistence.datasource.user=sa
persistence.datasource.password=
persistence.datasource.url=jdbc:h2:tcp://localhost/~/jbpm-db
persistence.datasource.driverClassName=org.h2.Driver

# for configuring persistence of the session
persistence.enabled=true
persistence.persistenceunit.name=org.jbpm.persistence.jpa
persistence.persistenceunit.dialect=org.hibernate.dialect.H2Dialect

# for configuring the human task service
taskservice.enabled=true
taskservice.datasource.name=org.jbpm.task
taskservice.transport=mina
taskservice.usergroupcallback=org.jbpm.task.service.DefaultUserGroupCallbackImpl
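The fallback behaviour described above (use the file if it is present, otherwise the defaults) can be sketched with java.util.Properties and its built-in defaults chain. This is a hypothetical illustration, not JBPMHelper's actual implementation; JbpmSettings is an invented name, and only a few of the default keys listed above are included.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class JbpmSettings {
    // A subset of the default values shown in the jBPM.properties example above
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("persistence.datasource.name", "jdbc/jbpm-ds");
        d.setProperty("persistence.datasource.url", "jdbc:h2:tcp://localhost/~/jbpm-db");
        d.setProperty("persistence.enabled", "true");
        d.setProperty("persistence.persistenceunit.name", "org.jbpm.persistence.jpa");
        return d;
    }

    // Values read from the file override the defaults; missing keys fall back to them
    static Properties load(Reader overrides) {
        Properties settings = new Properties(defaults());
        try {
            settings.load(overrides);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    // Looks for jBPM.properties at the classpath root; uses only the defaults if it is absent
    static Properties fromClasspath() {
        InputStream in = JbpmSettings.class.getResourceAsStream("/jBPM.properties");
        if (in == null) {
            return new Properties(defaults());
        }
        try (Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file containing only persistence.enabled=false would therefore disable persistence while keeping every other default value.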

If you want to use persistence, you must make sure that the datasource specified in the jBPM.properties file is initialized correctly. This means that the database itself must be up and running, and the datasource must be registered under the correct name. If you would like to use an H2 in-memory database (usually a very easy way to do some testing), you can use the JBPMHelper class to start up this database, using:

JBPMHelper.startH2Server();

To register the datasource (you always need to do this, even if you're not using H2 as your database; see below for more options on how to configure your datasource), use:

JBPMHelper.setupDataSource();

Next, you can use the JBPMHelper class to create your session (after creating your knowledge base, which is identical to the case when you are not using persistence):

StatefulKnowledgeSession ksession = JBPMHelper.newStatefulKnowledgeSession(kbase);

Once you have done that, you can just call methods on this ksession (like startProcess) and the engine will persist all runtime state in the created datasource.

You can also use the JBPMHelper class to recreate your session by restoring its state from the database, passing in the session id (which you can retrieve using ksession.getId()):

StatefulKnowledgeSession ksession =
    JBPMHelper.loadStatefulKnowledgeSession(kbase, sessionId);

7.1.3.3. Manually configuring the engine to use persistence

You can also use the JPAKnowledgeService to create your knowledge session. This is slightly more complex, but gives you full access to the underlying configurations. You can create a new knowledge session using JPAKnowledgeService based on a knowledge base, a knowledge session configuration (if necessary) and an environment. The environment needs to contain a reference to your Entity Manager Factory. For example:

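import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.drools.KnowledgeBaseFactory;
import org.drools.persistence.jpa.JPAKnowledgeService;
import org.drools.runtime.Environment;
import org.drools.runtime.EnvironmentName;
import org.drools.runtime.StatefulKnowledgeSession;

// create the entity manager factory and register it in the environment
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory( "org.jbpm.persistence.jpa" );
Environment env = KnowledgeBaseFactory.newEnvironment();
env.set( EnvironmentName.ENTITY_MANAGER_FACTORY, emf );

// create a new knowledge session that uses JPA to store the runtime state
// (kbase is your existing knowledge base; null means no custom session configuration)
StatefulKnowledgeSession ksession =
    JPAKnowledgeService.newStatefulKnowledgeSession( kbase, null, env );
int sessionId = ksession.getId();

// invoke methods on your session here
ksession.startProcess( "MyProcess" );
ksession.dispose();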

You can also use the JPAKnowledgeService to recreate a session based on a specific session id:



// recreate the session from the database using the sessionId
ksession = JPAKnowledgeService.loadStatefulKnowledgeSession(
    sessionId, kbase, null, env );

You need to add a persistence configuration called persistence.xml to the META-INF directory on your classpath to configure JPA to use Hibernate and the H2 database (or your own preference), as shown below. For details on how to change this for your own configuration, refer to the JPA and Hibernate documentation.


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<persistence
  version="1.0"
  xsi:schemaLocation=
    "http://java.sun.com/xml/ns/persistence
     http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd
     http://java.sun.com/xml/ns/persistence/orm
     http://java.sun.com/xml/ns/persistence/orm_1_0.xsd"
  xmlns:orm="http://java.sun.com/xml/ns/persistence/orm"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://java.sun.com/xml/ns/persistence">

  <persistence-unit name="org.jbpm.persistence.jpa" transaction-type="JTA">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <jta-data-source>jdbc/jbpm-ds</jta-data-source>
    <mapping-file>META-INF/JBPMorm.xml</mapping-file>
    <class>org.drools.persistence.info.SessionInfo</class>
    <class>org.jbpm.persistence.processinstance.ProcessInstanceInfo</class>
    <class>org.drools.persistence.info.WorkItemInfo</class>

    <properties>
      <property name="hibernate.dialect" value="org.hibernate.dialect.H2Dialect"/>
      <property name="hibernate.max_fetch_depth" value="3"/>
      <property name="hibernate.hbm2ddl.auto" value="update"/>
      <property name="hibernate.show_sql" value="true"/>
      <property name="hibernate.transaction.manager_lookup_class"
                value="org.hibernate.transaction.BTMTransactionManagerLookup"/>
    </properties>
  </persistence-unit>
</persistence>

This configuration file refers to a data source called "jdbc/jbpm-ds". If you run your application in an application server (for example JBoss AS), these containers typically allow you to set up data sources through configuration (for example, by dropping a datasource configuration file in the deploy directory). Please refer to your application server documentation for details.

For example, if you're deploying to JBoss Application Server v5.x, you can create a datasource by dropping a configuration file in the deploy directory, for example:


<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <local-tx-datasource>
    <jndi-name>jdbc/jbpm-ds</jndi-name>
    <connection-url>jdbc:h2:tcp://localhost/~/test</connection-url>
    <driver-class>org.h2.Driver</driver-class>
    <user-name>sa</user-name>
    <password></password>
  </local-tx-datasource>
</datasources>

If, however, you are running in a plain Java environment, you can use the JBPMHelper class to do this for you (see above), or use the following code fragment to set up a data source (here using the H2 in-memory database in combination with Bitronix):

import bitronix.tm.resource.jdbc.PoolingDataSource;

// the unique name must match the jta-data-source in persistence.xml
PoolingDataSource ds = new PoolingDataSource();
ds.setUniqueName("jdbc/jbpm-ds");
// LrcXADataSource lets Bitronix manage a plain (non-XA) JDBC driver
ds.setClassName("bitronix.tm.resource.jdbc.lrc.LrcXADataSource");
ds.setMaxPoolSize(3);
ds.setAllowLocalTransactions(true);
ds.getDriverProperties().put("user", "sa");
ds.getDriverProperties().put("password", "sasa");
ds.getDriverProperties().put("URL", "jdbc:h2:tcp://localhost/~/jbpm-db");
ds.getDriverProperties().put("driverClassName", "org.h2.Driver");
ds.init();

7.1.4. Transactions

If you do not provide transaction boundaries inside your application, the engine automatically executes each method invocation on the engine in a separate transaction. If this behavior is acceptable, you don't need to do anything else. You can, however, also specify the transaction boundaries yourself. This allows you, for example, to combine multiple commands into one transaction.
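Conceptually, each engine call joins a transaction that is already active and otherwise wraps itself in a transaction of its own. The toy classes below are neither JTA nor jBPM code (ToyTransactionManager and ToyEngineCalls are hypothetical names); they only record which commands were committed together, so the difference between the default behaviour and user-defined boundaries becomes visible. Rollback handling is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy transaction manager: remembers which commands were committed together
final class ToyTransactionManager {
    private boolean active = false;
    private List<String> pending = new ArrayList<>();
    final List<List<String>> committed = new ArrayList<>();

    boolean isActive() {
        return active;
    }

    void begin() {
        if (active) throw new IllegalStateException("transaction already active");
        active = true;
        pending = new ArrayList<>();
    }

    void record(String command) {
        pending.add(command);
    }

    void commit() {
        if (!active) throw new IllegalStateException("no active transaction");
        committed.add(pending);
        active = false;
    }
}

// Each engine call joins an active transaction or, if there is none, runs in its own
final class ToyEngineCalls {
    private final ToyTransactionManager tm;

    ToyEngineCalls(ToyTransactionManager tm) {
        this.tm = tm;
    }

    void execute(String command) {
        boolean ownsTransaction = !tm.isActive();
        if (ownsTransaction) tm.begin();
        tm.record(command);
        if (ownsTransaction) tm.commit();
    }
}
```

Calling execute twice without a surrounding transaction yields two separate commits; calling it twice between begin and commit yields a single commit containing both commands.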

You need to register a transaction manager in the environment before using user-defined transactions. The following sample code uses the Bitronix transaction manager and then uses the Java Transaction API (JTA) to specify transaction boundaries:

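import bitronix.tm.TransactionManagerServices;
import javax.naming.InitialContext;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.transaction.UserTransaction;
import org.drools.KnowledgeBaseFactory;
import org.drools.persistence.jpa.JPAKnowledgeService;
import org.drools.runtime.Environment;
import org.drools.runtime.EnvironmentName;
import org.drools.runtime.StatefulKnowledgeSession;

// create the entity manager factory and register it in the environment
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory( "org.jbpm.persistence.jpa" );
Environment env = KnowledgeBaseFactory.newEnvironment();
env.set( EnvironmentName.ENTITY_MANAGER_FACTORY, emf );
env.set( EnvironmentName.TRANSACTION_MANAGER,
         TransactionManagerServices.getTransactionManager() );

// create a new knowledge session that uses JPA to store the runtime state
StatefulKnowledgeSession ksession =
    JPAKnowledgeService.newStatefulKnowledgeSession( kbase, null, env );

// start the transaction (looked up in JNDI, see the jndi.properties note below)
UserTransaction ut =
  (UserTransaction) new InitialContext().lookup( "java:comp/UserTransaction" );
ut.begin();
try {
    // perform multiple commands inside one transaction
    ksession.insert( new Person( "John Doe" ) );
    ksession.startProcess( "MyProcess" );
    // commit the transaction
    ut.commit();
} catch ( RuntimeException e ) {
    // undo all commands of this transaction if one of them fails
    ut.rollback();
    throw e;
}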

Note that if you use Bitronix as the transaction manager, you should also add a simple jndi.properties file in your root classpath to register the Bitronix transaction manager in JNDI. If you are using the jbpm-test module, this is already included by default. If not, create a file named jndi.properties with the following content:

java.naming.factory.initial=bitronix.tm.jndi.BitronixInitialContextFactory

If you would like to use a different JTA transaction manager, you can change the persistence.xml file to use your own transaction manager. For example, when running inside JBoss Application Server v5.x, you can use the JBoss transaction manager. You need to change the transaction manager property in persistence.xml to:

<property name="hibernate.transaction.manager_lookup_class"
             value="org.hibernate.transaction.JBossTransactionManagerLookup" />

7.2. Process Definitions

Process definition files are usually written in an XML format. These files can easily be stored on a file system during development. However, whenever you want to make your knowledge accessible to one or more engines in production, we recommend using a knowledge repository that (logically) centralizes your knowledge.

Guvnor is a Drools sub-project that provides exactly that. It consists of a repository for storing different kinds of knowledge, not only process definitions but also rules, object models, etc. It allows easy retrieval of this knowledge using WebDAV or by employing a knowledge agent that automatically downloads the information from Guvnor when creating a knowledge base, and provides a web application that allows business users to view and possibly update the information in the knowledge repository. Check out the Drools Guvnor documentation for more information on how to do this.

7.3. History Log

In many cases it is useful (if not necessary) to store information about the execution of process instances, so that this information can be used afterwards, for example, to verify what actions have been executed for a particular process instance, or to monitor and analyze the efficiency of a particular process. Storing history information in the runtime database is usually not a good idea, as this would result in ever-growing runtime data, and monitoring and analysis queries might influence the performance of your runtime engine. That is why history information about the execution of process instances is stored separately.

This history log of execution information is created based on the events generated by the process engine during execution. The jBPM runtime engine provides a generic mechanism to listen to different kinds of events. The necessary information can easily be extracted from these events and made persistent, for example in a database. Filters can be used to only store the information you find relevant.

7.3.1. Storing Process Events in a Database

The jbpm-bam module contains an event listener that stores process-related information in a database using JPA or Hibernate directly. The database contains two tables, one for process instance information and one for node instance information (see the figure below):

ProcessInstanceLog: This lists the process instance id, the process (definition) id, the start date and (if applicable) the end date of all process instances.
NodeInstanceLog: This table contains more detailed information about which nodes were actually executed inside each process instance. Whenever a node instance is entered from one of its incoming connections or is exited through one of its outgoing connections, that information is stored in this table. For this, it stores the process instance id and the process id of the process instance it is executed in, and the node instance id and the corresponding node id (in the process definition) of the node instance in question. Finally, the type of event (0 = enter, 1 = exit) and the date of the event are stored as well.
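Because every node instance produces an enter row (type 0) and an exit row (type 1), the time spent in each node instance can be derived by pairing them. The sketch below works on a hypothetical in-memory copy of NodeInstanceLog rows (NodeLogRow and its field names are assumptions, and dates are simplified to epoch milliseconds) and assumes the rows are ordered by date; in practice you would load them with a JPA or SQL query.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory copy of a NodeInstanceLog row
final class NodeLogRow {
    static final int ENTER = 0;
    static final int EXIT = 1;

    final long processInstanceId;
    final String nodeInstanceId;
    final int type;
    final long dateMillis;

    NodeLogRow(long processInstanceId, String nodeInstanceId, int type, long dateMillis) {
        this.processInstanceId = processInstanceId;
        this.nodeInstanceId = nodeInstanceId;
        this.type = type;
        this.dateMillis = dateMillis;
    }
}

final class NodeTimings {
    // Pairs each enter row with the exit row of the same node instance, for one process instance
    static Map<String, Long> durationsMillis(List<NodeLogRow> rows, long processInstanceId) {
        Map<String, Long> enteredAt = new HashMap<>();
        Map<String, Long> durations = new LinkedHashMap<>();
        for (NodeLogRow row : rows) {
            if (row.processInstanceId != processInstanceId) {
                continue;
            }
            if (row.type == NodeLogRow.ENTER) {
                enteredAt.put(row.nodeInstanceId, row.dateMillis);
            } else {
                Long enter = enteredAt.remove(row.nodeInstanceId);
                if (enter != null) {
                    durations.put(row.nodeInstanceId, row.dateMillis - enter);
                }
            }
        }
        return durations; // node instances without an exit row yet are not included
    }
}
```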

To log process history information in a database like this, you need to register the logger on your session (or working memory) like this:



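import org.drools.runtime.StatefulKnowledgeSession;
import org.jbpm.process.audit.JPAWorkingMemoryDbLogger;

StatefulKnowledgeSession ksession = ...;
// register the logger: process events of this session are now stored in the database
JPAWorkingMemoryDbLogger logger = new JPAWorkingMemoryDbLogger(ksession);

// invoke methods on your session here

// stop logging once the logger is no longer needed
logger.dispose();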

Note that this logger is like any other audit logger, which means that you can add one or more filters by calling the method addFilter to ensure that only relevant information is stored in the database. Only information accepted by all your filters will appear in the database. You should dispose of the logger when it is no longer needed.
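The "accepted by all filters" rule can be pictured with the following sketch. AuditEvent and FilteringAuditLogger are hypothetical names, and java.util.function.Predicate stands in for the filter type; the real logger receives engine events and its addFilter method takes the engine's own filter type, so treat this only as an illustration of the semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified process event
final class AuditEvent {
    final String processId;
    final String kind;

    AuditEvent(String processId, String kind) {
        this.processId = processId;
        this.kind = kind;
    }
}

// Stores an event only if every registered filter accepts it
final class FilteringAuditLogger {
    private final List<Predicate<AuditEvent>> filters = new ArrayList<>();
    private final List<AuditEvent> stored = new ArrayList<>(); // stand-in for database rows

    void addFilter(Predicate<AuditEvent> filter) {
        filters.add(filter);
    }

    void onEvent(AuditEvent event) {
        // allMatch is true for an empty list, so without filters everything is stored
        if (filters.stream().allMatch(f -> f.test(event))) {
            stored.add(event);
        }
    }

    List<AuditEvent> stored() {
        return stored;
    }
}
```

Registering one filter that keeps only the "order" process and another that drops node-entered events means that only events passing both checks are stored.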

To specify the database where the information should be stored, modify the persistence.xml file to include the audit log classes as well (ProcessInstanceLog, NodeInstanceLog and VariableInstanceLog), as shown below.


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<persistence
  version="1.0"
  xsi:schemaLocation=
    "http://java.sun.com/xml/ns/persistence
     http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd
     http://java.sun.com/xml/ns/persistence/orm
     http://java.sun.com/xml/ns/persistence/orm_1_0.xsd"
  xmlns:orm="http://java.sun.com/xml/ns/persistence/orm"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://java.sun.com/xml/ns/persistence">

  <persistence-unit name="org.jbpm.persistence.jpa">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <jta-data-source>jdbc/processInstanceDS</jta-data-source>
    <class>org.drools.persistence.info.SessionInfo</class>
    <class>org.jbpm.persistence.processinstance.ProcessInstanceInfo</class>
    <class>org.drools.persistence.info.WorkItemInfo</class>
    <class>org.jbpm.process.audit.ProcessInstanceLog</class>
    <class>org.jbpm.process.audit.NodeInstanceLog</class>
    <class>org.jbpm.process.audit.VariableInstanceLog</class>

    <properties>
      <property name="hibernate.dialect" value="org.hibernate.dialect.H2Dialect"/>
      <property name="hibernate.max_fetch_depth" value="3"/>
      <property name="hibernate.hbm2ddl.auto" value="update"/>
      <property name="hibernate.show_sql" value="true"/>
      <property name="hibernate.transaction.manager_lookup_class"
                value="org.hibernate.transaction.BTMTransactionManagerLookup"/>
    </properties>
  </persistence-unit>
</persistence>

All this information can easily be queried and used in a lot of different use cases, ranging from creating a history log for one specific process instance to analyzing the performance of all instances of a specific process.
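As an example of the analysis side, the sketch below computes the average duration of completed instances per process from ProcessInstanceLog-style data. ProcessLogRow is a hypothetical in-memory copy of that table (dates simplified to epoch milliseconds, end date null while an instance is still running); against the real database you would typically express the same aggregation as a query.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory copy of a ProcessInstanceLog row
final class ProcessLogRow {
    final long processInstanceId;
    final String processId;
    final long startMillis;
    final Long endMillis; // null while the instance is still active

    ProcessLogRow(long processInstanceId, String processId, long startMillis, Long endMillis) {
        this.processInstanceId = processInstanceId;
        this.processId = processId;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

final class ProcessStats {
    // Average duration of completed instances, grouped by process (definition) id
    static Map<String, Double> averageDurationMillis(List<ProcessLogRow> rows) {
        Map<String, long[]> totals = new TreeMap<>(); // processId -> {sum of durations, count}
        for (ProcessLogRow row : rows) {
            if (row.endMillis == null) {
                continue; // skip instances that are still running
            }
            long[] t = totals.computeIfAbsent(row.processId, k -> new long[2]);
            t[0] += row.endMillis - row.startMillis;
            t[1]++;
        }
        Map<String, Double> averages = new TreeMap<>();
        totals.forEach((processId, t) -> averages.put(processId, (double) t[0] / t[1]));
        return averages;
    }
}
```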

This audit log should only be considered a default implementation. We don't know what information you need to store for analysis afterwards, and for performance reasons it is recommended to only store the relevant data. Depending on your use cases, you might define your own data model for storing the information you need, and use the process event listeners to extract that information.