The failure recovery subsystem of JBossTS ensures that the results of a transaction are applied
consistently to all resources affected by the transaction, even if any of the application
processes or the hardware hosting them crash or lose network connectivity. In the case of
hardware crashes or network failures, recovery does not take place until the system or
network is restored, but the original application does not need to be restarted. Recovery is
handled by the Recovery Manager process. For recovery to take place, information about the
transaction and the resources involved needs to survive the failure and be accessible afterward.
This information is held in the
ActionStore
, which is part of the
ObjectStore
. If the
ObjectStore
is destroyed or modified, recovery may not be possible.
Until the recovery procedures are complete, resources affected by a transaction which was in progress at the time of the failure may be inaccessible. Database resources may report this as tables or rows held by in-doubt transactions. For TXOJ resources, an attempt to activate the Transactional Object, such as when trying to get a lock, fails.
Although some ORB-specific configuration is necessary to configure the ORB sub-system, the
basic settings are ORB-independent.
The configuration which applies to JBossTS is in the
RecoveryManager-properties.xml
file and
the
orportability-properties.xml
file. Contents of each file are below.
Example 7.1. RecoveryManager-properties.xml
<entry key="RecoveryEnvironmentBean.recoveryActivatorClassNames">
com.arjuna.ats.internal.jts.orbspecific.recovery.RecoveryEnablement
</entry>
Example 7.2. orportability-properties.xml
<entry key="com.arjuna.orbportability.orb.PostInit2">com.arjuna.ats.internal.jts.recovery.RecoveryInit</entry>
These entries cause instances of the named classes to be loaded. The named classes then load
the ORB-specific
classes needed and perform other initialization. This enables failure recovery
for transactions initiated by or
involving applications using this property file. The default
RecoveryManager-properties.xml
file and
orportability-properties.xml
with the distribution include these entries.
Failure recovery is NOT supported with the JavaIDL ORB that is part of the JDK. Failure recovery is supported for JacORB only.
To disable recovery, remove or comment out the
RecoveryEnablement
line in the property file.
Recovery of XA resources accessed via JDBC is handled by the
XARecoveryModule
. This
module includes both
transaction-initiated
and
resource-initiated
recovery.
Transaction-initiated recovery is possible where the particular transaction branch
progressed far enough for
a
JTA_ResourceRecord
to be written in the ObjectStore. The record contains the
information needed to link the
transaction to information known by the rest of JBossTS in the database.
Resource-initiated recovery is necessary for branches where a failure occurred after the
database made a
persistent record of the transaction, but before the
JTA_ResourceRecord
was
written. Resource-initiated recovery is also necessary for datasources for which it
is impossible to hold
information in the
JTA_ResourceRecord
that allows the recreation in the
RecoveryManager of the
XAConnection
or
XAResource
used in the
original application.
Transaction-initiated recovery is automatic. The
XARecoveryModule
finds the
JTA_ResourceRecord
which needs recovery, using the two-pass mechanism described
above. It then uses the normal
recovery mechanisms to find the status of the transaction the resource was
involved in, by
running
replay_completion
on the
RecoveryCoordinator
for the transaction branch. Next, it creates or recreates the
appropriate
XAResource
and issues
commit
or
rollback
on it as appropriate. The
XAResource
creation uses the
same database name, username, password, and other information as the
application.
Resource-initiated recovery must be specifically configured, by supplying the
RecoveryManager
with the appropriate information for it to interrogate all the
XADataSources
accessed by any JBossTS application. The access to each
XADataSource
is handled by a class that implements the
com.arjuna.ats.jta.recovery.XAResourceRecovery
interface. Instances of this class
are dynamically loaded, as controlled by property
JTAEnvironmentBean.xaResourceRecoveryInstances
.
The
XARecoveryModule
uses the
XAResourceRecovery
implementation to
get an
XAResource
to the target datasource. On each invocation of
periodicWorkSecondPass
, the recovery module issues an
XAResource.recover
request. This request returns a list of the transaction identifiers
that are known to the
datasource and are in an in-doubt state. The list of these in-doubt Xids is compared
across successive invocations of
periodicWorkSecondPass
. Any Xid that appears in both
lists, and for which no
JTA_ResourceRecord
is found by the intervening
transaction-initiated recovery, is assumed to belong to a
transaction involved in a crash before any
JTA_ResourceRecord
was written, and a
rollback
is issued for
that transaction on the
XAResource
.
This double-scan mechanism is used because it is possible the Xid was obtained from the datasource just as the original application process was about to create the corresponding JTA_ResourceRecord. The interval between the scans should allow time for the record to be written unless the application crashes (and if it does, rollback is the right answer).
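The double-scan rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the XARecoveryModule implementation: the class and method names are hypothetical, and Xids are modelled as plain strings rather than javax.transaction.xa.Xid objects.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the double-scan rule; not JBossTS code.
final class DoubleScanSketch {

    /**
     * Returns the Xids that would be rolled back: those reported as in-doubt
     * by two successive scans and for which no JTA_ResourceRecord was found
     * by the intervening transaction-initiated recovery.
     */
    static Set<String> xidsToRollBack(Set<String> previousScan,
                                      Set<String> currentScan,
                                      Set<String> xidsWithRecord) {
        Set<String> result = new LinkedHashSet<>(currentScan);
        result.retainAll(previousScan);    // keep only Xids seen in both scans
        result.removeAll(xidsWithRecord);  // these are handled by transaction-initiated recovery
        return result;
    }

    public static void main(String[] args) {
        Set<String> firstScan = Set.of("xid-1", "xid-2");
        Set<String> secondScan = Set.of("xid-2", "xid-3");
        // xid-3 is new in the second scan, so it is left alone until the next pass.
        System.out.println(xidsToRollBack(firstScan, secondScan, Set.of()));
    }
}
```

An Xid that first appears in the second scan is not touched; it only becomes a rollback candidate if it is still in doubt on the following pass.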
An
XAResourceRecovery
implementation class can contain all the information needed to
perform recovery to a specific
datasource. Alternatively, a single class can handle multiple datasources which
have some
similar features. The constructor of the implementation class must have an empty parameter
list,
because it is loaded dynamically. The interface includes an
initialise
method, which
passes in further information as a
string
. The content of the string is taken from the property
value that provides the class name.
Everything after the first semicolon is passed as the value of the
string. The
XAResourceRecovery
implementation class determines how to use the string.
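The way the property value is divided can be illustrated with a short sketch. The class below is hypothetical and is not the JBossTS loader; it only shows the split at the first semicolon described above. Passing null when no semicolon is present is an assumption of this sketch.

```java
// Hypothetical sketch of splitting a recovery property value; not JBossTS code.
final class RecoveryPropertySketch {
    final String className;
    final String parameter; // assumption: null when the value contains no semicolon

    RecoveryPropertySketch(String propertyValue) {
        int split = propertyValue.indexOf(';');
        if (split == -1) {
            className = propertyValue.trim();
            parameter = null;
        } else {
            // Text before the first semicolon names the implementation class.
            className = propertyValue.substring(0, split).trim();
            // Everything after it, including any further semicolons, goes to initialise.
            parameter = propertyValue.substring(split + 1);
        }
    }

    public static void main(String[] args) {
        RecoveryPropertySketch p = new RecoveryPropertySketch(
            "com.arjuna.ats.internal.jdbc.recovery.BasicXARecovery;2;OraRecoveryInfo");
        System.out.println(p.className);
        System.out.println(p.parameter);
    }
}
```

Only the first semicolon is significant to the loader in this sketch; any later semicolons are left for the implementation class to interpret.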
An
XAResourceRecovery
implementation class,
com.arjuna.ats.internal.jdbc.recovery.BasicXARecovery
, supports resource-initiated recovery for any XADataSource. For this class, the string
received in method
initialise
is assumed to contain the number of connections to recover, and the name of the
properties
file containing the dynamic class name, the database username, the database password and the
database
connection URL. The following example is for an Oracle 8.1.6 database accessed via
the Sequelink 5.1 driver:
XAConnectionRecoveryEmpay=com.arjuna.ats.internal.jdbc.recovery.BasicXARecovery;2;OraRecoveryInfo
This implementation is only meant as an example, because it relies upon usernames and
passwords appearing in
plain text properties files. You can create your own implementations
of
XAConnectionRecovery
. See the javadocs and the example
com.arjuna.ats.internal.jdbc.recovery.BasicXARecovery
.
Example 7.3. XAConnectionRecovery implementation
/*
* Copyright (C) 2000, 2001,
*
* Hewlett-Packard,
* Arjuna Labs,
* Newcastle upon Tyne,
* Tyne and Wear,
* UK.
*
*/
package com.arjuna.ats.internal.jdbc.recovery;
import com.arjuna.ats.jdbc.TransactionalDriver;
import com.arjuna.ats.jdbc.common.jdbcPropertyManager;
import com.arjuna.ats.jdbc.logging.jdbcLogger;
import com.arjuna.ats.internal.jdbc.*;
import com.arjuna.ats.jta.recovery.XAConnectionRecovery;
import com.arjuna.ats.arjuna.common.*;
import com.arjuna.common.util.logging.*;
import java.sql.*;
import javax.sql.*;
import javax.transaction.*;
import javax.transaction.xa.*;
import java.util.*;
import java.lang.NumberFormatException;
/**
* This class implements the XAConnectionRecovery interface for XAResources.
* The parameter supplied in setParameters can contain arbitrary information
* necessary to initialise the class once created. In this instance it contains
* the name of the property file in which the db connection information is
* specified, as well as the number of connections that this file contains
* information on (separated by ;).
*
* IMPORTANT: this is only an *example* of the sorts of things an
* XAConnectionRecovery implementor could do. This implementation uses
* a property file which is assumed to contain sufficient information to
* recreate connections used during the normal run of an application so that
* we can perform recovery on them. It is not recommended that information such
* as user name and password appear in such a raw text format as it opens up
* a potential security hole.
*
* The db parameters specified in the property file are assumed to be
* in the format:
*
* DB_x_DatabaseURL=
* DB_x_DatabaseUser=
* DB_x_DatabasePassword=
* DB_x_DatabaseDynamicClass=
*
* DB_JNDI_x_DatabaseURL=
* DB_JNDI_x_DatabaseUser=
* DB_JNDI_x_DatabasePassword=
*
* where x is the number of the connection information.
*
* @since JTS 2.1.
*/
public class BasicXARecovery implements XAConnectionRecovery
{
    /*
     * Some XAConnectionRecovery implementations will do their startup work
     * here, and then do little or nothing in setDetails. Since this one needs
     * to know dynamic class name, the constructor does nothing.
     */
    public BasicXARecovery () throws SQLException
    {
        numberOfConnections = 1;
        connectionIndex = 0;
        props = null;
    }
    /**
     * The recovery module will have chopped off this class name already.
     * The parameter should specify a property file from which the url,
     * user name, password, etc. can be read.
     */
    public boolean initialise (String parameter) throws SQLException
    {
        int breakPosition = parameter.indexOf(BREAKCHARACTER);
        String fileName = parameter;
        if (breakPosition != -1)
        {
            // Everything before the delimiter is the property file name.
            fileName = parameter.substring(0, breakPosition);
            try
            {
                // Everything after the delimiter is the number of connections.
                numberOfConnections = Integer.parseInt(parameter.substring(breakPosition + 1));
            }
            catch (NumberFormatException e)
            {
                //Produce a Warning Message
                return false;
            }
        }
        PropertyManager.addPropertiesFile(fileName);
        try
        {
            PropertyManager.loadProperties(true);
            props = PropertyManager.getProperties();
        }
        catch (Exception e)
        {
            //Produce a Warning Message
            return false;
        }
        return true;
    }
    public synchronized XAConnection getConnection () throws SQLException
    {
        JDBC2RecoveryConnection conn = null;
        if (hasMoreConnections())
        {
            connectionIndex++;
            // Try the directly configured connection first, then fall back to JNDI.
            conn = getStandardConnection();
            if (conn == null)
                conn = getJNDIConnection();
            if (conn == null)
            {
                //Produce a Warning message
            }
        }
        return conn;
    }
    public synchronized boolean hasMoreConnections ()
    {
        return connectionIndex != numberOfConnections;
    }
    private final JDBC2RecoveryConnection getStandardConnection () throws SQLException
    {
        String number = new String(""+connectionIndex);
        String url = new String(dbTag+number+urlTag);
        String password = new String(dbTag+number+passwordTag);
        String user = new String(dbTag+number+userTag);
        String dynamicClass = new String(dbTag+number+dynamicClassTag);
        Properties dbProperties = new Properties();
        String theUser = props.getProperty(user);
        String thePassword = props.getProperty(password);
        if (theUser != null)
        {
            dbProperties.put(ArjunaJDBC2Driver.userName, theUser);
            dbProperties.put(ArjunaJDBC2Driver.password, thePassword);
            String dc = props.getProperty(dynamicClass);
            if (dc != null)
                dbProperties.put(ArjunaJDBC2Driver.dynamicClass, dc);
            return new JDBC2RecoveryConnection(url, dbProperties);
        }
        else
            return null;
    }
    private final JDBC2RecoveryConnection getJNDIConnection () throws SQLException
    {
        String number = new String(""+connectionIndex);
        String url = new String(dbTag+jndiTag+number+urlTag);
        String password = new String(dbTag+jndiTag+number+passwordTag);
        String user = new String(dbTag+jndiTag+number+userTag);
        Properties dbProperties = new Properties();
        String theUser = props.getProperty(user);
        String thePassword = props.getProperty(password);
        if (theUser != null)
        {
            dbProperties.put(ArjunaJDBC2Driver.userName, theUser);
            dbProperties.put(ArjunaJDBC2Driver.password, thePassword);
            return new JDBC2RecoveryConnection(url, dbProperties);
        }
        else
            return null;
    }
    private int numberOfConnections;
    private int connectionIndex;
    private Properties props;
    private static final String dbTag = "DB_";
    private static final String urlTag = "_DatabaseURL";
    private static final String passwordTag = "_DatabasePassword";
    private static final String userTag = "_DatabaseUser";
    private static final String dynamicClassTag = "_DatabaseDynamicClass";
    private static final String jndiTag = "JNDI_";
    /*
     * Example:
     *
     * DB2_DatabaseURL=jdbc\:arjuna\:sequelink\://qa02\:20001
     * DB2_DatabaseUser=tester2
     * DB2_DatabasePassword=tester
     * DB2_DatabaseDynamicClass=
     * com.arjuna.ats.internal.jdbc.drivers.sequelink_5_1
     *
     * DB_JNDI_DatabaseURL=jdbc\:arjuna\:jndi
     * DB_JNDI_DatabaseUser=tester1
     * DB_JNDI_DatabasePassword=tester
     * DB_JNDI_DatabaseName=empay
     * DB_JNDI_Host=qa02
     * DB_JNDI_Port=20000
     */
    private static final char BREAKCHARACTER = ';'; // delimiter for parameters
}
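The property keys that the listing looks up follow from its tag constants: the connection index is incremented before each lookup, so the first connection uses index 1. The sketch below reproduces only that key composition. The class and method names are hypothetical; the tag values are copied from the listing.

```java
// Hypothetical sketch of how the example composes its property keys.
final class RecoveryKeySketch {
    private static final String DB_TAG = "DB_";
    private static final String JNDI_TAG = "JNDI_";

    /** Key for a directly configured connection, e.g. DB_<n>_DatabaseURL. */
    static String standardKey(int connectionIndex, String suffix) {
        return DB_TAG + connectionIndex + suffix;
    }

    /** Key for a JNDI-configured connection, e.g. DB_JNDI_<n>_DatabaseURL. */
    static String jndiKey(int connectionIndex, String suffix) {
        return DB_TAG + JNDI_TAG + connectionIndex + suffix;
    }

    public static void main(String[] args) {
        System.out.println(standardKey(1, "_DatabaseURL"));
        System.out.println(jndiKey(1, "_DatabaseUser"));
    }
}
```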
XAResource.recover
returns the list of all transactions that are in-doubt within the
datasource. If multiple
recovery domains are used with a single datasource, resource-initiated recovery sees
transactions from other domains. Since it does not have a
JTA_ResourceRecord
available, it rolls back the transaction in the database, if the Xid appears in successive
recover calls. To
suppress resource-initiated recovery, do not supply an
XAConnectionRecovery
property, or
confine it to one recovery domain.
Property
OTS_ISSUE_RECOVERY_ROLLBACK
controls whether the
RecoveryManager
explicitly issues a rollback request when
replay_completion
asks for the status of a transaction that is unknown. According to
the
presume-abort
mechanism used by OTS and JTS, the transaction can be assumed to have
rolled back, and this
is the response that is returned to the
Resource
, including a
subordinate coordinator, in this case. The
Resource
should then apply that result to the
underlying resources. However, it is also legitimate for
the superior to issue a rollback, if
OTS_ISSUE_RECOVERY_ROLLBACK
is set to
YES
.
The OTS transaction identification mechanism makes it possible for a transaction coordinator
to hold a
Resource
reference that will never be usable. This can occur in two cases:
The process holding the
Resource
crashes before receiving the commit or rollback
request from the coordinator.
The
Resource
receives the commit or rollback, and responds. However, the message is
lost or the
coordinator process has crashed.
In the first case, the
RecoveryManager
for the
Resource
ObjectStore
eventually reconstructs a new
Resource
with a
different CORBA object reference (IOR), and issues a
replay_completion
request
containing the new
Resource
IOR. The
RecoveryManager
for the
coordinator substitutes this in place of the original, useless one, and issues
commit
to the new reconstructed
Resource
. The
Resource
has to have been
in a commit state, or there would be no transaction intention list. Until
the
replay_completion
is received, the
RecoveryManager
tries to send
commit
to its
Resource
reference. This fails with a CORBA
System Exception. Which exception is raised depends on the ORB
and other details.
In the second case, the
Resource
no longer exists. The
RecoveryManager
at the coordinator will never get through, and will receive System
Exceptions forever.
The
RecoveryManager
cannot distinguish these two cases by any protocol mechanism. There
is a perceptible cost in
repeatedly attempting to send the commit to an inaccessible
Resource
. In particular, the timeouts involved will extend the recovery iteration time,
and thus
potentially leave resources inaccessible for longer.
To avoid this, the
RecoveryManager
only attempts to send
commit
to a
Resource
a limited number of times. After that, it considers the transaction
assumed complete
. It retains the information about the transaction, by changing the object type
in the
ActionStore
, and if the
Resource
eventually does wake up
and a
replay_completion
request is received, the
RecoveryManager
activates the transaction and issues the commit request to the new Resource IOR. The number
of times the
RecoveryManager
attempts to issue
commit
as part of the periodic
recovery is controlled by the property variable
COMMITTED_TRANSACTION_RETRY_LIMIT
, and
defaults to
3
.
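The retry behaviour can be pictured as a small state machine. The sketch below is an illustration only, with hypothetical class, method, and state names; it is not how the RecoveryManager stores or retypes entries in the ActionStore.

```java
// Hypothetical sketch of the commit retry limit; not JBossTS code.
final class CommitRetrySketch {
    enum State { ACTIVE, ASSUMED_COMPLETE }

    private final int retryLimit;      // stands in for COMMITTED_TRANSACTION_RETRY_LIMIT
    private int failedAttempts;
    private State state = State.ACTIVE;

    CommitRetrySketch(int retryLimit) {
        this.retryLimit = retryLimit;
    }

    /** Records one failed commit attempt made during a periodic recovery pass. */
    void commitAttemptFailed() {
        if (state == State.ACTIVE && ++failedAttempts >= retryLimit) {
            // Stop retrying, but keep the transaction information.
            state = State.ASSUMED_COMPLETE;
        }
    }

    /** A replay_completion request from a recovered Resource reactivates the transaction. */
    void replayCompletionReceived() {
        state = State.ACTIVE;
        failedAttempts = 0;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        CommitRetrySketch tx = new CommitRetrySketch(3);
        for (int i = 0; i < 3; i++) {
            tx.commitAttemptFailed();
        }
        System.out.println(tx.state());
        tx.replayCompletionReceived();
        System.out.println(tx.state());
    }
}
```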
The operation of the recovery subsystem causes some entries to be made in the
ObjectStore
that are not removed in normal progress. The
RecoveryManager
has a facility for scanning
for these and removing items that are very old. Scans and
removals are performed by implementations of the
com.arjuna.ats.arjuna.recovery.ExpiryScanner
interface. Implementations of this interface
are loaded by giving the class names as the value of the
property
RecoveryEnvironmentBean.expiryScannerClassNames
. The
RecoveryManager
calls the
scan
method on each loaded
ExpiryScanner
implementation at an
interval determined by the property
RecoveryEnvironmentBean.expiryScanInterval
. This value is
given in hours, and defaults to
12
. A property value of
0
disables any
expiry scanning. If the value as supplied is positive, the first scan is
performed when
RecoveryManager
starts. If the value is negative, the first scan is delayed until after
the first interval,
using the absolute value.
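The interpretation of the interval value can be summarised in code. This is a sketch of the rules stated above, using hypothetical names; it does not reproduce the RecoveryManager's own scheduling code.

```java
import java.util.OptionalLong;

// Hypothetical sketch of how the expiry scan interval value is interpreted.
final class ExpiryScheduleSketch {

    /** Hours to wait before the first scan, or empty if scanning is disabled. */
    static OptionalLong firstScanDelayHours(int expiryScanInterval) {
        if (expiryScanInterval == 0) {
            return OptionalLong.empty();   // 0 disables expiry scanning
        }
        if (expiryScanInterval > 0) {
            return OptionalLong.of(0);     // positive: scan as soon as the RecoveryManager starts
        }
        return OptionalLong.of(Math.abs((long) expiryScanInterval)); // negative: wait one interval
    }

    /** Hours between scans, or empty if scanning is disabled. */
    static OptionalLong scanPeriodHours(int expiryScanInterval) {
        return expiryScanInterval == 0
                ? OptionalLong.empty()
                : OptionalLong.of(Math.abs((long) expiryScanInterval));
    }

    public static void main(String[] args) {
        System.out.println(firstScanDelayHours(12));
        System.out.println(firstScanDelayHours(-12));
        System.out.println(firstScanDelayHours(0));
    }
}
```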
There are two kinds of item that are scanned for expiry:
Contact items | One contact item is created by every application process that uses JBossTS. They contain the information that the
Assumed complete transactions | The expiry time is counted from when the transactions were assumed to be complete. A
Example 7.4. ExpiryScanner properties
<entry key="RecoveryEnvironmentBean.expiryScannerClassNames">
com.arjuna.ats.internal.arjuna.recovery.ExpiredTransactionStatusManagerScanner
com.arjuna.ats.internal.jts.recovery.contact.ExpiredContactScanner
com.arjuna.ats.internal.jts.recovery.transactions.ExpiredToplevelScanner
com.arjuna.ats.internal.jts.recovery.transactions.ExpiredServerScanner
</entry>
There are two
ExpiryScanner
s for the assumed complete transactions, because there are
different types in the
ActionStore.
A key part of the recovery subsystem is that the
RecoveryManager
hosts the OTS
RecoveryCoordinator
objects that handle recovery for transactions initiated in
application processes. Information
passes between the application process and the
RecoveryManager
in one of three ways:
RecoveryCoordinator
object references (IORs) are created in the application
process. They contain information
identifying the transaction in the object key. They pass the object key to
the
Resource
objects, and the
RecoveryManager
receives it.
The application process and the
RecoveryManager
access the same
jbossts-properties.xml
, and therefore use the same
ObjectStore
.
The
RecoveryCoordinator
invokes CORBA directly in the application, using information
in the contact items.
Contact items are kept in the
ObjectStore
.
Multiple recovery domains, each with a separate
ObjectStore
, may be useful if you are doing a migration. However, multiple RecoveryManagers can cause problems with XA
datasources if
resource-initiated recovery is active on any of them.
When a transaction successfully commits, the transaction log is removed from the system. The
log is no longer
required, since all registered Resources have responded successfully to the
two-phase commit sequence. However, if
a
Resource
calls
replay_completion
on the
RecoveryCoordinator
after the transaction it represents commits, the status returned is
StatusRolledback
. The transaction system does not keep a record of committed transactions,
and assumes that in
the absence of a transaction log, the transaction must have rolled back. This is in line with
the
presumed abort protocol
used by the OTS.
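The effect of presumed abort on a late replay_completion call can be sketched with an in-memory log. Everything in this example is hypothetical, including the class name, the status names, and the use of a map as the log; it only illustrates that a missing log entry is reported as rolled back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of presumed abort; not the JBossTS RecoveryCoordinator.
final class PresumedAbortSketch {
    enum Status { PREPARED, COMMITTING, ROLLED_BACK }

    // Stand-in for the transaction log held in the ObjectStore.
    private final Map<String, Status> transactionLog = new HashMap<>();

    void logWritten(String txId, Status status) {
        transactionLog.put(txId, status);
    }

    /** On successful commit the log entry is removed, so no record remains. */
    void commitCompleted(String txId) {
        transactionLog.remove(txId);
    }

    /** With no log entry, the transaction is presumed to have rolled back. */
    Status replayCompletion(String txId) {
        return transactionLog.getOrDefault(txId, Status.ROLLED_BACK);
    }

    public static void main(String[] args) {
        PresumedAbortSketch coordinator = new PresumedAbortSketch();
        coordinator.logWritten("tx-1", Status.COMMITTING);
        coordinator.commitCompleted("tx-1");
        // The transaction committed, but a late caller is still told it rolled back.
        System.out.println(coordinator.replayCompletion("tx-1"));
    }
}
```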