Asynchronous Options - Infinispan 5.1

Introduction

When Infinispan instances are clustered, regardless of the clustering mode, data can be propagated to other nodes either synchronously or asynchronously. In synchronous mode, the sender waits for replies from the receivers; in asynchronous mode, the sender sends the data and does not wait for replies from the other nodes in the cluster.

With asynchronous modes, speed takes priority over consistency. This is particularly advantageous in use cases such as HTTP session replication with sticky sessions enabled. In these scenarios, a given piece of data (here, a particular session) is always accessed on the same cluster node, and only in case of failure is it accessed on a different node. This type of architecture allows consistency to be relaxed in favour of increased performance.

In order to choose the asynchronous configuration that best suits your application, it's important to understand the following configuration settings:

Asynchronous Communications

Whenever you add an <async> element within <clustering>, you're telling the underlying JGroups layer in Infinispan to use asynchronous communication. This means that JGroups puts any replication/distribution/invalidation request on the wire but does not wait for a reply from the receiver.
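To make the difference concrete, here is a minimal plain-Java sketch that uses neither JGroups nor Infinispan. The names ToyTransport and AsyncCommsDemo are hypothetical: a single-threaded executor stands in for the receiving node, sendSync blocks until the receiver has handled the message (the synchronous case), and sendAsync returns as soon as the message has been handed off (the asynchronous case). A latch keeps the receiver busy so that the asynchronous sender is observably back before the message is processed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model: a single-threaded executor plays the role of the receiving node.
class ToyTransport {
    private final ExecutorService receiver = Executors.newSingleThreadExecutor();

    // Synchronous: put the message on the "wire" and block until the receiver has handled it.
    void sendSync(Runnable handler) {
        CompletableFuture.runAsync(handler, receiver).join();
    }

    // Asynchronous: put the message on the "wire" and return without waiting for a reply.
    CompletableFuture<Void> sendAsync(Runnable handler) {
        return CompletableFuture.runAsync(handler, receiver);
    }

    void shutdown() {
        receiver.shutdown();
    }
}

class AsyncCommsDemo {
    // Was the message processed by the time sendAsync returned to the caller?
    static boolean processedWhenAsyncReturns() {
        ToyTransport transport = new ToyTransport();
        CountDownLatch busyReceiver = new CountDownLatch(1);
        AtomicBoolean processed = new AtomicBoolean(false);
        CompletableFuture<Void> done = transport.sendAsync(() -> {
            try {
                busyReceiver.await(); // receiver stays busy until the latch is released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            processed.set(true);
        });
        boolean seen = processed.get(); // sender is already here; receiver is still blocked
        busyReceiver.countDown();
        done.join();
        transport.shutdown();
        return seen;
    }

    // Was the message processed by the time sendSync returned to the caller?
    static boolean processedWhenSyncReturns() {
        ToyTransport transport = new ToyTransport();
        AtomicBoolean processed = new AtomicBoolean(false);
        transport.sendSync(() -> processed.set(true));
        transport.shutdown();
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("async: " + processedWhenAsyncReturns()); // prints "async: false"
        System.out.println("sync: " + processedWhenSyncReturns());   // prints "sync: true"
    }
}
```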

Asynchronous Marshalling

This is a configurable boolean property of the <async> element that indicates whether the actual call from Infinispan to the JGroups layer is made on a separate thread. When set to true, once Infinispan has determined that a request needs to be sent to another node, it submits the request to the async transport executor, which then hands it over to the underlying JGroups layer.

With asynchronous marshalling, Infinispan requests can return to the client sooner than when async marshalling is set to false. The downside is that client requests can be reordered before they reach the JGroups layer. In other words, JGroups provides ordering guarantees even for async messages, but with async marshalling turned on, requests can reach JGroups in a different order from the one in which they were called. This can lead to data consistency issues in applications that make multiple modifications to the same key/value pair. For example, with async marshalling turned on:

The application calls:

cache.put("car", "bmw");
cache.remove("car");

Other nodes could receive these operations in this order:

cache.remove("car");
cache.put("car", "bmw");

The end result is clearly different: the originating node no longer holds "car", while the other nodes end up with "car" mapped to "bmw", which is rarely desirable. So, if your application makes multiple modifications to the same key, you should either turn off asynchronous marshalling, or set the maxThreads of the <asyncTransportExecutor> element to 1. The first option applies only to a particular named cache, whereas the second affects all named caches in the configuration file that are configured with async marshalling. Note, though, that configuring this executor with a single thread defeats its purpose and adds an unnecessary contention point. It is better to simply switch off async marshalling.
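The following plain-Java sketch models the put/remove example above and the single-thread workaround. It is an illustration under simplifying assumptions, not Infinispan code: a HashMap stands in for a remote node's data, replay applies operations in arrival order, and runOnSingleThread uses Executors.newSingleThreadExecutor() as an analogue of an executor with maxThreads set to 1. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ReorderingDemo {
    // A modification is modelled as a function applied to a node's data map.
    static Consumer<Map<String, String>> put(String key, String value) {
        return data -> data.put(key, value);
    }

    static Consumer<Map<String, String>> remove(String key) {
        return data -> data.remove(key);
    }

    // Apply modifications in the order they arrive at a (hypothetical) remote node.
    static Map<String, String> replay(List<Consumer<Map<String, String>>> arrivalOrder) {
        Map<String, String> data = new HashMap<>();
        for (Consumer<Map<String, String>> op : arrivalOrder) {
            op.accept(data);
        }
        return data;
    }

    // A single-threaded executor runs tasks one at a time in submission order,
    // which is why maxThreads=1 avoids reordering (at the cost of contention).
    static List<String> runOnSingleThread(List<String> calls) {
        List<String> handedOver = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (String call : calls) {
            executor.execute(() -> handedOver.add(call));
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(handedOver);
    }

    public static void main(String[] args) {
        System.out.println(replay(List.of(put("car", "bmw"), remove("car")))); // prints "{}"
        System.out.println(replay(List.of(remove("car"), put("car", "bmw")))); // prints "{car=bmw}"
        System.out.println(runOnSingleThread(List.of("put", "remove")));       // prints "[put, remove]"
    }
}
```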

On the other hand, if your application only ever makes one modification per key/value pair and there is no happens-before relationship between these modifications, then async marshalling is a valid optimization that can increase the performance of your application without data consistency risks.
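Why a single modification per key is safe can be checked with a small brute-force sketch (hypothetical names, plain Java, no Infinispan involved). It applies a list of puts in every possible order using swap-based backtracking and verifies that the final state never changes. With distinct keys the result is order-independent; as soon as the same key is written twice, some orders disagree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndependentWritesDemo {
    // Each element is a {key, value} pair applied as a put.
    static Map<String, String> apply(List<String[]> puts) {
        Map<String, String> data = new HashMap<>();
        for (String[] kv : puts) {
            data.put(kv[0], kv[1]);
        }
        return data;
    }

    // True if every permutation of the puts yields the same final map.
    static boolean sameResultForAllOrders(List<String[]> puts) {
        return check(new ArrayList<>(puts), 0, apply(puts));
    }

    // Swap-based backtracking: fix position k to each remaining element in turn.
    private static boolean check(List<String[]> ops, int k, Map<String, String> expected) {
        if (k == ops.size()) {
            return apply(ops).equals(expected);
        }
        for (int i = k; i < ops.size(); i++) {
            Collections.swap(ops, k, i);
            boolean ok = check(ops, k + 1, expected);
            Collections.swap(ops, k, i); // undo the swap before trying the next element
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> oneWritePerKey = List.of(
                new String[] {"car", "bmw"}, new String[] {"bike", "bmx"}, new String[] {"boat", "sail"});
        List<String[]> twoWritesSameKey = List.of(
                new String[] {"car", "bmw"}, new String[] {"car", "audi"});
        System.out.println(sameResultForAllOrders(oneWritePerKey));   // prints "true"
        System.out.println(sameResultForAllOrders(twoWritesSameKey)); // prints "false"
    }
}
```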

If you have async marshalling turned on and see java.util.concurrent.RejectedExecutionException being thrown, as explained in the technical FAQ, you should also consider switching off async marshalling.

In Infinispan 4.0, this property defaulted to true whenever the <async> element was used. However, due to the reordering risks mentioned earlier, the default changed to false from Infinispan 4.1 onwards.

Replication Queue

The aim of the replication queue is to batch individual cache operations and send them as one, as opposed to sending each cache operation individually. As a result, configurations with the replication queue enabled generally perform better than those with it switched off, because fewer RPC messages are sent, fewer envelopes are used, and so on. The only real trade-off is that the queue is flushed periodically (based on time or queue size), so it might take longer for the replication/distribution/invalidation to be realised across the cluster. When the replication queue is turned off, data is placed directly on the wire, so it takes less time for data to arrive at other nodes.
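The batching idea can be sketched as a small buffer that emits one batch per flush. This is a hypothetical model, not Infinispan's replication queue implementation: ToyReplicationQueue, maxElements and the rpc callback are illustrative names, and the callback stands in for a single message on the wire. The size-based flush is built in; the time-based flush is represented by calling flush() explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ToyReplicationQueue<T> {
    private final int maxElements;
    private final Consumer<List<T>> rpc; // stands in for sending one message on the wire
    private final List<T> pending = new ArrayList<>();

    ToyReplicationQueue(int maxElements, Consumer<List<T>> rpc) {
        this.maxElements = maxElements;
        this.rpc = rpc;
    }

    // Buffer a modification; flush as soon as the size threshold is reached.
    synchronized void add(T modification) {
        pending.add(modification);
        if (pending.size() >= maxElements) {
            flush();
        }
    }

    // Send everything buffered so far as a single batch. In a real setup this
    // would also be invoked periodically by a timer (the time-based flush).
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        rpc.accept(new ArrayList<>(pending));
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        ToyReplicationQueue<String> queue = new ToyReplicationQueue<>(3, sent::add);
        for (int i = 0; i < 7; i++) {
            queue.add("put-" + i);
        }
        System.out.println(sent.size() + " batches sent, " + queue.pendingCount() + " pending"); // prints "2 batches sent, 1 pending"
        queue.flush(); // e.g. the timer fires
        System.out.println(sent); // prints "[[put-0, put-1, put-2], [put-3, put-4, put-5], [put-6]]"
    }
}
```

Seven modifications cost three messages here instead of seven, but put-6 only leaves the node once the (simulated) timer fires, which is the latency trade-off described above. In a fuller sketch, a java.util.concurrent.ScheduledExecutorService calling flush() at a fixed rate could play the role of that timer.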

Until Infinispan 4.1.0.CR2, the replication queue always flushed data with async marshalling turned on, which meant that there was a small window in which flush calls could be reordered. Since 4.1.0.CR3, the async marshalling configuration is taken into account and decides whether flush calls go directly to the JGroups layer or are first handed over to a different thread. The benefits of combining async marshalling with the replication queue are unclear, because the replication queue itself already makes client requests return faster. It is therefore generally recommended to have async marshalling turned off, or the maxThreads of the <asyncTransportExecutor> element set to 1, when the replication queue is turned on.

Asynchronous API

Finally, the asynchronous API can be used to emulate non-blocking APIs, whereby calls are handed over to a different thread and return to the client immediately. As with async marshalling, using this API can lead to reordering, so you should avoid invoking modifying asynchronous methods on the same keys.
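A minimal emulation of such a non-blocking API, assuming only the JDK: ToyAsyncCache hands each call to a thread pool and returns a CompletableFuture at once. It mimics the shape of asynchronous methods such as putAsync/removeAsync but is a hypothetical stand-in, not Infinispan's implementation. Because the pool has several threads, two modifications fired back-to-back on the same key may run in either order; the demo avoids this by issuing the remove only after the put's future has completed.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a non-blocking cache API; not Infinispan's implementation.
class ToyAsyncCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Hands the write to the pool and returns at once; completes with the previous value (or null).
    CompletableFuture<String> putAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> data.put(key, value), pool);
    }

    CompletableFuture<String> removeAsync(String key) {
        return CompletableFuture.supplyAsync(() -> data.remove(key), pool);
    }

    String get(String key) {
        return data.get(key);
    }

    void shutdown() {
        pool.shutdown();
    }
}

class AsyncOrderingDemo {
    // Risky (not shown): putAsync and removeAsync on the same key, fired back-to-back,
    // may execute on different pool threads in either order.
    // Safe: the remove is only issued once the put's future has completed.
    static String putThenRemoveInOrder() {
        ToyAsyncCache cache = new ToyAsyncCache();
        cache.putAsync("car", "bmw")
                .thenCompose(previous -> cache.removeAsync("car"))
                .join();
        String result = cache.get("car"); // null: the remove always runs after the put
        cache.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(putThenRemoveInOrder()); // prints "null"
    }
}
```

Chaining with thenCompose keeps the caller non-blocking until the final join(); calling join() after each step would achieve the same ordering at the cost of blocking in between.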

Return Values

Regardless of the asynchronous option used, the return values of cache operations are reliable. This includes operations that return the previous value: their correctness is guaranteed regardless of the clustering mode. With replication, the previous value is already available locally, and with distribution, whether asynchronous or synchronous, Infinispan sends a synchronous request to fetch the previous value if it is not present locally. If, on the other hand, the asynchronous API is used, client code needs to get hold of the NotifyingFuture returned by the async operation in order to query the previous value.
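The distribution case described above can be modelled in a few lines. In this hypothetical sketch (ToyDistNode is not an Infinispan class), a node that does not hold the key locally first fetches the previous value from the owner, a step that would be a synchronous round trip in a real cluster, and only then applies the write. The write to the owner map stands in for propagation, which in a real cluster may be asynchronous.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a non-owner node under distribution.
class ToyDistNode {
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> owner; // data held by the key's owner node
    int remoteLookups = 0; // how many synchronous fetches of the previous value were needed

    ToyDistNode(Map<String, String> owner) {
        this.owner = owner;
    }

    // Returns the previous value, fetching it from the owner if it is not present locally.
    String put(String key, String value) {
        String previous;
        if (local.containsKey(key)) {
            previous = local.get(key);
        } else {
            remoteLookups++;
            previous = owner.get(key); // a synchronous round trip in a real cluster
        }
        local.put(key, value);
        owner.put(key, value); // stands in for propagating the write, possibly asynchronously
        return previous;
    }

    public static void main(String[] args) {
        Map<String, String> owner = new HashMap<>();
        owner.put("car", "bmw");
        ToyDistNode node = new ToyDistNode(owner);
        System.out.println(node.put("car", "audi") + ", lookups=" + node.remoteLookups); // prints "bmw, lookups=1"
        System.out.println(node.put("car", "vw") + ", lookups=" + node.remoteLookups);   // prints "audi, lookups=1"
    }
}
```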