Hibernate.org Community Documentation
Currently Hibernate OGM supports the following datastores:
More are planned; if you are interested, come talk to us (see Chapter 1, How to get help and contribute on Hibernate OGM).
Hibernate OGM interacts with NoSQL datastores via two contracts:
The main thing you need to do is to configure which datastore provider you want to use.
This is done via the hibernate.ogm.datastore.provider option. Possible values are the fully qualified class name of a DatastoreProvider implementation or, preferably, one of the following shortcuts:
map: stores data in an in-memory Java map. Use it only for unit tests.
infinispan: stores data into Infinispan (data grid)
ehcache: stores data into Ehcache (cache)
mongodb: stores data into MongoDB (document store)
neo4j: stores data into Neo4j (graph)
You also need to add the relevant Hibernate OGM module to your classpath. In Maven that would look like:
<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-infinispan</artifactId>
    <version>4.0.0.Beta4</version>
</dependency>
We have respectively hibernate-ogm-infinispan, hibernate-ogm-ehcache, hibernate-ogm-mongodb and hibernate-ogm-neo4j.
The map datastore is included in the Hibernate OGM engine module.
By default, a datastore provider chooses the best grid dialect transparently, but you can manually override that setting with the hibernate.ogm.datastore.grid_dialect option. Use the fully qualified class name of the GridDialect implementation.
Most users should ignore this setting entirely and live happy.
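As a minimal sketch, the two options above can be collected in a plain map of settings. The property names and the infinispan shortcut come from this section; the class OgmProviderSettings and its settings method are hypothetical names made up for this illustration, and handing the map to your bootstrap code (for example as the properties argument of the standard JPA Persistence.createEntityManagerFactory) is an assumption about how you start Hibernate OGM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the datastore settings described above.
final class OgmProviderSettings {

    static final String PROVIDER = "hibernate.ogm.datastore.provider";
    static final String GRID_DIALECT = "hibernate.ogm.datastore.grid_dialect";

    // providerShortcutOrClass: a shortcut such as "infinispan", or the fully
    // qualified class name of a DatastoreProvider implementation.
    // gridDialectClass: optional override; pass null to let the provider choose.
    static Map<String, String> settings(String providerShortcutOrClass, String gridDialectClass) {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        settings.put(PROVIDER, providerShortcutOrClass);
        if (gridDialectClass != null) {
            settings.put(GRID_DIALECT, gridDialectClass);
        }
        return settings;
    }

    public static void main(String[] args) {
        // The resulting map could be passed as the properties argument of
        // javax.persistence.Persistence.createEntityManagerFactory(unitName, map).
        System.out.println(settings("infinispan", null));
    }
}
```

Leaving the grid dialect unset, as in the main method, matches the advice above for most users.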
Infinispan is an open source in-memory data grid focusing on high performance. Being a data grid, it can be deployed on multiple servers (referred to as nodes) while you connect to it as if it were a single storage engine: it will cleverly distribute both the computation effort and the data storage.
It is trivial to set up on a single node, in your local JVM, so you can easily try Hibernate OGM. But Infinispan really shines in multiple-node deployments: you will need to configure some networking details, but nothing changes in terms of application behaviour, while performance and data size can scale linearly.
Of all its features we’ll only describe those relevant to Hibernate OGM; for a complete description of all its capabilities and configuration options, refer to the Infinispan project documentation at infinispan.org.
Two steps basically:
And then choose one of:
JNDI name of an existing Infinispan instance
To add the dependencies via a build tool that understands Maven definitions, add the following module:
<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-infinispan</artifactId>
    <version>4.0.0.Beta4</version>
</dependency>
If you’re not using a dependency management tool, copy all the dependencies from the distribution in the directories:
/lib/required
/lib/infinispan
/lib/provided
The advanced configuration details of an Infinispan Cache are defined in an Infinispan specific XML configuration file; the Hibernate OGM properties are simple and usually just point to this external resource.
To use the default configuration provided by Hibernate OGM - which is a good starting point for new users - you don’t have to set any property.
Infinispan datastore configuration properties
hibernate.ogm.datastore.provider: set it to infinispan.
hibernate.ogm.infinispan.cachemanager_jndiname: if you have an EmbeddedCacheManager registered in JNDI, provide the JNDI name and Hibernate OGM will use this instance instead of starting a new CacheManager. Any further configuration properties are then ignored, as Infinispan is assumed to be already configured.
hibernate.ogm.infinispan.configuration_resourcename: the resource name of an Infinispan configuration file; it is ignored when a JNDI lookup is set. Defaults to org/hibernate/ogm/datastore/infinispan/default-config.xml.
Hibernate OGM does not use a single Cache but three, each for a different purpose, so that you can configure the Caches meant for each role separately.
Infinispan cache names and purpose
ENTITIES
ASSOCIATIONS
IDENTIFIER_STORE
We’ll explain in the following paragraphs how you can take advantage of this and which aspects of Infinispan you’re likely to want to reconfigure from their defaults. All attributes and elements from Infinispan which we don’t mention are safe to ignore. Refer to the Infinispan User Guide for the guru level performance tuning and customizations.
An Infinispan configuration file is an XML file complying with the Infinispan schema; the basic structure is shown in the following example:
Example 5.1. Simple structure of an infinispan xml configuration file
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
    xmlns="urn:infinispan:config:5.1">
    <global>
    </global>
    <default>
    </default>
    <namedCache name="ENTITIES">
    </namedCache>
    <namedCache name="ASSOCIATIONS">
    </namedCache>
    <namedCache name="IDENTIFIERS">
    </namedCache>
</infinispan>
The global section contains elements which affect the whole instance; mainly of interest for Hibernate OGM users is the transport element, in which we’ll set JGroups configuration overrides.
In the namedCache section (or in default if we want to affect all named caches) we’ll likely want to configure clustering modes, eviction policies and CacheStores.
In its default configuration Infinispan stores all data in the heap of the JVM; in this barebone mode it is conceptually not very different from using a HashMap: the size of the data should fit in the heap of your VM, and stopping, killing or crashing your application will lose all data with no way to recover it.
To store data permanently (out of the JVM memory) a CacheStore should be enabled. The infinispan-core.jar includes a simple implementation able to store data in simple binary files, on any read/write mounted filesystem; this is an easy starting point, but the real stuff is to be found in the additional modules found in the Infinispan distribution. Here you can find many more implementations to store your data in anything from JDBC connected relational databases, other NoSQL engines, to cloud storage services or other Infinispan clusters. Finally, implementing a custom CacheStore is a trivial programming exercise.
To limit the memory consumption of the precious heap space, you can activate a passivation or an eviction policy. Again there are several strategies to play with; for now, just consider that you will likely need one to avoid running out of memory when storing too many entries in the bounded JVM memory space. Of course you don’t need to choose one while experimenting with limited data sizes: enabling such a strategy has no impact on the functionality of your Hibernate OGM application other than performance (entries stored in the Infinispan in-memory space are accessed much more quickly than from any CacheStore).
A CacheStore can be configured as write-through, committing all changes to the CacheStore before returning (and in the same transaction), or as write-behind. A write-behind configuration is normally not encouraged in storage engines, as a failure of the node implies some data might be lost without receiving any notification about it, but this problem is mitigated in Infinispan because of its capability to combine CacheStore write-behind with a synchronous replication to other Infinispan nodes.
Example 5.2. Enabling a FileCacheStore and eviction
<namedCache name="ENTITIES">
    <eviction strategy="LIRS" maxEntries="2000" />
    <loaders
        passivation="true" shared="false">
        <loader
            class="org.infinispan.loaders.file.FileCacheStore"
            fetchPersistentState="false"
            purgeOnStartup="false">
            <properties>
                <property name="location" value="/var/hibernate-ogm/myapp/entities-data" />
            </properties>
        </loader>
    </loaders>
</namedCache>
In this example we enabled both eviction and a CacheStore (the loader element). LIRS is one of the choices we have for eviction strategies. Here it is configured to keep (approximately) 2000 entries in live memory and evict the rest as a memory usage control strategy. The CacheStore enables passivation, which means that the evicted entries are stored on the filesystem.
You could configure an eviction strategy while not configuring a passivating CacheStore! That is a valid configuration for Infinispan but will have the evictor permanently remove entries. Hibernate OGM will break in such a configuration.
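To see why eviction without a passivating store loses data, consider the following toy model. It is not Infinispan code: it uses a plain LinkedHashMap with least-recently-used eviction (simpler than LIRS), and the class BoundedCache and its store map are hypothetical stand-ins for the in-memory container and the CacheStore.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: an LRU-bounded map that optionally passivates evicted
// entries into a second map standing in for a CacheStore.
final class BoundedCache<K, V> {

    private final Map<K, V> store; // null means "no CacheStore configured"
    private final LinkedHashMap<K, V> memory;

    BoundedCache(final int maxEntries, final boolean passivation) {
        this.store = passivation ? new HashMap<K, V>() : null;
        // accessOrder=true keeps the least recently used entry first
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                if (store != null) {
                    store.put(eldest.getKey(), eldest.getValue()); // passivate
                }
                return true; // evict from memory in both cases
            }
        };
    }

    void put(K key, V value) {
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null && store != null) {
            value = store.remove(key); // activate: load back from the store
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }
}
```

With passivation disabled, an entry pushed out by later writes can never be read again, which is the situation Hibernate OGM cannot cope with; with passivation enabled, the same entry is transparently loaded back from the store.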
Currently with Infinispan 5.1, the FileCacheStore is neither very fast nor very efficient: we picked it for ease of setup. For a production system it’s worth looking at the large collection of high performance and cloud friendly cachestores provided by the Infinispan distribution.
The best thing about Infinispan is that all nodes are treated equally and it requires almost no beforehand capacity planning: to add more nodes to the cluster you just have to start new JVMs, on the same or different physical server, having your same Infinispan configuration and your same application.
Infinispan supports several clustering cache modes; each mode provides the same API and functionality but with different performance, scalability and availability options:
Infinispan cache modes
When you use the replication or distribution cache modes, Infinispan uses JGroups to discover and connect to the other nodes.
In the default configuration, JGroups will attempt to autodetect peer nodes using a multicast socket; this works out of the box in most network environments but will require some extra configuration in cloud environments (which often block multicast packets) or in case of strict firewalls. See the JGroups reference documentation, specifically the section on Discovery Protocols, to customize the detection of peer nodes.
Nowadays, the JVM defaults to the IPv6 network stack; this will work fine with JGroups, but only if you configured IPv6 correctly. It is often useful to force the JVM to use IPv4.
It is also useful to let JGroups know which networking interface you want to use; especially if you have multiple interfaces it might not guess correctly.
Example 5.3. JVM properties to set for clustering
#192.168.122.1 is an example IPv4 address
-Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=192.168.122.1
You don’t need to use IPv4: JGroups is compatible with IPv6 provided you have routing properly configured and valid addresses assigned.
The jgroups.bind_addr needs to match a placeholder name in your JGroups configuration in case you don’t use the default one.
The default configuration uses distribution as cache mode and uses the jgroups-tcp.xml configuration for JGroups, which is contained in the Infinispan jar as the default configuration for Infinispan users.
Let’s see how to reconfigure this:
Example 5.4. Reconfiguring the cache mode and overriding the JGroups configuration
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
    xmlns="urn:infinispan:config:5.1">
    <global>
        <transport
            clusterName="HibernateOGM-Infinispan-cluster">
            <properties>
                <property name="configurationFile" value="my-jgroups-conf.xml" />
            </properties>
        </transport>
    </global>
    <default>
        <clustering
            mode="distribution" />
    </default>
    <!-- Cache to store the OGM entities -->
    <namedCache
        name="ENTITIES">
    </namedCache>
    <!-- Cache to store the relations across entities -->
    <namedCache
        name="ASSOCIATIONS">
    </namedCache>
    <!-- Cache to store identifiers -->
    <namedCache
        name="IDENTIFIERS">
        <!-- Override the cache mode: -->
        <clustering
            mode="replication" />
    </namedCache>
</infinispan>
In the example above we specify a custom JGroups configuration file and set the cache mode for the default cache to distribution; this is going to be inherited by the ENTITIES and the ASSOCIATIONS caches. But for IDENTIFIERS we have chosen (for the sake of this example) to use replication.
Now that you have clustering configured, start the service on multiple nodes. Each node will need the same configuration and jars.
We have just shown how to override the clustering mode and the networking stack for the sake of completeness, but you don’t have to!
Start with the default configuration and see if that fits you. You can fine-tune these settings when you are closer to going into production.
Infinispan supports transactions and integrates with any standard JTA TransactionManager; this is a great advantage for JPA users as it lets them experience behaviour similar to the one we are used to when working with relational databases.
If you’re having Hibernate OGM start and manage Infinispan, you can skip this as it will inject the same TransactionManager instance which you already have set up in the Hibernate / JPA configuration.
If you are providing an already started Infinispan CacheManager instance by using the JNDI lookup approach, then you have to make sure the CacheManager is using the same TransactionManager as Hibernate:
Example 5.5. Configuring a JBoss Standalone TransactionManager lookup
<default>
    <transaction
        transactionMode="TRANSACTIONAL"
        transactionManagerLookupClass=
            "org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup" />
</default>
Infinispan supports different transaction modes like PESSIMISTIC and OPTIMISTIC, supports XA recovery and provides many more configuration options; see the Infinispan User Guide for more advanced configuration options.
Hibernate Search, which can be used for advanced query capabilities (see Chapter 7, Query your entities), needs some place to store the indexes for its embedded Apache Lucene engine.
A common place to store these indexes is the filesystem, which is the default for Hibernate Search; however if your goal is to scale your NoSQL engine on multiple nodes you need to share this index. Network sharing filesystems are a possibility but we don’t recommend that. Often the best option is to store the index in whatever NoSQL database you are using (or a different dedicated one).
You might find this section useful even if you don’t intend to store your data in Infinispan.
The Infinispan project provides an adaptor to plug into Apache Lucene, so that it writes the indexes in Infinispan and searches data in it. Since Infinispan can be used as an application cache to other NoSQL storage engines by using a CacheStore (see Section 5.1.2, “Manage data size”) you can use this adaptor to store the Lucene indexes in any NoSQL store supported by Infinispan:
How to configure it? Here is a simple cheat sheet to get you started with this type of setup:
add org.hibernate:hibernate-search-infinispan:4.4.0.Beta1 to your dependencies
set these configuration properties:
hibernate.search.default.directory_provider = infinispan
hibernate.search.default.exclusive_index_use = false
hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]
The referenced Infinispan configuration should define a CacheStore to load/store the index in the NoSQL engine of choice. It should also define three cache names:
Table 5.1. Infinispan caches used to store indexes
Cache name | Description | Suggested cluster mode |
---|---|---|
LuceneIndexesLocking | Transfers locking information. Does not need a cache store. | replication |
LuceneIndexesData | Contains the bulk of Lucene data. Needs a cache store. | distribution + L1 |
LuceneIndexesMetadata | Stores metadata on the index segments. Needs a cache store. | replication |
This configuration is not going to scale well on write operations: to do that you should read about the master/slave and sharding options in Hibernate Search. The complete explanation and configuration options can be found in the Hibernate Search Reference Guide.
Some NoSQL stores support storing Lucene indexes directly, in which case you might skip the Infinispan Lucene integration by implementing a custom DirectoryProvider for Hibernate Search. You’re very welcome to share the code and have it merged in Hibernate Search for others to use, inspect, improve and maintain.
When combined with Hibernate ORM, Ehcache is commonly used as a 2nd level cache, that is, to cache data which is stored in a relational database. When used with Hibernate OGM it is not "just a cache" but the main storage engine for your data.
This is not the reference manual for Ehcache itself: we’re going to list only how Hibernate OGM should be configured to use Ehcache; for all the tuning and advanced options please refer to the Ehcache Documentation.
Two steps:
And then choose one of:
To add the dependencies via a build tool that understands Maven definitions, add the following module:
<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-ehcache</artifactId>
    <version>4.0.0.Beta4</version>
</dependency>
If you’re not using a dependency management tool, copy all the dependencies from the distribution in the directories:
/lib/required
/lib/ehcache
/lib/provided
Hibernate OGM expects you to define an Ehcache configuration in its own configuration resource; all we need to set is the resource name.
To use the default configuration provided by Hibernate OGM - which is a good starting point for new users - you don’t have to set any property.
Ehcache datastore configuration properties
hibernate.ogm.datastore.provider: set it to ehcache.
Default configuration resource: /org/hibernate/ogm/datastore/ehcache/default-ehcache.xml.
While Ehcache technically supports transactions, Hibernate OGM is currently unable to use them. Careful!
If you need this feature, it should be easy to implement: contributions welcome! See JIRA OGM-243.
MongoDB is a document oriented datastore written in C++ with strong emphasis on ease of use.
This implementation is based upon the MongoDB Java driver. The currently supported version is 2.10.1.
The following properties are available to configure MongoDB support:
MongoDB datastore configuration properties
mongodb
127.0.0.1
27017
5000
GLOBAL_COLLECTION stores the association information in a single MongoDB collection shared by all associations. COLLECTION stores each association in a dedicated MongoDB collection. IN_ENTITY stores the association information within the entity. IN_ENTITY is the default.
ERRORS_IGNORED, ACKNOWLEDGED, UNACKNOWLEDGED, FSYNCED, JOURNALED, NONE, NORMAL, SAFE, MAJORITY, FSYNC_SAFE, JOURNAL_SAFE, REPLICAS_SAFE. For more information, please refer to the official documentation. This option is case insensitive and the default value is ACKNOWLEDGED.
Hibernate OGM tries to make the mapping to the underlying datastore as natural as possible so that third party applications not using Hibernate OGM can still read and update the same datastore. We worked particularly hard on the MongoDB model to offer various classic mappings between your object model and the MongoDB documents.
Entities are stored as MongoDB documents and not as BLOBs, which means each entity property will be translated into a document field. You can use the name property of the @Table and @Column annotations to rename the collections and the documents’ fields if you need to.
Note that embedded objects are mapped as nested documents.
Example 5.6. Example of an entity with an embedded object
@Entity
public class News {
    @Id
    private String id;
    private String title;
    @Column(name="desc")
    private String description;
    @Embedded
    private NewsPaper paper;
    //getters, setters ...
}
@Embeddable
public class NewsPaper {
    private String name;
    private String owner;
    //getters, setters ...
}
{
    "_id" : "1234-5678-0123-4567",
    "title": "On the merits of NoSQL",
    "desc": "This paper discuss why NoSQL will save the world for good",
    "paper": {
        "name": "NoSQL journal of prophecies",
        "owner": "Delphy"
    }
}
The _id field of a MongoDB document is directly used to store the identifier columns mapped in the entities. That means you can use simple identifiers (no matter the Java type used) as well as embedded identifiers. Embedded identifiers are stored as an embedded document in the _id field. Hibernate OGM will convert the @Id property into an _id document field, so you can name the entity id as you like: it will always be stored in _id (the recommended approach in MongoDB). That means in particular that MongoDB will automatically index your _id fields.
Let’s look at an example:
Example 5.7. Example of an entity using Embedded id
@Entity
public class News {
    @EmbeddedId
    private NewsID newsId;
    //getters, setters ...
}
@Embeddable
public class NewsID implements Serializable {
    private String title;
    private String author;
    //getters, setters ...
}
{ "_id" :{ "title": "How does Hibernate OGM MongoDB work?", "author": "Guillaume" } }
Hibernate OGM MongoDB offers three strategies to store navigation information for associations. To switch between these strategies, use the hibernate.ogm.mongodb.associations.store configuration property. The three possible values are IN_ENTITY (the default), GLOBAL_COLLECTION and COLLECTION.
In the IN_ENTITY strategy, Hibernate OGM directly stores the id(s) of the other side of the association into a field or an embedded document, depending on whether the mapping concerns a single object or a collection. The field that stores the relationship information is named after the entity property.
Example 5.8. Java entity
@Entity
public class AccountOwner {
    @Id
    private String id;
    @ManyToMany
    public Set<BankAccount> bankAccounts;
    //getters, setters, ...
}
Example 5.9. JSON representation
{ "_id" : "owner0001", "bankAccounts" : [ { "bankAccounts_id" : "accountXYZ" } ] }
With the GLOBAL_COLLECTION strategy, Hibernate OGM creates a single collection in which it will store all navigation information for all associations. Each document of this collection is structured in two parts. The first is the _id field, which contains the identifier information of the association owner and the name of the association table. The second part is the rows field, which stores (in an embedded collection) all the ids that the current instance is related to.
Example 5.10. Unidirectional relationship
{ "_id": { "owners_id": "owner0001", "table": "AccountOwner_BankAccount" }, "rows": [ { "bankAccounts_id": "accountXYZ" } ] }
For a bidirectional relationship, another document is created where ids are reversed. Don’t worry, Hibernate OGM takes care of keeping them in sync:
Example 5.11. Bidirectional relationship
{ "_id": { "owners_id": "owner0001", "table": "AccountOwner_BankAccount" }, "rows": [{ "bankAccounts_id": "accountXYZ" }] }
{ "_id": { "bankAccounts_id": "accountXYZ", "table": "AccountOwner_BankAccount" }, "rows": [{ "owners_id": "owner0001" }] }
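The document layout above can be reproduced with plain maps. This is only a sketch of the shape of the data, not the code Hibernate OGM uses to write it; the class GlobalAssociationDocuments and its document method are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the single-collection association document layout.
final class GlobalAssociationDocuments {

    // Builds one association document: _id holds the key column of one side
    // plus the association table name, rows holds the ids of the other side.
    static Map<String, Object> document(String table, String ownerColumn, String ownerId,
            String targetColumn, List<String> targetIds) {
        Map<String, Object> id = new LinkedHashMap<String, Object>();
        id.put(ownerColumn, ownerId);
        id.put("table", table);

        List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        for (String targetId : targetIds) {
            Map<String, Object> row = new LinkedHashMap<String, Object>();
            row.put(targetColumn, targetId);
            rows.add(row);
        }

        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("_id", id);
        doc.put("rows", rows);
        return doc;
    }

    public static void main(String[] args) {
        String table = "AccountOwner_BankAccount";
        // Owner side first, then the reversed document of the bidirectional case.
        System.out.println(document(table, "owners_id", "owner0001",
                "bankAccounts_id", Collections.singletonList("accountXYZ")));
        System.out.println(document(table, "bankAccounts_id", "accountXYZ",
                "owners_id", Collections.singletonList("owner0001")));
    }
}
```

Calling the method twice with the two sides swapped gives the pair of documents used for a bidirectional relationship; both carry the same table value in their _id.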
With the COLLECTION strategy, Hibernate OGM creates a MongoDB collection per association in which it will store all navigation information for that particular association. This is the strategy closest to the relational model. If an entity A is related to B and C, two collections will be created. The name of each collection is the prefix associations_ followed by the name of the association table. For example, if the BankAccount and Owner are related, the collection used to store the association will be named associations_Owner_BankAccount. The prefix is useful to quickly tell the association collections apart from the entity collections.
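The naming rule can be written down in a couple of lines. The class AssociationCollectionNames below is a hypothetical helper for illustration only; it encodes nothing beyond the associations_ prefix described above.

```java
// Hypothetical helper illustrating the collection naming rule described above.
final class AssociationCollectionNames {

    static final String PREFIX = "associations_";

    // Name of the collection holding the given association table.
    static String collectionFor(String associationTable) {
        return PREFIX + associationTable;
    }

    // True when a collection name carries the association prefix.
    static boolean isAssociationCollection(String collectionName) {
        return collectionName.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(collectionFor("Owner_BankAccount"));
    }
}
```

A prefix test like isAssociationCollection is only a heuristic: an entity collection whose own name happens to start with associations_ would be misclassified.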
Each document of an association collection has the following structure:
_id contains the id of the owner of the relationship
rows contains all the ids of the related entities
Example 5.12. Unidirectional relationship
{ "_id" : { "owners_id" : "owner0001" }, "rows" : [ { "bankAccounts_id" : "accountXYZ" } ] }
Example 5.13. Bidirectional relationship
{ "_id" : { "owners_id" : "owner0001" }, "rows" : [ { "bankAccounts_id" : "accountXYZ" } ] }
{ "_id" : { "bankAccounts_id" : "accountXYZ" }, "rows" : [ { "owners_id" : "owner0001" } ] }
MongoDB does not support transactions. Only changes applied to the same document are done atomically. A change applied to more than one document will not be applied atomically. This problem is slightly mitigated by the fact that Hibernate OGM queues all changes before applying them during flush time. So the window of time used to write to MongoDB is smaller than what you would have done manually.
We recommend that you still use transaction demarcations with Hibernate OGM to trigger the flush operation transparently (on commit). But do not count on rollback: it will not work.
Hibernate OGM is a work in progress and we are actively working on JP-QL query support.
In the meantime, you have two strategies to query entities stored by Hibernate OGM:
Because Hibernate OGM stores data in MongoDB in a natural way, you can use the MongoDB driver and execute queries on the datastore directly without involving Hibernate OGM. The benefit of this approach is to use the query capabilities of MongoDB. The drawback is that raw MongoDB documents will be returned and not managed entities.
The alternative approach is to index your entities with Hibernate Search. That way, a set of secondary indexes independent of MongoDB is maintained by Hibernate Search and you can write queries on top of them. The benefit of this approach is a nice integration at the JPA / Hibernate API level (managed entities are returned by the queries). The drawback is that you need to store the Lucene indexes somewhere (file system, Infinispan grid, etc.). Have a look at the Infinispan section for more info on how to use Hibernate Search.
Neo4j is a robust (fully ACID) transactional property graph database. This kind of database is suited to problems that can be represented as a graph, such as social relationships or road maps.
At the moment only support for the embedded Neo4j is included in OGM.
This is our first version and a bit experimental. In particular we plan on using node navigation much more than index lookup in a future version.
If your project uses Maven you can add this to the pom.xml:
<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-neo4j</artifactId>
    <version>4.0.0.Beta4</version>
</dependency>
Alternatively you can find the required libraries in the distribution package on SourceForge.
hibernate.ogm.datastore.provider = neo4j_embedded
hibernate.ogm.neo4j.database.path = C:\example\mydb
The following properties are available to configure Neo4j support:
Neo4j datastore configuration properties
C:\neo4jdb\mydb
_nodes_ogm_index
_relationships_ogm_index
_sequences_ogm_index
Entities are stored as Neo4j nodes, which means each entity property will be translated into a property of the node. An additional property is added to the node and it contains the name of the table representing the entity.
Example 5.14. Example of entities and the list of properties contained in the corresponding node
@Entity
class Account {
    @Id
    String login;
    String password;
    Address homeAddress;
    //...
}
@Embeddable
class Address {
    String city;
    String zipCode;
    //...
}
Node properties: _table id login password homeAddress_city homeAddress_zipCode
The _table property has been added by OGM and it contains the name of the table representing the entity (Account in this simple case).
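The naming of the node properties can be illustrated with nested maps. The sketch below is not how Hibernate OGM builds nodes: the class NodePropertyNames is hypothetical, and it only shows the added _table property and the way the fields of an embedded object end up prefixed with the name of the embedding property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the property naming only; not Hibernate OGM's own code.
final class NodePropertyNames {

    // Flattens nested maps (standing in for embedded objects) into
    // "outer_inner" keys and adds the _table property.
    static Map<String, Object> flatten(String table, Map<String, Object> entity) {
        Map<String, Object> node = new LinkedHashMap<String, Object>();
        node.put("_table", table);
        flattenInto(node, "", entity);
        return node;
    }

    private static void flattenInto(Map<String, Object> node, String prefix,
            Map<String, Object> values) {
        for (Map.Entry<String, Object> e : values.entrySet()) {
            String name = prefix + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> embedded = (Map<String, Object>) e.getValue();
                flattenInto(node, name + "_", embedded);
            } else {
                node.put(name, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<String, Object>();
        address.put("city", "Paris");      // sample values, made up for the example
        address.put("zipCode", "75001");
        Map<String, Object> account = new LinkedHashMap<String, Object>();
        account.put("login", "myAccount");
        account.put("password", "secret");
        account.put("homeAddress", address);
        System.out.println(flatten("Account", account).keySet());
    }
}
```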
Associations are mapped using Neo4j relationships. A unidirectional association is mapped with a relationship between two nodes that starts from the node representing the owner of the association. The name of the association is saved as the type of the relationship. A bidirectional association is represented by two relationships, one per direction, between the two nodes.
Neo4j operations can be executed only inside a transaction. Unless a different org.hibernate.engine.transaction.jta.platform.spi.JtaPlatform is specified, OGM will integrate with the Neo4j transaction mechanism; this means that you should start and commit transactions using the Hibernate session.
Example 5.15. Example of starting and committing transactions
Session session = factory.openSession();
Transaction tx = session.beginTransaction();
Account account = new Account();
account.setLogin( "myAccount" );
session.persist( account );
tx.commit();
...
tx = session.beginTransaction();
// the identifier of Account is its login (see Example 5.14)
Account savedAccount = (Account) session.get( Account.class, account.getLogin() );
tx.commit();