The Connector framework
A connector is actually just a plain old Java object (POJO), so creating a connector is pretty straightforward: create a Java class that extends one of the following abstract classes:
- ReadOnlyConnector - extend this class when ModeShape clients will never be able to manipulate, create, or remove any content exposed by the connector.
- WritableConnector - extend this class when ModeShape clients may be able to manipulate, create, and/or remove content exposed by the connector. Note that each configured instance of such a connector can still be made read-only.
A connector operates by accessing an external system and dynamically creating nodes that represent information in that external system. The nodes must form a single tree, although how that tree is structured and what the nodes actually look like is completely up to the connector implementation.
Documents
While a connector conceptually exposes nodes, technically it exchanges representations of nodes (and other information, like sublists of children). These representations take the form of Java Document objects that are semantically like JSON and BSON documents. The connector SPI uses documents for several reasons. Firstly, ModeShape stores its own internal (non-federated) nodes as Documents, so connectors work with the same kind of Document instances that ModeShape uses internally. Secondly, a Document is easily converted to and from JSON (and BSON), potentially making it very easy to write a connector that accesses a remote system. Thirdly, constructs other than nodes can be represented as documents. For example, a connector can be pageable, meaning it breaks the list of child node references into multiple pages that are read with separate requests, allowing the connector to efficiently expose large numbers of children under a single node. Finally, the node's identifier, properties, child node references, and other ModeShape-specific information are stored in specific fields within a Document, but the connector can use additional fields that are hidden from ModeShape clients. This makes little sense for a read-only connector, but a writable connector might include such hidden fields in the documents it returns when nodes are read, so that those fields are still available when the documents come back to the connector.
We'll see what these Documents look like in a little bit, but first let's look at the methods that your connector will need to implement.
Read-only connector
The following code fragment shows the methods that a ReadOnlyConnector subclass must/should implement.
package org.modeshape.jcr.federation.spi;
import java.io.IOException;
import java.util.Collection;
import javax.jcr.NamespaceRegistry;
import javax.jcr.RepositoryException;
import org.infinispan.schematic.document.Document;
import org.modeshape.jcr.api.nodetype.NodeTypeManager;
public abstract class ReadOnlyConnector extends Connector {
...
/**
* Initialize the connector. This is called automatically by ModeShape once for each Connector instance,
* and should not be called by the connector. By the time this method is called, ModeShape will have
* already set the {@code ExecutionContext}, {@code Logger}, connector name, and repository name, as well as
* any fields that match configuration properties for the connector.
*
* By default this method does nothing, so it should be overridden by implementations to do a one-time
* initialization of any internal components. For example, connectors can use the supplied {@code registry}
* and {@code nodeTypeManager} parameters to register custom namespaces and node types used by the exposed nodes.
*
* This is also an excellent place for the connector to validate the connector-specific fields set by ModeShape
* via reflection during instantiation.
*
* @param registry the namespace registry that can be used to register custom namespaces; never null
* @param nodeTypeManager the node type manager that can be used to register custom node types; never null
* @throws RepositoryException if operations on the {@link NamespaceRegistry} or {@link NodeTypeManager} fail
* @throws IOException if any stream-based operations fail (like importing CND files)
*/
public void initialize( NamespaceRegistry registry,
NodeTypeManager nodeTypeManager ) throws RepositoryException, IOException {
}
/**
* Returns the id of an external node located at the given external path within the connector's
* exposed tree of content.
*
* @param externalPath a non-null string representing an external path, or "/" for the top-level
* node exposed by the connector
* @return either the id of the document or null
*/
public abstract String getDocumentId( String externalPath );
/**
* Returns a Document instance representing the document with a given id. The document should have
* a "proper" structure for it to be usable by ModeShape.
*
* @param id a non-null string
* @return either a {@link Document} instance or {@code null}
*/
public abstract Document getDocumentById( String id );
/**
* Return the path(s) of the external node with the given identifier. The resulting paths are from the
* point of view of the connector. For example, the "root" node exposed by the connector will have a
* path of "/".
*
* @param id a non-null string
* @return the connector-specific path(s) of the node, or an empty collection if there is no such
* document; never null
*/
public abstract Collection<String> getDocumentPathsById( String id );
/**
* Checks if a document with the given id exists in the end-source.
*
* @param id a non-null string.
* @return {@code true} if such a document exists, {@code false} otherwise.
*/
public abstract boolean hasDocument( String id );
...
}
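The contract of these four read methods can be illustrated without ModeShape at all. The following is a minimal sketch, assuming a hypothetical in-memory InMemoryTree class (not part of ModeShape) that uses a plain Map in place of a Document. It only shows what each method is expected to return: a null id for an unknown path, a null document for an unknown id, and an empty collection (never null) for the paths of an unknown id.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a read-only connector's backing data; NOT a ModeShape class.
// A plain Map<String, Object> plays the role of ModeShape's Document here.
class InMemoryTree {
    private final Map<String, String> idsByPath = new HashMap<>();
    private final Map<String, Map<String, Object>> docsById = new HashMap<>();

    // Registers a node at the given external path ("/" is the connector's top-level node).
    void put(String path, String id, Map<String, Object> fields) {
        idsByPath.put(path, id);
        docsById.put(id, fields);
    }

    // Mirrors getDocumentId(String): the id at the path, or null if nothing is there.
    String getDocumentId(String externalPath) {
        return idsByPath.get(externalPath);
    }

    // Mirrors getDocumentById(String): the document, or null if the id is unknown.
    Map<String, Object> getDocumentById(String id) {
        return docsById.get(id);
    }

    // Mirrors getDocumentPathsById(String): never null, empty when the id is unknown.
    // In this sketch each node has exactly one path.
    Collection<String> getDocumentPathsById(String id) {
        for (Map.Entry<String, String> entry : idsByPath.entrySet()) {
            if (entry.getValue().equals(id)) {
                return List.of(entry.getKey());
            }
        }
        return Collections.emptyList();
    }

    // Mirrors hasDocument(String).
    boolean hasDocument(String id) {
        return docsById.containsKey(id);
    }
}
```

A real connector would answer these calls by querying the external system rather than a map, but the null and empty-collection conventions are the same.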
Not shown are fields, getters, and other implemented methods that your methods will almost certainly use. For example, a Document is a read-only representation of a JSON document. You create one by calling the newDocument(id) method with the document's identifier, using the resulting DocumentWriter to set, remove, and add fields (and nested documents), and then calling the writer's document() method to obtain the read-only Document instance.
The DocumentWriter interface provides dozens of methods for getting and setting node properties and child node references. Here's some code that uses a document writer to construct a node representation with a few properties:
String id = ...
DocumentWriter writer = newDocument(id);
writer.setPrimaryType("lib:book");
writer.addMixinType("lib:tagged");
writer.addProperty("lib:isbn", "0486280616");
writer.addProperty("lib:format", "paperback");
writer.addProperty("lib:author", "Mark Twain");
writer.addProperty("lib:title", "The Adventures of Huckleberry Finn");
writer.addProperty("lib:tags", "fiction", "classic", "americana");
// Add a single child named 'tableOfContents' with its own identifier
writer.addChild(id + "/toc","tableOfContents");
Document doc = writer.document();
As you can see, creating documents is pretty straightforward.
Identifiers of documents are simple strings that are expected to uniquely and durably identify a document. However, the content of that string is entirely up to the connector implementation. If the external system already has a notion of unique identifiers, it might be easiest to simply reuse a string representation of those identifiers. For example, a database might have a unique key within a given table, whereas a Git repository uses SHA-1 hashes to identify commits and other objects. Some external systems (like file systems) don't have a concept of unique identifiers, and in such cases the connector should devise its own identifier mechanism that is durable and reliable.
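Reusing an identifier that the external system already guarantees can be as simple as composing a string from it. The following is a minimal sketch, assuming a hypothetical database-backed connector in which each row is identified by its table name and primary key; the RowIds class and the "table/key" format are illustrative choices, not a ModeShape convention.

```java
// Hypothetical identifier scheme for a database-backed connector (NOT a ModeShape API).
// The id is "<table>/<primaryKey>", reusing the key the external system already keeps
// unique and stable, so the same row always maps to the same document id.
final class RowIds {
    private RowIds() {}

    static String toId(String table, String primaryKey) {
        // Table names may not contain '/', so the first '/' always separates table from key.
        if (table.isEmpty() || table.contains("/")) {
            throw new IllegalArgumentException("table name must be non-empty and must not contain '/'");
        }
        return table + "/" + primaryKey;
    }

    // Splits an id back into {table, primaryKey}; returns null for ids this scheme did not create.
    static String[] fromId(String id) {
        int slash = id.indexOf('/');
        if (slash <= 0) {
            return null;
        }
        return new String[] { id.substring(0, slash), id.substring(slash + 1) };
    }
}
```

Because the mapping is reversible, getDocumentById(...) can recover the table and key from the id alone, without keeping any extra state in the connector.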
Properties, paths, names, and values
Most of the time, you can use string property names and property values that are String, Calendar, URL, or numeric instances, and ModeShape will convert them to an internal object representation. However, ModeShape also provides object representations of JCR names, paths, values, and properties. These classes are often much easier to work with than String names and paths, and they're easy to create using ModeShape's namespace-aware factories. The ValueFactories interface is a container for type-specific factories, each accessible through a getter method. Here's an example of creating a Path value from a string and then using the Path methods to get at the already-parsed segments of the path:
String str = "/a/b/c/cust:d";
PathFactory pathFactory = factories().getPathFactory();
Path path = pathFactory.create(str);
for ( Path.Segment segment : path ) {
Name name = segment.getName();
String localName = name.getLocalName();
String namespaceUri = name.getNamespaceUri();
if ( segment.hasIndex() ) {
int snsIndex = segment.getIndex(); // same-name-sibling index
}
}
Path parentPath = path.getParent();
...
The process of using a factory to create Name, Binary, DateTime, and all other JCR-compliant values is similar.
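To make the parts of a path segment concrete, here is a simplified, self-contained parser for a single segment such as "cust:d[2]". It is only a sketch of what the path factory already does for you: the hypothetical SegmentSketch class below does not resolve the prefix to a namespace URI (ModeShape uses the namespace registry for that), does no escaping, and validates very little.

```java
// Hypothetical, simplified view of one JCR path segment such as "cust:d[2]".
// NOT ModeShape's Path.Segment; it only illustrates prefix, local name, and SNS index.
final class SegmentSketch {
    final String prefix;     // namespace prefix, or "" when the name has none
    final String localName;  // name without the prefix
    final int index;         // same-name-sibling index; JCR indexes start at 1

    private SegmentSketch(String prefix, String localName, int index) {
        this.prefix = prefix;
        this.localName = localName;
        this.index = index;
    }

    static SegmentSketch parse(String segment) {
        int index = 1;
        String name = segment;
        if (segment.endsWith("]")) {
            int open = segment.lastIndexOf('[');
            if (open < 0) {
                throw new IllegalArgumentException("malformed segment: " + segment);
            }
            index = Integer.parseInt(segment.substring(open + 1, segment.length() - 1));
            name = segment.substring(0, open);
        }
        int colon = name.indexOf(':');
        String prefix = colon < 0 ? "" : name.substring(0, colon);
        String localName = colon < 0 ? name : name.substring(colon + 1);
        return new SegmentSketch(prefix, localName, index);
    }

    // In this sketch, the default index of 1 counts as "no explicit index".
    boolean hasIndex() {
        return index > 1;
    }
}
```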
Properties are slightly different, since they are a bit more structured. ModeShape provides a PropertyFactory that can create single- or multi-valued Property instances given a name and one or more values. Here's some simple code that shows how to create a single-valued property:
PropertyFactory propFactory = propertyFactory();
Name propName = nameFactory().create("lib:title");
String propValue = "The Adventures of Huckleberry Finn";
Property prop = propFactory.create(propName,propValue);
All Property, Name, Path, DateTime, and Binary instances are immutable, meaning you can pass them around without worrying about whether the receiver might modify them. Also, the factories will often pick implementation classes that are tailored for the specific value. For example, there are separate implementations for the root path, single-segment paths, paths created from a parent path, single-valued properties, empty properties, and multi-valued properties.
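The two ideas in the previous paragraph, immutability and a factory that picks a representation tailored to the value, can be shown in a small self-contained sketch. The PropertySketch interface below is hypothetical and is NOT ModeShape's Property or PropertyFactory; it only demonstrates the usual Java technique of copying the caller's values and exposing them as an unmodifiable list, with a separate class for the single-valued case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified analogue of an immutable property (NOT ModeShape's Property).
// The factory method picks a representation tailored to the number of values.
interface PropertySketch {
    String name();
    List<Object> values(); // always unmodifiable

    static PropertySketch of(String name, Object... values) {
        if (values.length == 1) {
            return new Single(name, values[0]);
        }
        // Copy the array so later changes by the caller cannot leak into the property.
        return new Multi(name, Collections.unmodifiableList(Arrays.asList(values.clone())));
    }

    final class Single implements PropertySketch {
        private final String name;
        private final Object value;
        Single(String name, Object value) { this.name = name; this.value = value; }
        public String name() { return name; }
        public List<Object> values() { return Collections.singletonList(value); }
    }

    final class Multi implements PropertySketch {
        private final String name;
        private final List<Object> values;
        Multi(String name, List<Object> values) { this.name = name; this.values = values; }
        public String name() { return name; }
        public List<Object> values() { return values; }
    }
}
```

Since neither class can change after construction, one instance can safely be shared by many callers.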
Standard connector properties
Writable connector
The following code fragment shows the methods that a WritableConnector subclass must/should implement.
package org.modeshape.jcr.federation.spi;
import java.io.IOException;
import java.util.Collection;
import javax.jcr.NamespaceRegistry;
import javax.jcr.RepositoryException;
import org.infinispan.schematic.document.Document;
import org.modeshape.jcr.api.nodetype.NodeTypeManager;
import org.modeshape.jcr.value.Name;
public abstract class WritableConnector extends Connector {
...
/**
* Initialize the connector. This is called automatically by ModeShape once for each Connector instance,
* and should not be called by the connector. By the time this method is called, ModeShape will have
* already set the {@code ExecutionContext}, {@code Logger}, connector name, and repository name, as well as
* any fields that match configuration properties for the connector.
*
* By default this method does nothing, so it should be overridden by implementations to do a one-time
* initialization of any internal components. For example, connectors can use the supplied {@code registry}
* and {@code nodeTypeManager} parameters to register custom namespaces and node types used by the exposed nodes.
*
* This is also an excellent place for the connector to validate the connector-specific fields set by ModeShape
* via reflection during instantiation.
*
* @param registry the namespace registry that can be used to register custom namespaces; never null
* @param nodeTypeManager the node type manager that can be used to register custom node types; never null
* @throws RepositoryException if operations on the {@link NamespaceRegistry} or {@link NodeTypeManager} fail
* @throws IOException if any stream-based operations fail (like importing CND files)
*/
public void initialize( NamespaceRegistry registry,
NodeTypeManager nodeTypeManager ) throws RepositoryException, IOException {
}
/**
* Returns the id of an external node located at the given external path within the connector's
* exposed tree of content.
*
* @param externalPath a non-null string representing an external path, or "/" for the top-level
* node exposed by the connector
* @return either the id of the document or null
*/
public abstract String getDocumentId( String externalPath );
/**
* Returns a Document instance representing the document with a given id. The document should have
* a "proper" structure for it to be usable by ModeShape.
*
* @param id a non-null string
* @return either a {@link Document} instance or {@code null}
*/
public abstract Document getDocumentById( String id );
/**
* Return the path(s) of the external node with the given identifier. The resulting paths are
* from the point of view of the connector. For example, the "root" node exposed by the connector
* will have a path of "/".
*
* @param id a non-null string
* @return the connector-specific path(s) of the node, or an empty collection if there is no such
* document; never null
*/
public abstract Collection<String> getDocumentPathsById( String id );
/**
* Checks if a document with the given id exists in the end-source.
*
* @param id a non-null string.
* @return {@code true} if such a document exists, {@code false} otherwise.
*/
public abstract boolean hasDocument( String id );
/**
* Removes the document with the given id.
*
* @param id a non-null string.
* @return {@code true} if the document was removed, or {@code false} if there was no document with the
* given id
*/
public abstract boolean removeDocument( String id );
/**
* Stores the given document.
*
* @param document a non-null Document instance.
* @throws DocumentAlreadyExistsException if a document with the same identifier already exists
* @throws DocumentNotFoundException if one of the modified documents was removed by another session
*/
public abstract void storeDocument( Document document );
/**
* Updates a document using the provided changes.
*
* @param documentChanges a non-null DocumentChanges object which contains
* granular information about all the changes.
*/
public abstract void updateDocument( DocumentChanges documentChanges );
/**
* Generates an identifier which will be assigned when a new document (a.k.a. child) is created under an
* existing document (a.k.a. parent). This method should be implemented only by connectors that support
* writing.
*
* @param parentId a non-null string which represents the identifier of the parent under which the new
* document will be created.
* @param newDocumentName a non-null Name which represents the name that will be given
* to the child document
* @param newDocumentPrimaryType a non-null Name which represents the child document's
* primary type.
* @return either a non-null string which will be assigned as the new identifier, or null which means
* that no "special" id format is required. In this last case, the repository will
* auto-generate a random id.
* @throws org.modeshape.jcr.cache.DocumentStoreException if the connector is read-only.
*/
public abstract String newDocumentId( String parentId,
Name newDocumentName,
Name newDocumentPrimaryType );
...
}
A WritableConnector has to implement all of the read-related methods that a ReadOnlyConnector must implement and a handful of write-related methods for removing, updating, and storing new documents (nodes).
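The write-side contract can also be sketched without ModeShape. The InMemoryStore class below is hypothetical. It uses plain maps instead of Documents, and it throws the standard IllegalStateException and NoSuchElementException where a real connector would throw ModeShape's DocumentAlreadyExistsException and DocumentNotFoundException. Its updateField(...) method is a deliberately simplified stand-in for updateDocument(DocumentChanges), and its newDocumentId(...) derives a child id from the parent id and child name, which is only one possible scheme.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory store illustrating the write-side contract (NOT a ModeShape API).
// Plain maps stand in for Documents; standard exceptions stand in for ModeShape's
// DocumentAlreadyExistsException and DocumentNotFoundException.
class InMemoryStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Like storeDocument: only for brand-new documents.
    void storeDocument(String id, Map<String, Object> doc) {
        if (docs.putIfAbsent(id, new HashMap<>(doc)) != null) {
            throw new IllegalStateException("document already exists: " + id);
        }
    }

    // Simplified stand-in for updateDocument: change one field of an existing document.
    void updateField(String id, String field, Object value) {
        Map<String, Object> doc = docs.get(id);
        if (doc == null) {
            throw new NoSuchElementException("document was removed: " + id);
        }
        doc.put(field, value);
    }

    // Like removeDocument: true only if something was actually removed.
    boolean removeDocument(String id) {
        return docs.remove(id) != null;
    }

    // Like newDocumentId: derive the child's id from its parent's id and name.
    // Same-name siblings would collide here, and storeDocument would then reject the duplicate.
    String newDocumentId(String parentId, String newDocumentName) {
        return parentId + "/" + newDocumentName;
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }
}
```

A connector that has no natural id scheme could instead return null from newDocumentId(...) and let the repository generate a random id, as described in the Javadoc above.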
Just as ModeShape provides a DocumentWriter, there is also a DocumentReader that has methods to easily read properties, primary type, mixin types, and child references. Using it is just as simple as using the writer:
Document doc = ...
DocumentReader reader = readDocument(doc);
String id = reader.getDocumentId();
String primaryType = reader.getPrimaryTypeName();
Map<Name, Property> properties = reader.getProperties();
// Get the ordered list of child references ...
LinkedHashMap<String,Name> childReferences = reader.getChildrenMap();
for ( Map.Entry<String,Name> childRef : childReferences.entrySet() ) {
String childId = childRef.getKey();   // the child's document identifier
Name childName = childRef.getValue(); // the child's name
}
Pageable connectors
A Document that represents a node will contain references to all the children of that node. These references are relatively small (just the ID and name of the child), and for many connectors this is sufficient and fast enough. However, when the number of children under a node starts to increase, building the list of child references for a parent node can become noticeable and even burdensome, especially when few (if any) of the child references may ultimately be resolved into their node representations.
A pageable connector is one that wants to expose the children of nodes in a "page by page" fashion, where the parent node's document contains only the first page of child references and subsequent pages are loaded only if needed. This turns out to be quite effective: when clients navigate a specific path (or ask for a specific child of a parent by its name), ModeShape doesn't need the child references in a node's document at all. Instead, it can simply have the connector resolve such (relative or absolute external) paths into an identifier and then ask for the document with that ID.
Therefore, the only time the child references are needed is when clients iterate over the children of a node. A pageable connector will only be asked for as many pages as needed to handle the client's iteration, making it very efficient for exposing a node structure that can contain nodes with numerous children.
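The "only as many pages as needed" behavior can be sketched with a lazy iterator. This is a hypothetical illustration, not ModeShape's internal code. PageSource stands in for a pageable connector's getChildren(PageKey) method and returns a Page of child names plus the offset of the next page, or -1 when there is none. A new page is requested only when iteration runs past the end of the current one.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of lazy, page-by-page iteration over child references.
final class Page {
    final List<String> children;
    final int nextOffset; // -1 when there is no further page
    Page(List<String> children, int nextOffset) { this.children = children; this.nextOffset = nextOffset; }
}

interface PageSource {
    Page fetch(int offset); // stands in for a connector's getChildren(PageKey)
}

final class LazyChildIterator implements Iterator<String> {
    private final PageSource source;
    private Iterator<String> current;
    private int nextOffset;
    int pagesFetched = 0; // only here to show how many extra pages were requested

    // The first page comes from the parent node's own document.
    LazyChildIterator(Page firstPage, PageSource source) {
        this.source = source;
        this.current = firstPage.children.iterator();
        this.nextOffset = firstPage.nextOffset;
    }

    @Override
    public boolean hasNext() {
        // Ask for another page only once the current one is exhausted.
        while (!current.hasNext() && nextOffset >= 0) {
            Page page = source.fetch(nextOffset);
            pagesFetched++;
            current = page.children.iterator();
            nextOffset = page.nextOffset;
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With 100 children in pages of 10, a client that reads only the first 25 children causes just two extra page requests, and the remaining seven are never fetched.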
To make your ReadOnlyConnector or WritableConnector support paging, simply implement the Pageable interface:
package org.modeshape.jcr.federation.spi;
import org.infinispan.schematic.document.Document;
public interface Pageable {
/**
* Return a document which represents a page of children. The document for the parent node
* should include as many children as desired, and then include a reference to the next
* page of children with the {@link PageWriter#addPage(String, String, long, long)} method.
* Each page returned by this method should also include a reference to the next page.
*
* @param pageKey a non-null {@link PageKey} instance, which offers information about the
* page that should be retrieved.
* @return either a non-null page document or {@code null} indicating that such a page
* doesn't exist
*/
Document getChildren( PageKey pageKey );
}
ModeShape then knows that the document for the parent will contain only some of the children and how to access each page of children as needed.
For example, the following code might be used in a connector's getDocumentById(...) method to include some of the children in the parent node's document, along with a reference to a second page of children. It uses an imaginary Book class that is presumed to represent information about a book in a library:
String id = "category/Americana";
DocumentWriter writer = newDocument(id);
writer.setPrimaryType("lib:category");
writer.addProperty("lib:description", "Classic American literature");
// Get the books in this category ...
Collection<Book> books = getBooksInCategory("Americana");
// Put just the first 20 in this document ...
int count = 0;
for ( Book book : books ) {
writer.addChild(book.getId(),book.getTitle());
if ( ++count == 20 ) break;
}
if ( books.size() > 20 ) {
// There were more than 20 books, so add a reference to the next page,
// which starts at offset 20 (the 21st book) and holds up to 20 books ...
writer.addPage(id, 20, 20, books.size());
}
Document doc = writer.document();
Then, the connector's getChildren(...) method, which implements the Pageable interface, would get the child references for a particular page:
public Document getChildren( PageKey pageKey ) {
String parentId = pageKey.getParentId();
int offset = pageKey.getOffsetInt();
String category = parentId.substring(9); // we assume this is "category/{categoryName}"
DocumentWriter writer = newDocument(parentId);
// Get the books in this category (no error checking here!) ...
List<Book> books = getBooksInCategory(category);
// Add at most 20 books, starting at the requested offset ...
int end = Math.min(offset + 20, books.size());
for ( Book book : books.subList(offset, end) ) {
writer.addChild(book.getId(),book.getTitle());
}
if ( end < books.size() ) {
// There are more books, so add a reference to the next page
// that starts right after the last book in this page ...
writer.addPage(parentId, end, 20, books.size());
}
return writer.document();
}
As you can see, the logic of getChildren(...) is very similar to the logic that adds children in the getDocumentById(...) method, and your connector might find it useful to abstract this into a single helper method.
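One way to share that logic is a small helper that computes a single page from the full list of children. The sketch below is hypothetical and independent of ModeShape. ChildPager.page(...) returns the slice of children for a given offset, plus the offset of the next page or -1 when there is none. In a connector, getDocumentById(...) would call it with offset 0 and the Pageable getChildren(...) method with the offset from the PageKey. Each item in the slice would be passed to addChild(...), and a non-negative next offset to addPage(...).

```java
import java.util.List;

// Hypothetical paging helper (NOT a ModeShape API) shared by the parent-document code
// (offset 0) and the page-lookup code (later offsets).
final class ChildPager {
    private ChildPager() {}

    static final class Slice<T> {
        final List<T> items;
        final int nextOffset; // -1 if this is the last page
        Slice(List<T> items, int nextOffset) { this.items = items; this.nextOffset = nextOffset; }
    }

    static <T> Slice<T> page(List<T> all, int offset, int blockSize) {
        if (offset < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and blockSize > 0");
        }
        int start = Math.min(offset, all.size());
        int end = Math.min(start + blockSize, all.size());
        // Only advertise a next page when children remain after this one.
        int next = end < all.size() ? end : -1;
        return new Slice<>(all.subList(start, end), next);
    }
}
```

Keeping the "is there another page?" decision in one place avoids the off-by-one mistakes that are easy to make when the same check is written twice.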