Storing files and folders

One really nice feature of JCR repositories is that you can use them to store files and folders. And because it is such a common pattern, the JCR specification defines several built-in node types that can be used to do just this:

nt:folder - The node type used to represent folders.
nt:file - The node type used to represent files.
nt:hierarchyNode - An abstract node type that serves as the base type of nt:file and nt:folder.
nt:resource - The node type used to represent the content of a file.
nt:linkedFile - Similar to nt:file, except that the content is not stored under the file node but is instead a reference to content stored elsewhere.

Learning to use these node types can take a little work, because they're not quite as straightforward as you might think. Here's a UML diagram showing the node types and the inheritance hierarchy:

images/author/download/attachments/103547120/NodeTypeInheritance.png

Using the built-in node types

Consider a "MyDocuments" folder that contains a "Personal" folder and a "Status Report.pdf" file. Here’s what those nodes might look like:

images/author/download/attachments/103547120/FolderAndFileNodes.png

The folders look like what you might expect: they have a name, a primary type of nt:folder, and the jcr:createdBy and jcr:created properties defined by the nt:folder node type. (These properties are defined as autocreated, meaning the repository should set these automatically.)

The file representation, on the other hand, is different. The "Status Report.pdf" node has a primary type of nt:file and the jcr:createdBy and jcr:created properties defined by the nt:file node type, but everything about the content (including the binary file content in the jcr:data property) is actually stored in the child node named "jcr:content". This may seem odd at first, but actually this design very nicely separates the file-related information from the content-related information.

Think about how an application might navigate the files and folders in a repository. Using the JCR API, the application asks for the "MyDocuments" node, so the repository materializes it (and probably its list of children) from storage. The application then asks for the children, so the repository loads the "Personal" folder node and the "Status Report.pdf" node, and there’s enough information on those nodes for the application to display relevant information. Note that the "Status Report.pdf" file’s content has not yet been materialized. Only when the application asks for the content of the file (that is, it asks for the "jcr:content" node) will the content-related information be materialized by the repository. (And, some repository implementations might delay loading the jcr:data binary property until the application asks for it.) Nice, huh?
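As a rough illustration of that navigation, here is a minimal sketch that lists the children of a folder using only node-level information. The class and method names (FolderListing, listChildren) are made up for this example; the calls themselves are standard javax.jcr API. It never asks for a "jcr:content" node, so the repository has no reason to materialize any file content.

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;

// Hypothetical helper for this example; not part of JCR or ModeShape.
final class FolderListing {

    /** Prints the name and kind of each child of the given folder node. */
    static void listChildren(Node folder) throws RepositoryException {
        NodeIterator children = folder.getNodes();
        while (children.hasNext()) {
            Node child = children.nextNode();
            if (child.isNodeType("nt:folder")) {
                System.out.println("folder: " + child.getName());
            } else if (child.isNodeType("nt:file")) {
                // Only file-level metadata is read; "jcr:content" is never requested.
                System.out.println("file:   " + child.getName()
                        + " (created " + child.getProperty("jcr:created").getDate().getTime() + ")");
            }
        }
    }
}
```

Calling FolderListing.listChildren(session.getNode(pathToMyDocuments)) would then report the "Personal" folder and the "Status Report.pdf" file.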

Creating folders using the JCR API

Creating folders is pretty straightforward:

// Find the parent node ...
Node myDocuments = session.getNode(pathToMyDocuments);

// Create a new folder node ...
Node personal = myDocuments.addNode("Personal","nt:folder");

// The auto-created properties are added when the session is saved ...
session.save();

// Get the property values that were auto-created ...
String createdBy = personal.getProperty("jcr:createdBy").getString();
Calendar createdAt = personal.getProperty("jcr:created").getDate();

Note how we used the second parameter of the addNode method to specify which node type should be used as the primary type for the new node. Also note that the "jcr:created" and "jcr:createdBy" auto-created properties are defined on the nt:folder node type (actually, they are inherited from the nt:hierarchyNode supertype, which inherits them from the mix:created mixin type).

You can also use the org.modeshape.jcr.api.JcrTools class, which has some nice methods for creating file and folder hierarchies.

Uploading files using the JCR API

The only tricky part of adding files to the repository is properly creating the two-node pattern for nt:file nodes. Here's the code that uploads a file represented by a java.io.File object:

// Find the parent node ...
Node folder = ...

// Assume that we have a file that exists and can be read ...
File file = ...

// Determine the last-modified value of the file (if important) ...
Calendar lastModified = Calendar.getInstance();
lastModified.setTimeInMillis(file.lastModified());

// Create a buffered input stream for the file's contents ...
InputStream stream = new BufferedInputStream(new FileInputStream(file));

// Create an 'nt:file' node at the supplied path ...
Node fileNode = folder.addNode(file.getName(),"nt:file");

// Upload the file to that node ...
Node contentNode = fileNode.addNode("jcr:content", "nt:resource");
Binary binary = session.getValueFactory().createBinary(stream);
contentNode.setProperty("jcr:data", binary);
contentNode.setProperty("jcr:lastModified",lastModified);

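// Save the session (which also auto-creates the properties) ...
session.save();

// The content has been persisted, so release the binary and close the stream ...
binary.dispose();
stream.close();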

Again, this is not very complicated if you understand the pattern. We first create the nt:file node that represents the file and its metadata, and under that we create a child node named "jcr:content" (of type nt:resource) that represents the content of the file. We also explicitly set the "jcr:lastModified" timestamp to mirror the time the file was last modified (if that's important).

We could try to set the "jcr:createdBy" or "jcr:created" properties till we're blue in the face, but it won't help: these auto-created properties are declared as protected by the mix:created mixin, so the repository sets their values itself when newly-created nodes are saved, and any values we try to supply will be rejected or overwritten.

Adding other properties

Another interesting aspect of the nt:file and nt:folder node types (and even the nt:resource node type) is that they don’t allow adding just any property on the node. The beauty is that they don’t have to, because you can still add extra properties to these nodes using mixins!

Let’s imagine that we want to add tags to our file and folder nodes, and that we want to start capturing the SHA-1 checksum (as a hexadecimal string) of our files. To start, we need to create two mixins, which we'll define using the standard CND format:

<acme = "http://www.acme.com/nodetypes/1.0">
[acme:taggable] mixin
- acme:tags (STRING) multiple

[acme:checksum] mixin
- acme:sha1 (STRING) mandatory

We could have defined a mixin that allows any single- or multi-valued property, similar to how the standard nt:unstructured node type is defined; then we could add any properties we want. However, it's sometimes better to use more targeted mixins like these, if for no other reason than that it makes it very easy to use JCR-SQL2 to query the nodes that use these mixins.
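For instance, once the mixins are registered and in use, finding every tagged node takes only a single-selector query. This is a minimal sketch using the standard javax.jcr.query API; the TaggableQuery class and printTaggedNodes method are names invented for this example.

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

// Hypothetical helper for this example; not part of JCR or ModeShape.
final class TaggableQuery {

    /** Prints the path of every node that has the acme:taggable mixin. */
    static void printTaggedNodes(Session session) throws RepositoryException {
        QueryManager queryManager = session.getWorkspace().getQueryManager();
        Query query = queryManager.createQuery("SELECT * FROM [acme:taggable]", Query.JCR_SQL2);
        QueryResult result = query.execute();

        // A single-selector query lets us iterate the matching nodes directly.
        NodeIterator nodes = result.getNodes();
        while (nodes.hasNext()) {
            Node node = nodes.nextNode();
            System.out.println(node.getPath());
        }
    }
}
```

Because the mixin name is the selector, the same query matches tagged files and tagged folders alike.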

We then need to register these node types in our repository (perhaps by loading the CND file or programmatically using the NodeTypeManager). Then, we can add the acme:taggable mixin to whatever file and folder nodes we want. This is as simple as:

// Find the node ...
Node myDocuments = session.getNode(pathToMyDocuments);
Node personalFolder = myDocuments.getNode("Personal");

// Add the mixin and set the "acme:tags" property on the "Personal" folder ...
personalFolder.addMixin("acme:taggable");
String[] tags = {"non-work"};
personalFolder.setProperty("acme:tags",tags);

// Add the tags to the "Status Report.pdf" node ...
Node statusReport = myDocuments.getNode("Status Report.pdf");
statusReport.addMixin("acme:taggable");
statusReport.setProperty("acme:tags", new String[] {"status", "projectX"});

// Add the SHA-1 hash to the "jcr:content" node ...
Node content = statusReport.getNode("jcr:content");
content.addMixin("acme:checksum");
content.setProperty("acme:sha1", "e676b12c3ebfb1");

// Save the changes ...
session.save();

The result is something like this, where the new properties are shown in a boldface font:

images/author/download/attachments/103547120/FileAndFolderNodeTypes-with-mixins.png
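The example above sets "acme:sha1" to a literal value. In practice the checksum would be computed from the file's bytes. The following is a minimal sketch that uses only the JDK; the Sha1Hex class and its sha1Hex method are hypothetical names for this example and are not part of JCR or ModeShape.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for this example; not part of JCR or ModeShape.
final class Sha1Hex {

    /** Reads the stream to its end and returns the SHA-1 digest as a lowercase hexadecimal string. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                // Mask to an unsigned value so each byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The returned string could then be passed to content.setProperty("acme:sha1", ...). Hashing consumes the stream, so the upload itself needs a fresh stream over the same file.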

Reading the content

Reading the file and folder information is simpler than writing it. The key is to remember that the "file" information is broken into separate nodes.

// Find the node ...
Node myDocuments = session.getNode(pathToMyDocuments);
Node statusReport = myDocuments.getNode("Status Report.pdf");

// Get the created information (we'll assume it's there) ...
Calendar created = statusReport.getProperty("jcr:created").getDate();
String creator = statusReport.getProperty("jcr:createdBy").getString();

// Now get the content of the file ...
Node statusReportContent = statusReport.getNode("jcr:content");

// Get the MIME type if it's there ...
String mimeType = null;
if ( statusReportContent.hasProperty("jcr:mimeType") ) {
    mimeType = statusReportContent.getProperty("jcr:mimeType").getString();
}
Binary content = statusReportContent.getProperty("jcr:data").getBinary();
InputStream stream = content.getStream();
try {
    // do something with the stream
} finally {
    stream.close();
    // Release any resources held by the binary value ...
    content.dispose();
}
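What "do something with the stream" means depends on the application. One common case is exporting the content to the local file system, and here is a minimal JDK-only sketch of that. StreamExport and copyToFile are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for this example; not part of JCR or ModeShape.
final class StreamExport {

    /**
     * Copies the whole stream to the target file, replacing the file if it
     * already exists, and closes the stream. Returns the number of bytes written.
     */
    static long copyToFile(InputStream stream, Path target) {
        try (InputStream in = stream) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a javax.jcr.Binary in hand, this could be called as StreamExport.copyToFile(content.getStream(), Paths.get("Status Report.pdf")). The helper closes the stream itself, so only the binary still needs to be disposed afterwards.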

If you added mixins and other properties to these nodes, they're accessible just like any other property.

Extending the built-in node types

We showed above how to use mixins to store extra properties on nodes that use primary types that don't allow just any property. While that may be a preferred way to do it, it's not the only way. Another approach is to create custom node types that we'd use in place of nt:folder, nt:file and nt:resource. Here's a CND fragment showing the new types:

<acme = "http://www.acme.com/nodetypes/1.0">
[acme:file] > nt:file
- acme:tags (STRING) multiple

[acme:folder] > nt:folder
- acme:tags (STRING) multiple

[acme:resource] > nt:resource
- acme:sha1 (STRING) mandatory

Then in our code we could simply use "acme:file" in place of "nt:file", and "acme:folder" in place of "nt:folder", and "acme:resource" in place of "nt:resource".

But there are several pretty big disadvantages to this approach:

Any applications that are expecting nt:file and nt:folder nodes might break. Granted, such apps would have been hard coded to expect a particular set of content. But someone may have taken a shortcut.
Notice that we've had to define three node types, even though both the acme:file and acme:folder types have exactly the same property. When we used mixins, we only created two mixins, and we could use them anywhere.
The new node types don't really mirror a characteristic or facet (e.g., something is "taggable" or "has a SHA-1 hash"), whereas the mixins did exactly that and we could use them anywhere it made sense.
Every acme:file has an optional "acme:tags" property, whether or not that node needs it. And so we have to decide up front whether a file should be represented by an acme:file node or a standard nt:file node. With mixins, we can add the acme:taggable mixin only when we need to add tags to a node.

There is one benefit to defining node types in this way: when using JCR-SQL2 queries, we can use a simple query to find all the properties of acme:file nodes:

SELECT file.[jcr:createdBy], file.[acme:tags], file.[jcr:name]
FROM [acme:file] AS file

If we had used mixins with the standard nt:file types, we can still get the same information but our query has to use a join:

SELECT file.[jcr:createdBy], taggable.[acme:tags], file.[jcr:name]
FROM [nt:file] AS file JOIN [acme:taggable] AS taggable
ON ISSAMENODE(file,taggable)

That's certainly not much more complicated, and in general the benefits of using mixins far outweigh the slightly-increased complexity of JCR-SQL2 queries.
