One really nice feature of JCR repositories is that you can use them to store files and folders. And because it is such a common pattern, the JCR specification defines several built-in node types that can be used to do just this:
- nt:folder - The node type used to represent folders.
- nt:file - The node type used to represent files.
- nt:hierarchyNode - An abstract node type that serves as the base type of nt:file and nt:folder.
- nt:resource - The node type used to represent the content of a file.
- nt:linkedFile - Similar to nt:file, except that the content is not stored under the file node but is instead a reference to content stored elsewhere.
Learning to use these node types can take a little work, because they're not quite as straightforward as you might think. Here's a UML diagram showing the node types and the inheritance hierarchy:
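The same hierarchy can also be read from the node type definitions themselves. The fragment below is an abridged transcription, in CND, of how we understand the JCR 2.0 specification to define these types. Attributes that vary by implementation (such as whether the auto-created properties are protected) are left out, so consult the specification for the authoritative definitions.

```cnd
// Abridged sketch of the JCR 2.0 built-in types discussed here.
[mix:created] mixin
  - jcr:created (DATE) autocreated
  - jcr:createdBy (STRING) autocreated

[nt:hierarchyNode] > mix:created abstract

[nt:folder] > nt:hierarchyNode
  + * (nt:hierarchyNode) VERSION

[nt:file] > nt:hierarchyNode primaryitem jcr:content
  + jcr:content (nt:base) mandatory

[nt:linkedFile] > nt:hierarchyNode primaryitem jcr:content
  - jcr:content (REFERENCE) mandatory

[nt:resource] > mix:mimeType, mix:lastModified primaryitem jcr:data
  - jcr:data (BINARY) mandatory
```

Note that none of these types declares a residual property definition, which is why arbitrary properties cannot be set on them directly.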
Consider a "MyDocuments" folder that contains a "Personal" folder and a "Status Report.pdf" file. Here’s what those nodes might look like:
The folders look like what you might expect: they have a name, a primary type of nt:folder, and the jcr:createdBy and jcr:created properties defined by the nt:folder node type. (These properties are defined as autocreated, meaning the repository should set these automatically.)
The file representation, on the other hand, is different. The "Status Report.pdf" node has a primary type of nt:file and the jcr:createdBy and jcr:created properties defined by the nt:file node type, but everything about the content (including the binary file content in the jcr:data property) is actually stored in the child node named "jcr:content". This may seem odd at first, but actually this design very nicely separates the file-related information from the content-related information.
Think about how an application might navigate the files and folders in a repository. Using the JCR API, the application asks for the "MyDocuments" node, so the repository materializes it (and probably its list of children) from storage. The application then asks for the children, so the repository loads the "Personal" folder node and the "Status Report.pdf" node, and there’s enough information on those nodes for the application to display relevant information. Note that the "Status Report.pdf" file’s content has not yet been materialized. Only when the application asks for the content of the file (that is, it asks for the "jcr:content" node) will the content-related information be materialized by the repository. (And, some repository implementations might delay loading the jcr:data binary property until the application asks for it.) Nice, huh?
Creating folders is pretty straightforward:
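A minimal sketch using the standard JCR 2.0 API (`javax.jcr`), which must be on the classpath along with a repository implementation. It assumes an already-open, writable `Session` obtained elsewhere; the class and method names are ours, and the folder names come from the example above.

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

final class FolderExample {

    /** Creates /MyDocuments/Personal and returns the "MyDocuments" node. */
    static Node createFolders(Session session) throws RepositoryException {
        Node root = session.getRootNode();

        // The second argument names the primary node type of the new node.
        Node myDocuments = root.addNode("MyDocuments", "nt:folder");
        myDocuments.addNode("Personal", "nt:folder");

        // Nothing is persisted until the session is saved.
        session.save();
        return myDocuments;
    }
}
```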
Note how we used the second parameter of the addNode method to specify which node type should be used as the primary type for the new node. Also, we never set the "jcr:created" and "jcr:createdBy" properties ourselves: they are auto-created properties on the nt:folder node type (actually inherited from the nt:hierarchyNode supertype, which inherits them from the mix:created mixin type), so the repository sets them when the nodes are saved.
|You can also use the org.modeshape.jcr.api.JcrTools class, which has some nice methods for creating file and folder hierarchies.|
The only tricky part of adding files to the repository is properly creating the two-node pattern for nt:file nodes. Here's the code that uploads a file represented by a java.io.File object:
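A sketch of that pattern with the standard JCR 2.0 API, assuming the parent is an existing nt:folder node in a writable session and that the file name is a valid JCR name. The class and method names are illustrative only.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Calendar;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

final class FileUploadExample {

    /** Stores the given file under the folder node and returns the new nt:file node. */
    static Node upload(Node folder, File file) throws RepositoryException, IOException {
        Session session = folder.getSession();

        // Mirror the file system's modification time.
        Calendar lastModified = Calendar.getInstance();
        lastModified.setTimeInMillis(file.lastModified());

        // First node: the file itself and its metadata.
        Node fileNode = folder.addNode(file.getName(), "nt:file");

        // Second node: the content, always named "jcr:content".
        Node content = fileNode.addNode("jcr:content", "nt:resource");

        try (InputStream stream = new BufferedInputStream(new FileInputStream(file))) {
            Binary binary = session.getValueFactory().createBinary(stream);
            try {
                content.setProperty("jcr:data", binary);
            } finally {
                binary.dispose(); // release resources held by this Binary instance
            }
        }
        content.setProperty("jcr:lastModified", lastModified);

        session.save();
        return fileNode;
    }
}
```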
Again, this is not very complicated if you understand the pattern. We first create the nt:file node that represents the file and its metadata, and under that we create a child node named "jcr:content" (of type nt:resource) that represents the content of the file. We also explicitly set the "jcr:lastModified" timestamp to mirror the time the file was last modified (if that's important).
|We could try to set the "jcr:createdBy" or "jcr:created" properties till we're blue in the face, but the repository always sets these auto-created properties when newly created nodes are saved, overwriting any values we set.|
Another interesting aspect of the nt:file and nt:folder node types (and even the nt:resource node type) is that they don’t allow adding just any property on the node. The beauty is that they don’t have to, because you can still add extra properties to these nodes using mixins!
Let’s imagine that we want to add tags to our file and folder nodes, and that we want to start capturing the SHA-1 checksum (as a hexadecimal string) of our files. To start, we need to create two mixins, which we'll define using the standard CND format:
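One way these two mixins might be written is sketched below. The acme:taggable mixin and the acme:tags property are named in the text; the namespace URI, the acme:checksum mixin name, and the acme:sha1 property name are assumptions made for illustration.

```cnd
// Hypothetical namespace URI for the "acme" prefix.
<acme = 'http://www.acme.com/nodetypes/1.0'>

// Any node with this mixin may carry zero or more string tags.
[acme:taggable] mixin
  - acme:tags (STRING) multiple

// Assumed name for the checksum mixin; holds the SHA-1 as a hex string.
[acme:checksum] mixin
  - acme:sha1 (STRING)
```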
|We could have defined a mixin that allows any single or multi-valued property, similar to how the standard nt:unstructured node type is defined. Then we could add any properties we want. However, it's sometimes better to use more targeted mixins like these, if for no other reason than it makes it very easy to use JCR-SQL2 to query the nodes that use these mixins.|
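The checksum value itself has to be computed by the application before it is stored. A minimal sketch using only JDK classes is shown here; the class and method names are ours, and the result is the lowercase hexadecimal string that a checksum property would hold.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Hex {

    /** Reads the stream to its end and returns the SHA-1 digest as lowercase hex. */
    public static String sha1Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }

        // Feed the content through the digest in chunks to avoid loading it all in memory.
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }

        // Render each digest byte as two hexadecimal characters.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

For the three ASCII bytes "abc", this returns the well-known test vector a9993e364706816aba3e25717850c26c9cd0d89d.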
We then need to register these node types in our repository (perhaps by loading the CND file or programmatically using the NodeTypeManager). Then, we can add the acme:taggable mixin to whatever file and folder nodes we want. This is as simple as:
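A sketch with the standard JCR 2.0 API, assuming the mixins are already registered. Only acme:taggable and acme:tags are named in the text; the acme:checksum mixin, the acme:sha1 property, and the choice to put the checksum on the "jcr:content" node are our assumptions, as are the class and method names.

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;

final class TagExample {

    /** Adds tags to an nt:file node and records a checksum on its content node. */
    static void tagFile(Node fileNode, String sha1Hex, String... tags) throws RepositoryException {
        // The mixin must be added before its properties can be set.
        fileNode.addMixin("acme:taggable");
        fileNode.setProperty("acme:tags", tags); // multi-valued STRING property

        // Assumption: the checksum describes the content, so it lives on jcr:content.
        Node content = fileNode.getNode("jcr:content");
        content.addMixin("acme:checksum");
        content.setProperty("acme:sha1", sha1Hex);

        fileNode.getSession().save();
    }
}
```

The same `addMixin` and `setProperty` calls for acme:taggable work unchanged on an nt:folder node.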
The result is something like this, where the new properties are shown in a boldface font:
Reading the file and folder information is simpler than writing it. The key is to remember that the "file" information is broken into separate nodes.
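A sketch of reading one level of a folder with the standard JCR 2.0 API. It assumes every nt:file child has a "jcr:content" node of type nt:resource (the nt:file definition itself allows other types there); the class and method names are illustrative.

```java
import java.util.Calendar;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;

final class ReadExample {

    /** Prints the folders and files directly under the given folder node. */
    static void list(Node folder) throws RepositoryException {
        NodeIterator children = folder.getNodes();
        while (children.hasNext()) {
            Node child = children.nextNode();
            if (child.isNodeType("nt:folder")) {
                System.out.println("folder: " + child.getName());
            } else if (child.isNodeType("nt:file")) {
                // File metadata lives on the nt:file node itself.
                Calendar created = child.getProperty("jcr:created").getDate();

                // Content metadata and the bytes live on the child node.
                Node content = child.getNode("jcr:content");
                Binary data = content.getProperty("jcr:data").getBinary();
                try {
                    System.out.println("file: " + child.getName()
                            + ", created " + created.getTime()
                            + ", " + data.getSize() + " bytes");
                    // data.getStream() would give access to the bytes themselves.
                } finally {
                    data.dispose();
                }
            }
        }
    }
}
```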
If you added mixins and other properties to these nodes, they're accessible just like any other property.
We showed above how to use mixins to store extra properties on nodes that use primary types that don't allow just any property. While that may be a preferred way to do it, it's not the only way. Another approach is to create custom node types that we'd use in place of nt:folder, nt:file and nt:resource. Here's a CND fragment showing the new types:
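Such types might be defined along these lines. The three type names come from the text; the namespace URI and the acme:sha1 property name are assumptions, and the properties are left optional.

```cnd
// Hypothetical namespace URI for the "acme" prefix.
<acme = 'http://www.acme.com/nodetypes/1.0'>

[acme:folder] > nt:folder
  - acme:tags (STRING) multiple

[acme:file] > nt:file
  - acme:tags (STRING) multiple

// Assumed property name for the SHA-1 checksum of the content.
[acme:resource] > nt:resource
  - acme:sha1 (STRING)
```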
Then in our code we could simply use "acme:file" in place of "nt:file", and "acme:folder" in place of "nt:folder", and "acme:resource" in place of "nt:resource".
But there are several pretty big disadvantages to this approach:
- Any applications that are expecting nt:file and nt:folder nodes might break. Granted, such apps would have been hard coded to expect a particular set of content. But someone may have taken a shortcut.
- Notice that we've had to define three node types, even though both the acme:file and acme:folder types have exactly the same property. When we used mixins, we only created two mixins, and we could use them anywhere.
- The new node types don't really mirror a characteristic or facet (e.g., something is "taggable" or "has a SHA-1 hash"), whereas the mixins did exactly that and we could use them anywhere it made sense.
- Every acme:file has an optional "acme:tags" property, whether or not that node needs it. And so we have to make the decision up front whether a file should be represented by an acme:file node or a standard nt:file node. With mixins, we can add the acme:taggable mixin only when we need to add tags to a node.
There is one benefit to defining node types in this way: when using JCR-SQL2 queries, we can use a simple query to find all the properties of acme:file nodes:
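In JCR-SQL2, such a query could be as short as the following sketch, which selects every column the acme:file type defines:

```sql
SELECT * FROM [acme:file]
```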
If we had used mixins with the standard nt:file types, we could still get the same information, but our query would have to use a join:
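One possible form of that join is sketched below, assuming the tags come from a mixin named acme:taggable. The ISSAMENODE join condition pairs each nt:file row with the acme:taggable row for the same node, so only tagged files are returned.

```sql
SELECT file.*, tagged.*
FROM [nt:file] AS file
INNER JOIN [acme:taggable] AS tagged ON ISSAMENODE(file, tagged)
```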
That's certainly not much more complicated, and in general the benefits of using mixins far outweigh the slightly increased complexity of the JCR-SQL2 queries.