This connector exposes files and folders on the file system as nt:file and nt:folder nodes in the repository. To use, configure an external source for a given file system (or area of the repository); each external source can be set up as read-only (to only expose the file system's existing files and folders) or as writable (to allow JCR clients to create/update/delete files and folders on the file system).
The File System Connector maps nt:file and nt:folder properties directly to the attributes on the file system's files and folders. Any extra properties that cannot be mapped this way are, by default, stored in the same Infinispan cache where the normal content is stored, though such content will be lost if files and folders are moved or renamed outside of ModeShape. Several other options are possible, including storing these extra properties on the file system using "sidecar" files that are named similarly to and stored adjacent to the target file or folder. See the extraPropertiesStorage attribute description below for more detail.
By default, the connector does not monitor the file system for newly created files or folders, and therefore no events are created. However, navigation will always expose the current file and folder nodes within a folder. ModeShape can index the content so that the projected nt:file, nt:folder, and nt:resource nodes can be queried, but this must be done manually via the Workspace API's "reindex" methods.
The file system connector is pageable, which means it can efficiently expose folders that contain large numbers of items. Paging is a tradeoff between loading the parent node faster (by having smaller numbers of child references) and having to go back to the connector more frequently. By default, the connector includes only 20 items per page, but the page size can be adjusted to best suit your application's needs.
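The paging tradeoff can be made concrete with a little arithmetic. The following is a minimal sketch, not the connector's implementation: the class and method names are hypothetical, and it only shows how a list of children breaks down into pages of a fixed size.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of ModeShape. */
final class PagingSketch {

    /** Number of pages needed to hold childCount children at pageSize children per page. */
    static int pageCount(int childCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (childCount + pageSize - 1) / pageSize; // ceiling division
    }

    /** Returns a copy of the children that belong to the zero-based page index. */
    static <T> List<T> page(List<T> children, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, children.size());
        int to = Math.min(from + pageSize, children.size());
        return new ArrayList<>(children.subList(from, to));
    }

    public static void main(String[] args) {
        // A folder with 200 children and a page size of 20.
        List<String> children = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            children.add("file-" + i + ".txt");
        }
        System.out.println("pages: " + pageCount(children.size(), 20));
        System.out.println("items on first page: " + page(children, 0, 20).size());
    }
}
```

A smaller page size keeps each page cheap to load but increases the number of round trips needed to walk a large folder; a larger page size does the opposite.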
The connector classname is "org.modeshape.connector.filesystem.FileSystemConnector", and there are several attributes that should be configured on each external source:
| Attribute Name | Description |
| --- | --- |
| directoryPath | The path to the file or folder that is to be accessed by this connector. |
| extraPropertyStorage | An optional string flag that specifies how this source handles "extra" properties that are not stored via file system attributes. The values used in this document are "json" (extra properties are stored in JSON sidecar files) and "none" (files and folders with extra properties cannot be created or updated); see the discussion of extra properties below. |
| inclusionPattern | Optional property that specifies a regular expression that is used to help determine which files and folders in the underlying file system are exposed through this connector. The connector will expose only those files and folders with a name that matches the provided regular expression (as long as they also are not excluded by the exclusionPattern). If no inclusion pattern is specified, then the connector will include all files and folders that are not excluded via the exclusionPattern. |
| exclusionPattern | Optional property that specifies a regular expression that is used to help determine which files and folders in the underlying file system are not exposed through this connector. Files and folders with a name that matches the provided regular expression will not be exposed by this source. |
| addMimeTypeMixin | A boolean flag that specifies whether this connector should add the 'mix:mimeType' mixin to the 'nt:resource' nodes to include the 'jcr:mimeType' property. If set to true, the MIME type is computed immediately when the 'nt:resource' node is accessed, which might be expensive for larger files. This is false by default. |
| readOnly | A boolean flag that controls whether this source can create/modify/remove files and directories on the file system to reflect changes in the JCR content; when set to true, the source cannot make such changes. By default, sources are not read-only. |
| cacheable | Optional property that specifies if a node returned by this connector should be cached in the workspace cache or not. By default, all nodes returned by a connector are cached just like regular nodes. You may want to set this property to false if the files on the file system are changing frequently. |
| isQueryable | Optional property that specifies whether or not the content exposed by this connector should be indexed by the repository. This acts as a global flag, allowing a specific connector to mark its entire content as non-queryable. By default, all content exposed by a connector is queryable. |
| pageSize | Optional advanced property that controls the number of children that the connector should include in a single page; the default is 20. For example, if a folder contains 200 items (e.g., files or folders) and the page size is 20, then the connector will include in the document representing this folder only the properties of the folder and the first 20 items (that are readable, that satisfy the inclusion pattern, and that do not match the exclusion pattern). As additional children are needed (e.g., as the ModeShape client navigates or accesses the folder's child nodes), ModeShape will request additional pages, each with up to 20 items. |
| contentBasedSha1 | Optional advanced boolean property that controls whether the binary value's hash values are SHA-1s based upon the file contents. This property is "true" by default, and therefore has exactly the same behavior as all other binary values within the repository. The connector has to compute the SHA-1 every time a binary value is returned (including every "jcr:data" property on the "jcr:content" children of "nt:file" nodes). If the underlying files are changed by processes other than ModeShape, the computed SHA-1 may not accurately represent the changed file contents, though the time ModeShape caches the SHA-1 in the binary value is controlled as part of the connector's cacheable property. Also, computing the SHA-1 can be quite expensive and time consuming for very large files and may thus introduce a lengthy and noticeable lag when returning a "jcr:content" node until the SHA-1 is computed. If you are using very large files, consider setting this "contentBasedSha1" property to "false" so that the connector computes the SHA-1 based upon the URL to the file on the file system. Such a SHA-1 can be computed very quickly, eliminating the lag for very large files mentioned above. ModeShape still uses these SHA-1s internally in a consistent fashion (two SHA-1s will be the same only when they are for the same file), but the BinaryValue.getHash() and BinaryValue.getHexHash() methods will return this non-content-based SHA-1 value. (If you are dealing with very large binary values and are not satisfied with the speed of ModeShape dynamically computing the SHA-1, you can subclass the FileSystemConnector and override the sha1(File) method to compute/cache/lookup the SHA-1 for a given file using your preferred mechanism.) |
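The difference between the two contentBasedSha1 modes can be illustrated with plain JDK classes. The sketch below is an assumption-laden illustration rather than the connector's code: the class and method names are hypothetical, and the exact URL string the connector hashes is not specified in this document, so the file's absolute URI is used as a stand-in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: hypothetical helper, not the connector's sha1(File) method. */
final class Sha1Sketch {

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Content-based hash: reads every byte of the file, so cost grows with file size. */
    static String contentSha1(Path file) throws IOException {
        MessageDigest digest = newSha1();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return hex(digest.digest());
    }

    /** URL-based hash: hashes only the file's URI string (an assumed stand-in for the URL). */
    static String urlSha1(Path file) {
        String url = file.toAbsolutePath().toUri().toString();
        return hex(newSha1().digest(url.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sha1-sketch", ".bin");
        Files.write(file, "abc".getBytes(StandardCharsets.UTF_8));
        System.out.println("content-based: " + contentSha1(file));
        System.out.println("url-based:     " + urlSha1(file));
        Files.delete(file);
    }
}
```

The URL-based value stays the same when the file's bytes change, which is why it is cheap to compute but says nothing about the content.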
The FileSystemConnector can also listen for changes to the external file system and fire events to listeners when files are added, changed, or removed externally. This feature is disabled by default, but can be enabled with the "enableEvents" : true configuration setting.
By default, the file system connector will expose all of the files and folders that are underneath the specified directory and readable by the Java process, and it will allow ModeShape clients using the JCR API to change, remove, or even create new files and folders. Additionally, any "extra properties" (i.e., properties other than "jcr:primaryType", "jcr:created", "jcr:lastModified", and "jcr:data", which the connector derives from the file system itself) will be stored not on the file system but in the same Infinispan cache in which the repository's own internal (non-federated) content is stored. The connector will also use pages to efficiently work with folders with large numbers of items.
If other behavior is desired, simply set the connector's properties to non-default values. For example, if ModeShape clients are not allowed to modify, create, or remove file and folder nodes, then the connector should be configured with "readOnly" set to true. Or, if only certain files and folders are to be exposed, set the inclusionPattern and exclusionPattern to regular expressions that the connector can use to know whether to include or exclude files and folders by name. Note that any file or folder will only be exposed by the connector when the file/folder is readable and when its name satisfies the inclusionPattern and does not satisfy the exclusion pattern.
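The exposure rule described above (readable, satisfies the inclusionPattern, does not satisfy the exclusionPattern) can be sketched with java.util.regex. This is a hedged illustration: the class name is hypothetical, and it assumes the patterns are matched against the whole file name, which this document does not state explicitly.

```java
import java.util.regex.Pattern;

/** Illustrative only: hypothetical helper mirroring the exposure rule described in the text. */
final class NameFilterSketch {

    private final Pattern inclusion; // null means "include every name"
    private final Pattern exclusion; // null means "exclude no name"

    NameFilterSketch(String inclusionPattern, String exclusionPattern) {
        this.inclusion = inclusionPattern == null ? null : Pattern.compile(inclusionPattern);
        this.exclusion = exclusionPattern == null ? null : Pattern.compile(exclusionPattern);
    }

    /** A file or folder is exposed only if it is readable, included, and not excluded. */
    boolean isExposed(String name, boolean readable) {
        if (!readable) {
            return false;
        }
        // Assumption: the whole name must match the pattern (matches(), not find()).
        if (inclusion != null && !inclusion.matcher(name).matches()) {
            return false;
        }
        if (exclusion != null && exclusion.matcher(name).matches()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // No inclusion pattern; exclude names ending in ".tmp" with at least one leading character.
        NameFilterSketch filter = new NameFilterSketch(null, ".+[.]tmp$");
        for (String name : new String[] { "notes.txt", "draft.tmp", ".tmp" }) {
            System.out.println(name + " exposed: " + filter.isExposed(name, true));
        }
    }
}
```

Trying candidate names against a pattern in a small harness like this is a quick way to check a regular expression before putting it into the repository configuration.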
The connector is often used to expose the existing files and folders on the file system as content in a repository. Since the connector does not access any OS-specific file attributes, the connector simply maps each existing file and folder as follows:
A folder is represented in ModeShape as a node with a primary type of "nt:folder", no mixin types, and the "jcr:created" timestamp set to the last modified timestamp given by the file system. The node will contain a child for each file and folder that is to be exposed (as discussed above).
A file is represented in ModeShape as a node with a primary type of "nt:file", no mixin types, and the "jcr:created" timestamp set to the last modified timestamp given by the file system. The node will contain a single child node named "jcr:content" that represents the content of the file, and which has a primary type of "nt:resource" and the "jcr:lastModified" timestamp set to the file system's last modified timestamp for the file. If the connector is configured with "addMimeTypeMixin" set to true, then ModeShape will also attempt to determine the MIME type for the file's content and, if determined, add the "mix:mimeType" mixin and the "jcr:mimeType" property to the "jcr:content" node.
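The mapping just described can be summarized in a few lines of Java. This is only a sketch using java.io.File and plain maps: the class and method names are hypothetical, timestamps are left as raw epoch milliseconds, and the real connector produces repository documents rather than maps.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical helper summarizing the file-to-node mapping. */
final class NodeMappingSketch {

    /** Properties of the node that represents a file or folder. */
    static Map<String, Object> nodeProperties(File file) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", file.isDirectory() ? "nt:folder" : "nt:file");
        // The file system's last modified timestamp is used for "jcr:created".
        props.put("jcr:created", file.lastModified());
        return props;
    }

    /** Properties of the "jcr:content" child of a node that represents a file. */
    static Map<String, Object> contentProperties(File file) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:resource");
        props.put("jcr:lastModified", file.lastModified());
        return props;
    }

    public static void main(String[] args) {
        File target = new File(args.length > 0 ? args[0] : ".");
        System.out.println(target + " -> " + nodeProperties(target));
        if (target.isFile()) {
            System.out.println("jcr:content -> " + contentProperties(target));
        }
    }
}
```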
Here is a sample configuration that projects the "/a/b/c" directory onto a node in the repository at "/files", with the above (default) behavior:
    {
        ...
        "externalSources" : {
            "local-git-repo" : {
                "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
                "directoryPath" : "/a/b/c/",
                "projections" : [ "/files" ]
            }
        }
        ...
    }
Here is a slightly different configuration that is read-only, that excludes any files or folders with names that end with ".tmp" (and have at least one character before this suffix), and that includes the automatically-detected MIME type:
    {
        ...
        "externalSources" : {
            "local-git-repo" : {
                "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
                "directoryPath" : "/a/b/c/",
                "projections" : [ "/files" ],
                "readOnly" : true,
                "addMimeTypeMixin" : true,
                "exclusionPattern" : ".+[.]tmp$"
            }
        }
        ...
    }
Of course, some applications may want to set additional properties and/or mixins. When the connector is writable (i.e., not read-only), the connector can store these properties in one of several places, based upon the "extraPropertyStorage" configuration property. By default, these extra properties are stored in the same Infinispan cache where the ModeShape repository stores the rest of its internal (non-federated) content. This is convenient, but can lead to orphaned documents in the Infinispan cache should files and folders be removed outside of ModeShape.
Alternatively, the connector can store these extra properties on the file system. Any extra properties on a file or folder will be stored in a "sidecar" file next to the corresponding file or folder and named similarly to it but with a special suffix. If stored as a JSON file, the suffix will be ".modeshape.json", or if stored as a text file the suffix will be ".modeshape". (The text format is the same as that used in ModeShape 2.x, but is provided only for backward compatibility. Where possible, choose the JSON format.) Extra properties on the "jcr:content" child of "nt:file" nodes are stored in a different sidecar file, named similarly to the corresponding file but with the ".content.modeshape.json" or ".content.modeshape" suffix. Note that these sidecar files are never exposed as nodes by the connector.
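The sidecar naming scheme can be sketched as follows. The suffixes come from the text above; everything else is an assumption made for illustration: the class and method names are hypothetical, and the sketch assumes the suffix is appended to the complete name of the target file or folder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: hypothetical helper for the sidecar naming described in the text. */
final class SidecarNameSketch {

    static final String JSON_SUFFIX = ".modeshape.json";
    static final String TEXT_SUFFIX = ".modeshape";
    static final String CONTENT_MARKER = ".content";

    /**
     * Returns the sidecar path for a target file or folder.
     * Assumption: the suffix is appended to the target's full name.
     *
     * @param json       true for the JSON format, false for the legacy text format
     * @param forContent true for the sidecar of the "jcr:content" child of a file node
     */
    static Path sidecarFor(Path target, boolean json, boolean forContent) {
        String suffix = (forContent ? CONTENT_MARKER : "") + (json ? JSON_SUFFIX : TEXT_SUFFIX);
        return target.resolveSibling(target.getFileName().toString() + suffix);
    }

    /** True if the name looks like a sidecar file, which is never exposed as a node. */
    static boolean isSidecar(String name) {
        return name.endsWith(JSON_SUFFIX) || name.endsWith(TEXT_SUFFIX);
    }

    public static void main(String[] args) {
        Path target = Paths.get("docs", "report.pdf");
        System.out.println(sidecarFor(target, true, false));
        System.out.println(sidecarFor(target, true, true));
        System.out.println(sidecarFor(target, false, false));
    }
}
```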
It is even possible to prevent updating or creating files and folders with extra properties. To do this, simply configure the connector with the "extraPropertyStorage" property set to "none".
Here is another sample configuration for a connector that works the same as the earlier configuration except that it is now storing extra properties in a JSON sidecar:
    {
        ...
        "externalSources" : {
            "local-git-repo" : {
                "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
                "directoryPath" : "/a/b/c/",
                "projections" : [ "/files" ],
                "readOnly" : true,
                "addMimeTypeMixin" : true,
                "exclusionPattern" : ".+[.]tmp$",
                "extraPropertyStorage" : "json"
            }
        }
        ...
    }