Open Office files - ModeShape 5

This sequencer is available starting with ModeShape 5.1.

A sequencer that supports various OpenOffice (OpenDocument Format) file formats and extracts the following metadata from them:

//------------------------------------------------------------------------------
// N A M E S P A C E S
//------------------------------------------------------------------------------
<jcr='http://www.jcp.org/jcr/1.0'>
<nt='http://www.jcp.org/jcr/nt/1.0'>
<mix='http://www.jcp.org/jcr/mix/1.0'>
<odf='http://www.modeshape.org/odf/1.0'>

//------------------------------------------------------------------------------
// N O D E T Y P E S
//------------------------------------------------------------------------------

[odf:metadata] > nt:unstructured, mix:mimeType
 - odf:creationDate (date)
 - odf:creator (string)
 - odf:description (string)
 - odf:editingCycles (long)
 - odf:editingTime (long)
 - odf:generator (string)
 - odf:initialCreator (string)
 - odf:keywords (string) *
 - odf:language (string)
 - odf:modificationDate (date)
 - odf:printedBy (string)
 - odf:printDate (date)
 - odf:subject (string)
 - odf:title (string)
 - odf:pages (long)    // text and presentations
 - odf:sheets (long)   // spreadsheets
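ODF packages store this kind of document metadata in a `meta.xml` entry. The following minimal, stdlib-only Java sketch shows how a subset of its elements could map to the `odf:*` property names above. It is not ModeShape's implementation: the class `OdfMetaSketch` and its `extract` method are hypothetical. Values are kept as raw strings, with no date or `long` conversion. `odf:editingTime` and `odf:sheets` are left out because their exact source mapping is not stated here and would be an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: maps ODF meta.xml elements to odf:* property names.
// Not ModeShape's implementation; all values stay as raw strings.
class OdfMetaSketch {
    // Namespace URIs used by elements in an ODF meta.xml document.
    static final String META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";
    static final String DC = "http://purl.org/dc/elements/1.1/";

    static Map<String, List<String>> extract(String metaXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(metaXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> props = new LinkedHashMap<>();
        copy(doc, META, "generator", "odf:generator", props);
        copy(doc, DC, "title", "odf:title", props);
        copy(doc, DC, "description", "odf:description", props);
        copy(doc, DC, "subject", "odf:subject", props);
        copy(doc, META, "keyword", "odf:keywords", props); // may repeat, hence multi-valued
        copy(doc, META, "initial-creator", "odf:initialCreator", props);
        copy(doc, DC, "creator", "odf:creator", props);
        copy(doc, META, "printed-by", "odf:printedBy", props);
        copy(doc, META, "creation-date", "odf:creationDate", props);
        copy(doc, DC, "date", "odf:modificationDate", props); // dc:date = last modification
        copy(doc, META, "print-date", "odf:printDate", props);
        copy(doc, META, "editing-cycles", "odf:editingCycles", props);
        copy(doc, DC, "language", "odf:language", props);

        // The page count is an attribute of meta:document-statistic, not an element.
        NodeList stats = doc.getElementsByTagNameNS(META, "document-statistic");
        if (stats.getLength() > 0) {
            String pages = ((Element) stats.item(0)).getAttributeNS(META, "page-count");
            if (!pages.isEmpty()) {
                props.put("odf:pages", List.of(pages));
            }
        }
        return props;
    }

    // Appends the trimmed text of every matching element under the given property name.
    private static void copy(Document doc, String ns, String localName, String property,
                             Map<String, List<String>> out) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.computeIfAbsent(property, k -> new ArrayList<>())
               .add(nodes.item(i).getTextContent().trim());
        }
    }
}
```

Repeated `meta:keyword` elements all land in one list, which mirrors why `odf:keywords` is declared multi-valued (`*`) in the node type.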

You can configure it in embedded mode like so:

{
    "name" : "OpenDocument Format Sequencer Test Repository",
    "sequencing" : {
        "sequencers" : {
            "OpenDocument Format sequencer" :  {
                "classname" : "odf",
                "pathExpressions" : ["default://(*.(odt|ods|odp|odg|odc|ott|ots|otp|otg|otc))/jcr:content[@jcr:data] => default:/sequenced/odf" ]
            }
        }
    }
}

or in JBoss AS like so:

<sequencer name="odf-sequencer" classname="odf" module="org.modeshape.sequencer.odf">
  <path-expression>/files(//*.(odt|ods|odp|odg|odc|ott|ots|otp|otg|otc)[*])/jcr:content[@jcr:data] => /derived/odf/$1</path-expression>
</sequencer>
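The input side and the `$1` output reference of the JBoss AS path expression can be approximated with a Java regular expression. This is a hypothetical illustration: the class `OdfPathSketch` and method `outputPathFor` are made up. It assumes that `//` means zero or more intermediate nodes, `*` any node name, `[*]` any same-name-sibling index, and that `$1` is replaced by the captured text including its leading slash. ModeShape evaluates path expressions with its own engine, so edge cases may differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex approximation of the JBoss AS input expression:
//   "//" = zero or more intermediate nodes, "*" = any name, "[*]" = optional SNS index.
// Group 1 plays the role of the parenthesized part captured as $1.
class OdfPathSketch {
    private static final Pattern INPUT = Pattern.compile(
        "^/files((?:/[^/]+)*/[^/]+\\.(?:odt|ods|odp|odg|odc|ott|ots|otp|otg|otc)(?:\\[\\d+\\])?)/jcr:content$");

    // Returns the assumed output path for a jcr:content node, or empty if it is not selected.
    static Optional<String> outputPathFor(String contentPath, boolean hasJcrData) {
        if (!hasJcrData) {
            return Optional.empty(); // models the [@jcr:data] criterion
        }
        Matcher m = INPUT.matcher(contentPath);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Assumption: "$1" is substituted with the captured text, which starts with "/".
        return Optional.of("/derived/odf" + m.group(1));
    }
}
```

Under these assumptions, `/files/docs/report.odt/jcr:content` maps to `/derived/odf/docs/report.odt`, while `/files/notes.txt/jcr:content` is ignored. The embedded configuration uses no `$1`, so it writes its output to the fixed location `default:/sequenced/odf`.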