The XSD sequencer included in ModeShape can parse XML Schema Documents that adhere to the W3C's XML Schema Part 1 and Part 2 specifications, and output a representation of the XSD's attribute declarations, element declarations, simple type definitions, complex type definitions, import statements, include statements, attribute group declarations, annotations, other components, and even attributes with a non-schema namespace. This derived information is intended to accurately reflect the structure and semantics of the XSD files while also making it possible for ModeShape users to easily navigate, query and search over this derived information. This sequencer captures the namespace and names of all referenced components, and will resolve references to components appearing within the same file.
The design of this sequencer and its output structure have been influenced by the SOA Repository Artifact Model and Protocol (S-RAMP) draft specification, which is currently under development by an OASIS Technical Committee. S-RAMP defines a model for a variety of file types, including WSDL and XSD. This sequencer's output was designed to mirror that model, and thus some of the properties and node types used are defined within the "sramp" namespace.
The XML Schema specification is powerful, flexible, rich, and complicated. This means that many XML Schema Documents themselves are complicated. But it also means that there is a lot of variation in XSDs, and consequently there is a lot of variation in the output structure that this sequencer derives from XSD files.
Example
Before we get too far, let's look at an example XML Schema Document taken from the XML Schema Primer:
This schema defines the structure of several XML elements used to represent purchase orders, and describes an XML document such as the following:
The XSD sequencer will derive the following content from the above XSD:
The first thing to note is that the sequencer produces a node of type xs:schemaDocument that includes the mode:derived information (e.g., the time of sequencing and the path to the file from which this information was derived), information about the XSD itself, plus an sramp:description property containing the documentation content from any annotations directly under the schema element in the XSD.
Secondly, there is a node for each top-level element declaration, namely "purchaseOrder" and "comment", with properties capturing the element's name, namespace (not shown since there is no target namespace for the schema), and XSD type name, namespace and reference. The "comment" element declaration has a type of "xs:string", whereas the "purchaseOrder" element declaration has a type of "PurchaseOrderType" (defined later in the XSD and in the derived content). Each node is "mix:referenceable" and has a jcr:uuid property, allowing the "purchaseOrder" element declaration to have an "xs:type" REFERENCE property pointing to the "PurchaseOrderType" complex type definition node.
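The same-file reference resolution described above can be sketched with nothing but the JDK's DOM parser. This is only an illustration of the idea, not the sequencer's actual implementation: the class name TypeReferences, the abbreviated sample schema, and the use of a random UUID as a stand-in for jcr:uuid are all assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: illustrates same-file type resolution, not ModeShape code.
class TypeReferences {

    // Abbreviated, illustrative schema; NOT the full purchase-order schema from the Primer.
    static final String SAMPLE_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='purchaseOrder' type='PurchaseOrderType'/>"
      + "<xsd:element name='comment' type='xsd:string'/>"
      + "<xsd:complexType name='PurchaseOrderType'>"
      + "<xsd:sequence><xsd:element ref='comment' minOccurs='0'/></xsd:sequence>"
      + "</xsd:complexType>"
      + "</xsd:schema>";

    // Returns element name -> identifier of the type definition found in the same file.
    // Elements whose type lives elsewhere (such as built-in xsd:string) are omitted.
    static Map<String, String> resolve(String xsd) {
        Element schema = parseRoot(xsd);
        String targetNs = schema.getAttribute("targetNamespace"); // "" when absent

        // Pass 1: give every named global type definition an identifier (stand-in for jcr:uuid).
        Map<String, String> idsByTypeName = new HashMap<>();
        for (Element e : schemaChildren(schema)) {
            String kind = e.getLocalName();
            if ((kind.equals("complexType") || kind.equals("simpleType")) && e.hasAttribute("name")) {
                idsByTypeName.put(e.getAttribute("name"), UUID.randomUUID().toString());
            }
        }

        // Pass 2: resolve each global element's type QName against those definitions.
        Map<String, String> references = new LinkedHashMap<>();
        for (Element e : schemaChildren(schema)) {
            if (!e.getLocalName().equals("element") || !e.hasAttribute("type")) continue;
            String qname = e.getAttribute("type");
            int colon = qname.indexOf(':');
            String prefix = colon < 0 ? null : qname.substring(0, colon);
            String localName = qname.substring(colon + 1);
            String ns = e.lookupNamespaceURI(prefix); // null prefix looks up the default namespace
            if (ns == null) ns = "";
            String id = idsByTypeName.get(localName);
            if (id != null && ns.equals(targetNs)) {
                references.put(e.getAttribute("name"), id);
            }
        }
        return references;
    }

    // Direct children of <schema> that are elements in the XML Schema namespace.
    private static List<Element> schemaChildren(Element schema) {
        List<Element> children = new ArrayList<>();
        for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(n.getNamespaceURI())) {
                children.add((Element) n);
            }
        }
        return children;
    }

    private static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }

    public static void main(String[] args) {
        // Only purchaseOrder is listed; comment refers to a built-in type outside this file.
        System.out.println(resolve(SAMPLE_XSD));
    }
}
```

In this sketch only "purchaseOrder" ends up with a reference, because "PurchaseOrderType" is defined in the same document, while "comment" points at the built-in "xsd:string" type and is left unresolved.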
There are also nodes representing each of the global type definitions: the complex types "PurchaseOrderType", "USAddress", and "Items", and the simple type "SKU". Each complex type node has properties representing the complex type's features (such as abstract, mixed, name, etc.), as well as child nodes that represent the definition of the complex type's content (e.g., sequence, choice, all, simple content, complex content, etc.).
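To get a feel for which components become top-level nodes, the following minimal sketch lists the named global components of a schema using the JDK's DOM parser. It is not how the sequencer itself is implemented; the class name TopLevelComponents and the abbreviated sample schema are assumptions made purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: lists global schema components, not ModeShape code.
class TopLevelComponents {

    // Abbreviated, illustrative schema; NOT the full purchase-order schema from the Primer.
    static final String SAMPLE_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='purchaseOrder' type='PurchaseOrderType'/>"
      + "<xsd:element name='comment' type='xsd:string'/>"
      + "<xsd:complexType name='PurchaseOrderType'>"
      + "<xsd:sequence><xsd:element ref='comment' minOccurs='0'/></xsd:sequence>"
      + "</xsd:complexType>"
      + "</xsd:schema>";

    // Returns "<kind> <name>" for every named component directly under <schema>.
    static List<String> list(String xsd) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
        List<String> components = new ArrayList<>();
        Node child = doc.getDocumentElement().getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            // Only element nodes in the XML Schema namespace are schema components.
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            if (!XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(child.getNamespaceURI())) continue;
            Element component = (Element) child;
            if (component.hasAttribute("name")) {
                components.add(component.getLocalName() + " " + component.getAttribute("name"));
            }
        }
        return components;
    }

    public static void main(String[] args) {
        // prints [element purchaseOrder, element comment, complexType PurchaseOrderType]
        System.out.println(list(SAMPLE_XSD));
    }
}
```

Note that the nested reference to "comment" inside the complex type is not listed, since only direct children of the schema element are global components; nested content would correspond to child nodes beneath the type definition.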
This example shows some of the structure that this sequencer derives from the XML Schema Documents. Our goal for this sequencer was to output content that reflected as accurately as possible the structure of the XML Schema Documents while also making the content easy to navigate, search and query.
Node Types
The XSD sequencer follows JCR best practices by defining all nodes to have a primary type that allows any single- or multi-valued property, meaning it is possible and valid for any node to have any property (with single or multiple values). In fact, this feature is used when XSD files contain attributes with non-schema namespaces, which are then mapped onto properties with the attribute's name and possibly-empty namespace. However, it is still useful to capture the metadata about what each node represents, and so the sequencer uses explicit node type definitions and mixins for this.
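As a rough illustration of what "attributes with non-schema namespaces" means, the sketch below collects such attributes from a single schema component using the JDK's DOM API. The class name ForeignAttributes, the "ext" prefix, its "urn:example:ext" namespace, and the "{namespace}localName" key format are all hypothetical. The sketch also makes a simplifying assumption that unqualified attributes belong to the schema vocabulary and are skipped, which may differ from how the sequencer treats them.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical helper: collects non-schema attributes, not ModeShape code.
class ForeignAttributes {

    // A single element declaration carrying one attribute from a made-up namespace.
    static final String SAMPLE =
        "<xsd:element xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:ext='urn:example:ext'"
      + " name='comment' type='xsd:string' ext:owner='docs-team'/>";

    // Returns "{namespace}localName" -> value for each attribute outside the schema vocabulary.
    static Map<String, String> collect(String componentXml) {
        Element component;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            component = factory.newDocumentBuilder()
                               .parse(new InputSource(new StringReader(componentXml)))
                               .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse component", e);
        }
        Map<String, String> properties = new TreeMap<>();
        NamedNodeMap attributes = component.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attribute = (Attr) attributes.item(i);
            String ns = attribute.getNamespaceURI();
            // Assumption: unqualified attributes (name, type, ...) are schema vocabulary.
            if (ns == null) continue;
            // Skip namespace declarations such as xmlns:xsd.
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) continue;
            // Skip anything explicitly in the XML Schema namespace.
            if (XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(ns)) continue;
            properties.put("{" + ns + "}" + attribute.getLocalName(), attribute.getValue());
        }
        return properties;
    }

    public static void main(String[] args) {
        // prints {{urn:example:ext}owner=docs-team}
        System.out.println(collect(SAMPLE));
    }
}
```

Because the primary node type allows any property, an attribute like the hypothetical "ext:owner" above can be stored on the derived node without a dedicated property definition.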
The compact node definitions for the "xs" namespace are as follows:
These types use some of the node types and mixins defined in the "sramp" namespace:
Configuration
To use this sequencer, simply include the appropriate version of the Maven artifact with an "org.modeshape" group ID and "modeshape-sequencer-xsd" artifact ID. Or, if you're using JAR files and manually setting up the classpath for your application, use the "modeshape-sequencer-xsd-2.7.0.Final-jar-with-dependencies.jar" file. Then, define a sequencing configuration in the ModeShape configuration, using something similar to:
or using the JcrConfiguration: