
The text sequencers extract data from text streams. There are separate sequencers for character-delimited sequencing and fixed-width sequencing, but both treat the incoming text stream as a series of rows (separated by line terminators, as defined in BufferedReader.readLine()), with each row consisting of one or more columns. Each text sequencer provides its own mechanism for splitting the row into columns.

The AbstractTextSequencer class provides a number of JavaBean properties that are common to both of the concrete text sequencer classes:

Property Description
commentMarker Optional property that, if set, indicates that any line beginning with exactly this string should be treated as a comment and should not be processed further. If this value is null, then all lines will be sequenced. The default value for this property is null.
maximumLinesToRead Optional property that, if set, limits the number of lines that will be read during sequencing. Additional lines will be ignored. If this value is non-positive, all lines will be read and sequenced. Comment lines are not counted towards this total. The default value of this property is -1 (indicating that all lines should be read and sequenced).
rowFactoryClassName Optional property that, if set, provides the name of a class that provides a custom implementation of the RowFactory interface. This class must have a no-argument, public constructor. If set, an instance of this class will be created each time that the sequencer sequences an input stream and will be used to provide the output structure of the graph. If this property is set to null, a default implementation will be used. The default value of this property is null.

AbstractTextSequencer properties
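
To make the interaction between commentMarker and maximumLinesToRead concrete, the following is a minimal sketch written against the JDK only. It is not the ModeShape implementation: the class RowReaderSketch and the method readRows are hypothetical names introduced for illustration, and the sketch only mirrors the behavior described in the table (comment lines are skipped and do not count toward the limit, and a non-positive limit means all lines are read).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of ModeShape.
class RowReaderSketch {

    // Returns the lines that would be passed on for column splitting.
    static List<String> readRows(Reader source, String commentMarker, int maximumLinesToRead)
            throws IOException {
        List<String> rows = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            // A non-positive limit means "read everything"; otherwise stop at the limit.
            if (maximumLinesToRead > 0 && rows.size() >= maximumLinesToRead) {
                break;
            }
            // A null marker disables comment handling; comment lines are not counted.
            if (commentMarker != null && line.startsWith(commentMarker)) {
                continue;
            }
            rows.add(line);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String text = "# header comment\na,b\n# another comment\nc,d\ne,f\n";
        // Skips both comment lines and stops after two data rows.
        System.out.println(readRows(new StringReader(text), "#", 2));
        // No comment marker and no limit: every line is returned.
        System.out.println(readRows(new StringReader(text), null, -1));
    }
}
```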

The default row factory creates one node in the output location for each row sequenced from the source and adds each column within the row as a child node of the row node. The output graph takes the following form (all nodes have primary type nt:unstructured):

Delimited Text Sequencer

The DelimitedTextSequencer splits rows into columns based on a regular expression pattern. Although the default pattern is a comma, any regular expression can be provided, allowing for more sophisticated splitting patterns.

The DelimitedTextSequencer class provides an additional JavaBean property to override the default regular expression pattern:

Property Description
splitPattern Optional property that, if set, sets the regular expression pattern that is used to split each row into columns. This property may not be set to null and defaults to ",".

DelimitedTextSequencer properties
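
As a rough illustration of what a regular-expression split pattern means for a row, the sketch below uses java.util.regex.Pattern directly. The class DelimitedSplitSketch and the method splitRow are hypothetical and are not the sequencer's own code. Keeping trailing empty columns (the -1 limit) is an assumption made here, not documented behavior. Because the value is a regular expression, a literal pipe would need to be escaped as "\\|" in Java source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of ModeShape.
class DelimitedSplitSketch {

    // Splits one row into columns using a regular expression.
    static List<String> splitRow(String row, String splitPattern) {
        // Assumption: a limit of -1 keeps trailing empty columns.
        return Arrays.asList(Pattern.compile(splitPattern).split(row, -1));
    }

    public static void main(String[] args) {
        // Default-style comma split.
        System.out.println(splitRow("a,b,c", ","));
        // A literal pipe must be escaped because the pattern is a regex.
        System.out.println(splitRow("a|b|c", "\\|"));
        // A richer pattern: semicolon with optional surrounding whitespace.
        System.out.println(splitRow("a ; b;c", "\\s*;\\s*"));
    }
}
```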

To use this sequencer, simply include the modeshape-sequencer-text JAR in your application and configure the JcrConfiguration to use this sequencer using something similar to:

Fixed Width Text Sequencer

The FixedWidthTextSequencer splits rows into columns based on predefined positions. The default setting is to have a single column per row. It also provides an additional JavaBean property to override the default start positions for each column.

Property Description
columnStartPositions Optional property that, if set, provides the start position of each column after the first. The start positions are concatenated into a single, comma-delimited string. The default value is the empty string (implying that each row should be treated as a single column). This property may not be set to null. There is an implicit column start position of 0 that never needs to be specified.

FixedWidthTextSequencer properties
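
The effect of columnStartPositions can be sketched with plain substring arithmetic. The class FixedWidthSplitSketch and the method splitRow below are hypothetical names, not the sequencer's implementation. The sketch assumes the positions are given in ascending order, and its handling of rows shorter than the last start position (the missing columns are simply omitted) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of ModeShape.
class FixedWidthSplitSketch {

    // Splits one row at the given comma-delimited start positions.
    static List<String> splitRow(String row, String columnStartPositions) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // implicit first column start, never specified by the caller
        if (!columnStartPositions.trim().isEmpty()) {
            for (String token : columnStartPositions.split(",")) {
                starts.add(Integer.parseInt(token.trim()));
            }
        }
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int start = starts.get(i);
            // Assumption: columns starting beyond the end of a short row are omitted.
            if (start >= row.length()) {
                break;
            }
            int end = (i + 1 < starts.size())
                    ? Math.min(starts.get(i + 1), row.length())
                    : row.length();
            columns.add(row.substring(start, end));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Columns start at 0 (implicit), 3, and 5.
        System.out.println(splitRow("AAABBCCCC", "3,5"));
        // Empty string: the whole row is a single column.
        System.out.println(splitRow("whole line", ""));
    }
}
```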

To use this sequencer, simply include the modeshape-sequencer-text JAR in your application and configure the JcrConfiguration to use this sequencer using something similar to:
