java.lang.Object
  org.modeshape.extractor.tika.TikaTextExtractor
public class TikaTextExtractor
A TextExtractor that uses the Apache Tika library.
This extractor will automatically discover all of the Tika Parser implementations that are defined in META-INF/services/org.apache.tika.parser.Parser text files accessible via the current classloader. Each of these files contains the class names of the Parser implementations (one class name per line).
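The discovery described above relies on provider-configuration files that list one class name per line. As a rough illustration only (this is not ModeShape's or Tika's actual code), the sketch below parses the text of such a file. The class `ServicesFileSketch`, the method `parseProviderNames`, and the `com.example` class names are hypothetical; the handling of `#` comments follows the standard `java.util.ServiceLoader` file format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ServicesFileSketch {

    /** Returns the class names listed in the text of a provider-configuration file. */
    static List<String> parseProviderNames(String fileText) {
        List<String> names = new ArrayList<>();
        for (String line : fileText.split("\\R")) {
            // Everything after '#' on a line is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            // Skip blank lines and names that were already listed.
            if (!line.isEmpty() && !names.contains(line)) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up file content; the class names are placeholders.
        String text = "# hypothetical providers\n"
                + "com.example.PdfParser\n"
                + "\n"
                + "com.example.HtmlParser  # trailing comment\n";
        System.out.println(parseProviderNames(text));
    }
}
```

In practice the JDK's `java.util.ServiceLoader` reads these files and instantiates the listed classes itself, so hand parsing like this is only useful for understanding the file format.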
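This text extractor can be configured in a ModeShape configuration by specifying several optional properties, such as the MIME types to include or exclude (see setIncludedMimeTypes(String) and setExcludedMimeTypes(String)). By default, package files are excluded, though explicitly setting any excluded MIME types will override these defaults.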
Field Summary | |
---|---|
static Set<String> | DEFAULT_EXCLUDED_MIME_TYPES: The MIME types that are excluded by default. |
Constructor Summary | |
---|---|
TikaTextExtractor() | |
Method Summary | |
---|---|
void | addExcludedMimeType(String excludedMimeType): Add another MIME type that should be excluded. |
void | addIncludedMimeType(String includedMimeType): Add another MIME type that should be included. |
void | excludeMimeType(String mimeType): Exclude the MIME type from extraction. |
void | extractFrom(InputStream stream, TextExtractorOutput output, TextExtractorContext context): Extract text from the data found in the supplied stream, placing the output into the supplied TextExtractorOutput. |
Set<String> | getExcludedMimeTypes(): Get the MIME types that should be excluded. |
Set<String> | getIncludedMimeTypes(): Get the MIME types that are explicitly requested to be included. |
void | includeMimeType(String mimeType): Include the MIME type in extraction. |
protected org.apache.tika.parser.DefaultParser | initialize(): This class lazily initializes the DefaultParser instance. |
void | setExcludedMimeTypes(String excludedMimeTypes): Set the MIME types that should be excluded. |
void | setIncludedMimeTypes(String includedMimeTypes): Set the MIME types that should be included. |
boolean | supportsMimeType(String mimeType): Determine if this extractor is capable of processing content with the supplied MIME type. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final Set<String> DEFAULT_EXCLUDED_MIME_TYPES
Constructor Detail |
---|
public TikaTextExtractor()
Method Detail |
---|
public boolean supportsMimeType(String mimeType)
Specified by: supportsMimeType in interface TextExtractor
Parameters: mimeType - the MIME type; never null
See Also: TextExtractor.supportsMimeType(java.lang.String)
public void extractFrom(InputStream stream, TextExtractorOutput output, TextExtractorContext context) throws IOException
ModeShape's SequencingService determines which sequencers should be executed by monitoring changes to one or more workspaces. Changes in those workspaces are aggregated and used to determine which sequencers should be called. If the sequencer implements this interface, then this method is called with the property that is to be sequenced along with the interface used to register the output. The framework takes care of all the rest.
Specified by: extractFrom in interface TextExtractor
Parameters:
stream - the stream with the data to be sequenced; never null
output - the output from the sequencing operation; never null
context - the context for the sequencing operation; never null
Throws: IOException - if there is a problem reading the stream
See Also: TextExtractor.extractFrom(java.io.InputStream, org.modeshape.graph.text.TextExtractorOutput, org.modeshape.graph.text.TextExtractorContext)
protected org.apache.tika.parser.DefaultParser initialize()
This class lazily initializes the DefaultParser instance.
Returns: the parser
public Set<String> getIncludedMimeTypes()
public void setIncludedMimeTypes(String includedMimeTypes)
Parameters: includedMimeTypes - the whitespace-delimited or comma-separated list of MIME types that are to be included
public void addIncludedMimeType(String includedMimeType)
Parameters: includedMimeType - the MIME type that is to be included
public void includeMimeType(String mimeType)
Parameters: mimeType - MIME type that should be included
public Set<String> getExcludedMimeTypes()
public void setExcludedMimeTypes(String excludedMimeTypes)
Parameters: excludedMimeTypes - the whitespace-delimited or comma-separated list of MIME types that are to be excluded
public void addExcludedMimeType(String excludedMimeType)
Parameters: excludedMimeType - the MIME type that is to be excluded
public void excludeMimeType(String mimeType)
Parameters: mimeType - MIME type that should be excluded
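
The setters above accept a whitespace-delimited or comma-separated list of MIME types, and the class description notes that explicitly setting the excluded types overrides the defaults. The following sketch shows one way such behavior could be written. It is an assumption-laden illustration and not the TikaTextExtractor source: the class `MimeTypeListSketch`, its methods, and the placeholder default `application/x-example-package` are all hypothetical, and the real contents of DEFAULT_EXCLUDED_MIME_TYPES are not reproduced here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in, for illustration only.
final class MimeTypeListSketch {

    // Placeholder default; not the actual DEFAULT_EXCLUDED_MIME_TYPES values.
    static final Set<String> DEFAULT_EXCLUDED = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("application/x-example-package")));

    private final Set<String> excluded = new LinkedHashSet<>(DEFAULT_EXCLUDED);

    /** Splits a whitespace-delimited or comma-separated list into distinct MIME types. */
    static Set<String> parse(String list) {
        Set<String> result = new LinkedHashSet<>();
        if (list == null) {
            return result;
        }
        for (String token : list.split("[,\\s]+")) {
            // A leading delimiter produces an empty first token; skip it.
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Replaces the current exclusions, including the defaults. */
    void setExcluded(String list) {
        excluded.clear();
        excluded.addAll(parse(list));
    }

    /** Adds one more exclusion to whatever is currently set. */
    void addExcluded(String mimeType) {
        excluded.add(mimeType.trim());
    }

    Set<String> getExcluded() {
        return Collections.unmodifiableSet(excluded);
    }

    public static void main(String[] args) {
        MimeTypeListSketch sketch = new MimeTypeListSketch();
        sketch.setExcluded("application/zip, application/x-tar");
        sketch.addExcluded("application/java-archive");
        System.out.println(sketch.getExcluded());
    }
}
```

The distinction the sketch draws is that a set-style call discards the defaults while an add-style call extends the current set, which matches the wording of the method descriptions above.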