Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by ChrisHarris:
http://wiki.apache.org/solr/ExtractingRequestHandler
The comment on the change is:
Note existence of stream.type
------------------------------------------------------------------------------
Before getting started, there are a few concepts that are helpful to understand.
+ * Tika will automatically attempt to determine the input document type (Word, PDF, etc.)
and extract the content appropriately. If you want, you can explicitly specify a MIME type
for Tika with the stream.type parameter.
* Tika does everything by producing an XHTML stream that it feeds to a SAX !ContentHandler.
* Solr then implements a !SolrContentHandler which reacts to Tika's SAX events and creates
a !SolrInputDocument. You can override the !SolrContentHandler. See the section below on
Customization.
* Tika produces Metadata information according to things like !DublinCore and other specifications.
See the Tika javadocs on the Metadata class for what gets produced. <!> TODO: Link
to Tika Javadocs <!> See also http://incubator.apache.org/tika/formats.html
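
As a rough illustration of the stream.type parameter mentioned above, the sketch below builds a request URL that names the MIME type explicitly instead of relying on Tika's auto-detection. The Solr base URL, the /update/extract handler path, and the build_extract_url helper are all assumptions made for this example; the actual path depends on how the handler is registered in your solrconfig.xml. The code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_extract_url(solr_url, mime_type=None, handler_path="/update/extract"):
    """Return a URL for the extracting handler (hypothetical helper).

    handler_path is an assumed mount point; check solrconfig.xml for the
    real one. If mime_type is None, no stream.type is sent and Tika is
    left to detect the document type on its own.
    """
    url = solr_url.rstrip("/") + handler_path
    if mime_type is None:
        return url
    # urlencode percent-escapes the "/" in the MIME type.
    return url + "?" + urlencode({"stream.type": mime_type})


if __name__ == "__main__":
    # Assumed local Solr instance; adjust host, port, and core as needed.
    print(build_extract_url("http://localhost:8983/solr", "application/pdf"))
```

Leaving mime_type unset keeps the default behaviour, so the override is only sent when auto-detection is known to guess wrong for a given document.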