lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll
Date Fri, 14 Nov 2008 17:23:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ExtractingRequestHandler

New page:
[[TableOfContents]]

This document currently describes uncommitted code.  Please see [https://issues.apache.org/jira/browse/SOLR-284 SOLR-284] for more information.

= Introduction =

Users commonly need to ingest binary and/or structured documents, such as Microsoft Office files, PDFs and other proprietary formats.  The [http://www.lucene.apache.org/tika Apache Tika] project provides a framework that wraps many different file format parsers, such as PDFBox, POI and others.

The !ExtractingRequestHandler will provide a wrapper around Tika that allows users to upload binary files to Solr, have Solr extract the text from them, and then index that text.
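To make the upload-then-extract flow more concrete, here is a minimal client-side sketch in Python using only the standard library. It builds, but does not send, an HTTP POST whose body is the raw file bytes. Because the handler is still uncommitted, the Solr base URL (`http://localhost:8983/solr`), the handler path (`/update/extract`) and the helper name `build_extract_request` are all hypothetical assumptions, and the handler's real input parameters are not yet documented here.

```python
import urllib.parse
import urllib.request


def build_extract_request(file_bytes, content_type,
                          solr_url="http://localhost:8983/solr",
                          handler_path="/update/extract",
                          params=None):
    """Build (but do not send) a POST that uploads a binary file to Solr.

    ASSUMPTIONS: solr_url, handler_path and any params are hypothetical;
    the actual handler path and parameter names may differ.
    """
    url = solr_url.rstrip("/") + handler_path
    if params:
        # Append any handler parameters as a URL-encoded query string.
        url += "?" + urllib.parse.urlencode(params)
    # The raw file bytes form the request body; the Content-Type header
    # tells the server what kind of document it is receiving.
    return urllib.request.Request(
        url,
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Example: a tiny in-memory payload standing in for a real PDF file.
example_request = build_extract_request(b"%PDF-1.4", "application/pdf")
# Sending it requires a running Solr with the handler installed, e.g.:
# with urllib.request.urlopen(example_request) as response:
#     print(response.status)
```

Sending the raw bytes as the request body, rather than extracting text on the client, keeps all format-specific parsing on the Solr side, which is the point of wrapping Tika inside the handler.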

= Features =




= Summary of Input Parameters =


