lucene-solr-commits mailing list archives

From Apache Wiki
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll
Date Fri, 14 Nov 2008 17:23:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:

New page:

This document currently represents uncommitted code.  Please see SOLR-284 for more
information.

= Introduction =

Users commonly need to ingest binary and/or structured documents such as Office, PDF and
other proprietary formats.  The Apache Tika project provides a framework that wraps many
different file format parsers, such as PDFBox, POI and others.

The !ExtractingRequestHandler will provide a wrapper around Tika to allow users to upload binary
files to Solr and have Solr extract the text from them and then index it.
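
As a rough illustration of the upload step described above, the following Python sketch builds (but does not send) an HTTP POST that carries a binary file to Solr. Since the code is still uncommitted, everything specific here is an assumption made for illustration and is not taken from this page: the handler path `/update/extract`, the base URL, the `literal.id` parameter, and the helper name `build_extract_request`.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_base_url, file_bytes, content_type,
                          handler_path="/update/extract", params=None):
    """Build (without sending) a POST that uploads a binary file to Solr.

    handler_path and params are hypothetical: this page does not define the
    handler's URL or its input parameters.
    """
    url = solr_base_url.rstrip("/") + handler_path
    if params:
        # Encode any extra request parameters into the query string.
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=file_bytes,  # raw file content goes in the request body
        headers={"Content-Type": content_type},
        method="POST",
    )


# Example: a placeholder PDF payload and a hypothetical document id parameter.
request = build_extract_request(
    "http://localhost:8983/solr",
    b"%PDF-1.4 placeholder bytes",
    "application/pdf",
    params={"literal.id": "doc1"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual upload against a running Solr instance; the sketch stops short of that so it can be read and run without a server.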

= Features =

= Summary of Input Parameters =
