lucene-solr-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] Created: (SOLR-1763) Integrate Solr Cell/Tika as an UpdateRequestProcessor
Date Mon, 08 Feb 2010 20:41:28 GMT
Integrate Solr Cell/Tika as an UpdateRequestProcessor
-----------------------------------------------------

                 Key: SOLR-1763
                 URL: https://issues.apache.org/jira/browse/SOLR-1763
             Project: Solr
          Issue Type: New Feature
          Components: update
            Reporter: Jan Høydahl


From Chris Hostetter's original post in solr-dev:

As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if
ExtractingRequestHandler would make more sense as an extractingUpdateProcessor -- where it
could be configured to take either binary fields (or string fields containing URLs) out
of the Documents, parse them with Tika, and add the various XPath-matching hunks of text back
into the document as new fields.

Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams,
adds them as binary data fields, and adds the other literal params as fields.

Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML- and
CSV-based updates, fairly trivial?

-Hoss

I couldn't agree more, so I decided to add it as an issue.
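
To make the proposal concrete, here is a minimal, dependency-free sketch of the idea in the
quoted post. It is not Solr or Tika code: the Processor interface, the ExtractingProcessor
class, the "raw" field name, and the toyExtract function are all hypothetical stand-ins (for
an update processor, the proposed extracting processor, a binary input field, and a Tika
parse, respectively). The choice to let fields already on the document win over extracted
ones is also an assumption, not something the post specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ExtractingProcessorSketch {

    /** Hypothetical stand-in for one step in an update processor chain. */
    interface Processor {
        void process(Map<String, Object> doc);
    }

    /**
     * Takes a binary field out of the document, runs an extractor over it,
     * and adds the extracted text back as new fields.
     */
    static final class ExtractingProcessor implements Processor {
        private final String sourceField;
        // Assumption: the extractor is pluggable; a real one would wrap Tika.
        private final Function<byte[], Map<String, String>> extractor;

        ExtractingProcessor(String sourceField,
                            Function<byte[], Map<String, String>> extractor) {
            this.sourceField = sourceField;
            this.extractor = extractor;
        }

        @Override
        public void process(Map<String, Object> doc) {
            Object raw = doc.get(sourceField);
            if (!(raw instanceof byte[])) {
                return; // nothing to extract; leave the document untouched
            }
            Map<String, String> extracted = extractor.apply((byte[]) raw);
            doc.remove(sourceField);
            // Assumption: fields already present (e.g. literal params) take precedence.
            extracted.forEach(doc::putIfAbsent);
        }
    }

    /** Runs each processor over the document in order. */
    static Map<String, Object> runChain(List<? extends Processor> chain,
                                        Map<String, Object> doc) {
        for (Processor p : chain) {
            p.process(doc);
        }
        return doc;
    }

    /** Toy extractor standing in for Tika: treats the bytes as UTF-8 text. */
    static Map<String, String> toyExtract(byte[] bytes) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("content", new String(bytes, StandardCharsets.UTF_8).trim());
        out.put("content_type", "text/plain");
        return out;
    }

    public static void main(String[] args) {
        // What a thin request handler would build: literal fields plus raw bytes.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        doc.put("raw", " hello ".getBytes(StandardCharsets.UTF_8));

        Processor extracting =
            new ExtractingProcessor("raw", ExtractingProcessorSketch::toyExtract);
        runChain(List.of(extracting), doc);
        System.out.println(doc.keySet());
    }
}
```

Under this split, the request handler only has to attach the uploaded bytes and literal
params to a document; any other update path (XML, CSV) that produces a document with the
same binary field would get extraction from the same processor.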

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

