lucene-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-1763) Integrate Solr Cell/Tika as an UpdateRequestProcessor
Date Wed, 27 Jun 2012 11:41:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402149#comment-13402149 ]

Jan Høydahl commented on SOLR-1763:
-----------------------------------

I won't have time to look at this before October-ish, so anyone should feel free to give it a shot :)
                
> Integrate Solr Cell/Tika as an UpdateRequestProcessor
> -----------------------------------------------------
>
>                 Key: SOLR-1763
>                 URL: https://issues.apache.org/jira/browse/SOLR-1763
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>              Labels: extracting_request_handler, solr_cell, tika, update_request_handler
>
> From Chris Hostetter's original post in solr-dev:
> As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering
if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor -- where
it could be configured to take either binary fields (or string fields containing URLs)
out of the Documents, parse them with Tika, and add the various XPath-matching hunks of text
back into the document as new fields.
> Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams
and adds them as binary data fields and adds the other literal params as fields.
> Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML- and
CSV-based updates, fairly trivial?
> -Hoss
> I couldn't agree more, so I decided to add it as an issue.


       
