lucene-dev mailing list archives

From Jan Høydahl (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-2842) Re-factor UpdateChain and UpdateProcessor interfaces
Date Sun, 16 Oct 2011 14:42:11 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128413#comment-13128413 ]

Jan Høydahl commented on SOLR-2842:
-----------------------------------

Yep, for distributed cloud setups it would be cool to have dedicated document-processing nodes.

I don't think the client necessarily needs to be THAT fat or complex if this is done right.
If we make the UpdateChain and the UpdateProcessor itself more stand-alone, not depending on SolrCore,
and make update chains easily configurable outside of solrconfig.xml (see SOLR-2841), then
it would be straightforward to instantiate a chain on the client side (without the RunUpdateProcessor,
of course). Some processors use the schema, so we might need a way to fetch the correct schema
from the server, using admin/file or, even better, ZooKeeper (ZK).
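To make the idea concrete, here is a minimal sketch of a processor chain with no SolrCore dependency that a client could run locally. Everything in it is hypothetical and for illustration only: `DocProcessor` and `ProcessorChain` are invented names, not Solr's actual UpdateRequestProcessor API; a plain `Map` stands in for `SolrInputDocument`; and the "extraction" step merely stands in for a Tika-based processor. As proposed above, the chain deliberately has no RunUpdateProcessor at the end, so the processed document would then be sent to Solr by the client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClientSideChainSketch {

    // Hypothetical processor interface: transforms a document in place.
    // A Map<String, Object> stands in for SolrInputDocument (assumption).
    public interface DocProcessor {
        void process(Map<String, Object> doc);
    }

    // Hypothetical chain: an ordered list of processors with no SolrCore
    // dependency, so the same chain could run on a client or on a server.
    public static final class ProcessorChain {
        private final List<DocProcessor> processors = new ArrayList<>();

        public ProcessorChain add(DocProcessor p) {
            processors.add(p);
            return this;
        }

        // Runs each processor in order and returns the transformed document.
        public Map<String, Object> run(Map<String, Object> doc) {
            for (DocProcessor p : processors) {
                p.process(doc);
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        // Client-side chain: the first step stands in for Tika extraction,
        // the second drops the local file path. No indexing step is included.
        ProcessorChain chain = new ProcessorChain()
            .add(doc -> doc.put("content", "text extracted from " + doc.get("path")))
            .add(doc -> doc.remove("path"));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("path", "/data/report.pdf");

        // Prints {id=1, content=text extracted from /data/report.pdf}
        System.out.println(chain.run(doc));
    }
}
```

Keeping the chain a plain ordered list is what lets it move between client and server; server-side wiring (schema access, the final indexing step) would be added only where it is actually needed.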
                
> Re-factor UpdateChain and UpdateProcessor interfaces
> ----------------------------------------------------
>
>                 Key: SOLR-2842
>                 URL: https://issues.apache.org/jira/browse/SOLR-2842
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Jan Høydahl
>
> The UpdateChain's main task is to send SolrInputDocuments through a chain of UpdateRequestProcessors
> in order to transform them in some way and then (typically) index them.
> This generic "pipeline" concept would also be useful on the client side (SolrJ), so that
> we could choose to do part or all of the processing on the client. The most prominent use
> case is extracting text (with Tika) from large binary documents residing on local storage on the
> client(s). Streaming hundreds of MB over to Solr for processing is not efficient. See SOLR-1526.
> We're already implementing Tika as an UpdateProcessor in SOLR-1763, and what would be
> more natural than reusing this (and any other processor) on the client side?
> However, for this to be possible, some interfaces need to change slightly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       


