manifoldcf-dev mailing list archives

From "Rafa Haro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-962) Support multiple output connections for a single job
Date Thu, 12 Jun 2014 10:15:02 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028992#comment-14028992 ]

Rafa Haro commented on CONNECTORS-962:
--------------------------------------

Hi [~kwright@metacarta.com]. Although the use case of multiple processing pipelines with different
outputs would also be very useful for us, our current approach is actually to generate different
repository documents from the original one as the result of a SINGLE processing pipeline. So
rather than sending the same stream to different processing pipelines, we get different document
representations as the result of an enhancement process, each document corresponding to a different
output connector. That is how we are doing the trick right now (although it was a quick and dirty
solution). You probably don't want to consider it right now, but Apache Camel is a great resource
for achieving the processing architecture that you seem to be aiming for, because it allows you to
easily define different processing routes, such as sending the same event to different pipelines.
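
The workaround described above (one pipeline whose enhancement step emits one derived document
per output) can be sketched roughly as follows. This is only an illustration under assumptions:
DerivedDocument, SinglePipelineFanOut, enhance and route are hypothetical names, not ManifoldCF
classes, and the output names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are ManifoldCF classes.
class DerivedDocument {
    final String outputName; // name of the output connection this document is meant for
    final String content;

    DerivedDocument(String outputName, String content) {
        this.outputName = outputName;
        this.content = content;
    }
}

class SinglePipelineFanOut {
    // Stand-in for the enhancement step: one input, one derived document per output.
    static List<DerivedDocument> enhance(String original, List<String> outputNames) {
        List<DerivedDocument> derived = new ArrayList<>();
        for (String name : outputNames) {
            derived.add(new DerivedDocument(name, "[" + name + "] " + original));
        }
        return derived;
    }

    // Groups the derived documents by their target output, preserving insertion order.
    static Map<String, List<String>> route(List<DerivedDocument> docs) {
        Map<String, List<String>> byOutput = new LinkedHashMap<>();
        for (DerivedDocument doc : docs) {
            byOutput.computeIfAbsent(doc.outputName, k -> new ArrayList<>()).add(doc.content);
        }
        return byOutput;
    }

    public static void main(String[] args) {
        List<String> outputs = new ArrayList<>();
        outputs.add("solr");          // made-up output connection names
        outputs.add("elasticsearch");
        System.out.println(route(enhance("hello", outputs)));
    }
}
```

The point of the sketch is that the fan-out happens in the documents rather than in the pipeline:
the framework still sees a single pipeline, and each derived document carries its own destination.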

> Support multiple output connections for a single job
> ----------------------------------------------------
>
>                 Key: CONNECTORS-962
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-962
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> Zaizi has a requirement to support multiple outputs for a single job.  In theory this requirement can be met by doing the following:
> - Allow multiple output connections, and multiple pipelines, per job
> - Keep a distinct ingeststatus record for each document/output combination
> - Modify WorkerThread to call IncrementalIndexer multiple times for every document fetched
> Places where different things need to happen are:
> - RepositoryDocument - because one binary stream will not do for multiple outputs
> - UI, obviously, because there will need to be multiple pipelines, not just one, and in addition it would probably be important to be able to "split" the pipeline at arbitrary points
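
The proposal quoted above could look roughly like the sketch below: one indexing call per output
for each fetched document, a separate status entry per document/output pair, and a fresh stream
per output because a stream can be consumed only once. Everything here is an assumption made for
illustration. OutputSink and MultiOutputWorker are hypothetical names and do not reflect the real
WorkerThread or IncrementalIndexer code, and buffering the whole document in memory is a
simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; this is not a ManifoldCF interface.
interface OutputSink {
    // Consumes the stream and returns a string describing what was indexed.
    String index(String documentId, InputStream content) throws IOException;
}

class MultiOutputWorker {
    // One status entry per document/output combination, keyed "documentId|outputName".
    final Map<String, String> ingestStatus = new HashMap<>();
    // Output connections for the job, in the order they were configured.
    final Map<String, OutputSink> outputs = new LinkedHashMap<>();

    void process(String documentId, byte[] fetched) {
        for (Map.Entry<String, OutputSink> e : outputs.entrySet()) {
            // Each output gets its own stream over the buffered bytes.
            InputStream copy = new ByteArrayInputStream(fetched);
            try {
                String result = e.getValue().index(documentId, copy);
                ingestStatus.put(documentId + "|" + e.getKey(), result);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
    }

    public static void main(String[] args) {
        MultiOutputWorker worker = new MultiOutputWorker();
        worker.outputs.put("first", (id, in) -> "bytes=" + in.available());
        worker.outputs.put("second", (id, in) -> "bytes=" + in.available());
        worker.process("doc-1", new byte[] {1, 2, 3});
        System.out.println(worker.ingestStatus);
    }
}
```

Keying the status by both document and output is what would let one output be retried or
re-indexed without touching the others.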



--
This message was sent by Atlassian JIRA
(v6.2#6252)
