manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-962) Support multiple output connections for a single job
Date Tue, 17 Jun 2014 17:50:04 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034105#comment-14034105 ]

Karl Wright commented on CONNECTORS-962:
----------------------------------------

Hi Rafa,

The documentation has now been updated, and the tests pass as well (both UI and IT).  I think
it is safe to try things out and see if they work for you.

Longer term, the UI for editing the pipeline is clumsy.  It is correct in that it only gives
you the buttons for the operations you are allowed to perform, but it could be done better.
Over the next week I'll think about how to improve it.  Any suggestions you and your team
have for a better organization would be welcome too.

Aside from that, there are no real downsides to this implementation, and it is fully incremental,
so please try it out and don't be afraid to try and break it.  I'd be quite interested in
finding out if it meets your needs and can reduce your dependence on custom code.

Thanks!

> Support multiple output connections for a single job
> ----------------------------------------------------
>
>                 Key: CONNECTORS-962
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-962
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> Zaizi has a requirement to support multiple outputs for a single job.  In theory this requirement can be met by doing the following:
> - Allow multiple output connections, and multiple pipelines, per job
> - Keep a distinct ingeststatus record for each document/output combination
> - Modify WorkerThread to call IncrementalIndexer multiple times for every document fetched
> Places where different things need to happen are:
> - RepositoryDocument - because one binary stream will not do for multiple outputs
> - UI, obviously, because there will need to be multiple pipelines, not just one, and in addition it would probably be important to be able to "split" the pipeline at arbitrary points
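
The approach listed in the issue description can be sketched roughly as follows. This is only an illustration of the idea, not ManifoldCF code: `SketchOutput`, `RecordingOutput`, `FanOutSketch`, the status strings, and the `documentId|outputName` key format are all hypothetical names made up for this example, and the real WorkerThread, IncrementalIndexer, and RepositoryDocument classes are not used. It shows one fetched document being handed to several outputs, a separate status record kept per document/output combination, and the content buffered so that each output gets its own stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an output connection; NOT the ManifoldCF API.
interface SketchOutput {
  String name();
  void index(String documentId, InputStream content) throws IOException;
}

// Hypothetical output that simply remembers the text it was asked to index.
class RecordingOutput implements SketchOutput {
  private final String name;
  final List<String> received = new ArrayList<>();

  RecordingOutput(String name) { this.name = name; }

  public String name() { return name; }

  public void index(String documentId, InputStream content) throws IOException {
    // Consumes the stream fully, as a real output would.
    received.add(new String(content.readAllBytes(), StandardCharsets.UTF_8));
  }
}

// Hypothetical fan-out helper illustrating the bullet points from the issue.
class FanOutSketch {
  // One status record per document/output combination, keyed "documentId|outputName".
  final Map<String, String> ingestStatus = new LinkedHashMap<>();

  void process(String documentId, InputStream fetched, List<SketchOutput> outputs) {
    byte[] bytes;
    try {
      // Buffer the content once: a single stream can only be consumed once,
      // so it cannot be shared between outputs. (Assumption: the document
      // fits in memory; a large document would need a temporary file.)
      bytes = fetched.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Call the indexing step once per output for the same fetched document.
    for (SketchOutput output : outputs) {
      String key = documentId + "|" + output.name();
      try {
        output.index(documentId, new ByteArrayInputStream(bytes));
        ingestStatus.put(key, "INDEXED");
      } catch (IOException e) {
        // A failure in one output does not stop the remaining outputs.
        ingestStatus.put(key, "FAILED");
      }
    }
  }

  public static void main(String[] args) {
    FanOutSketch sketch = new FanOutSketch();
    List<SketchOutput> outputs =
        List.of(new RecordingOutput("first"), new RecordingOutput("second"));
    InputStream fetched =
        new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    sketch.process("doc1", fetched, outputs);
    System.out.println(sketch.ingestStatus);
  }
}
```

Keeping the status per document/output pair means that, in this sketch, a failure in one output can be recorded and retried without re-indexing the document into the others.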



--
This message was sent by Atlassian JIRA
(v6.2#6252)
