manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-946) Add support for pipeline connector
Date Thu, 29 May 2014 12:41:01 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012328#comment-14012328 ]

Karl Wright edited comment on CONNECTORS-946 at 5/29/14 12:39 PM:
------------------------------------------------------------------

I created a branch, branches/CONNECTORS-946, to work on this ticket.

The first change was the underpinnings for sequence numbers for connections in the job UI.
I will want to refine this over time, and may even change the UI so that every tab's connection
affinity is explicit.

I also created the interfaces needed to support transformation connectors.  It occurred to
me that document filtering at transformation time could well be very useful, so I set up the
connector methods to operate in a "chained" way -- each connector in the chain must decide
what to pass on down the chain (and what to respond to the caller with).
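
As a rough illustration of the "chained" idea only (this is not the code on the branch, and
every name below -- Doc, Link, DropEmpty, UpperCase, CollectingOutput -- is hypothetical), each
link holds a reference to the next one and decides both what to forward and what to report
back to its caller:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document moving through the chain.
final class Doc {
    final String id;
    final String body;
    Doc(String id, String body) { this.id = id; this.body = body; }
}

// Hypothetical link interface: the return value is what the link reports
// to its caller (true = the document was accepted further down the chain).
interface Link {
    boolean send(Doc doc);
}

// Last link in the chain: plays the role of the output connection.
final class CollectingOutput implements Link {
    final List<Doc> indexed = new ArrayList<>();
    @Override public boolean send(Doc doc) {
        indexed.add(doc);
        return true;
    }
}

// A filtering link: blank documents are rejected here and never reach the next link.
final class DropEmpty implements Link {
    private final Link next;
    DropEmpty(Link next) { this.next = next; }
    @Override public boolean send(Doc doc) {
        if (doc.body.trim().isEmpty()) {
            return false;            // respond to the caller, pass nothing on
        }
        return next.send(doc);       // pass on unchanged, relay the downstream answer
    }
}

// A transforming link: builds a new document and forwards that instead.
final class UpperCase implements Link {
    private final Link next;
    UpperCase(Link next) { this.next = next; }
    @Override public boolean send(Doc doc) {
        return next.send(new Doc(doc.id, doc.body.toUpperCase()));
    }
}
```

Wiring would then look like new DropEmpty(new UpperCase(new CollectingOutput())), with callers
holding only the first link.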

The last link in the chain is the output connection.  So anything that wants to interact with
the chain should talk only to the first transformation connection, with every later link
wired to the one before it.  This raises another interesting issue: for many of the methods in
ITransformationConnector, there are equivalent methods in IOutputConnector, and there will be
code that will want to treat those methods the same.  This argues for a common lower-level
interface that contains those methods.  I will need to think about this some more.
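
One possible shape for such a common interface, purely as a sketch: the names
DocumentAcceptor, TransformationStep, OutputStep and the acceptsMimeType method are all
assumptions made up for illustration, not the actual ManifoldCF interfaces or their methods.
The point is only that code needing the shared methods can be written once against the base
type:

```java
import java.util.List;

// Hypothetical lower-level interface holding a method both kinds of connector share.
interface DocumentAcceptor {
    boolean acceptsMimeType(String mimeType);
}

// Hypothetical transformation-side interface: shared method plus its own.
interface TransformationStep extends DocumentAcceptor {
    String transform(String content);
}

// Hypothetical output-side interface: shared method plus its own.
interface OutputStep extends DocumentAcceptor {
    void index(String content);
}

final class Acceptance {
    private Acceptance() {}

    // Treats every link the same way, whether it transforms or outputs:
    // a MIME type is usable only if every link in the chain accepts it.
    static boolean allAccept(List<? extends DocumentAcceptor> chain, String mimeType) {
        for (DocumentAcceptor step : chain) {
            if (!step.acceptsMimeType(mimeType)) {
                return false;
            }
        }
        return true;
    }
}
```

Whether the shared methods belong in a base interface like this, or in something else
entirely, is exactly the open question above.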
 


was (Author: kwright@metacarta.com):
I created a branch, branches/CONNECTORS-946, to work on this ticket.

First change was the underpinnings of having sequence numbers for connections in the job UI.
 This I will want to refine over time, and maybe even change the UI so that every tab's connection
affinity is explicit.

I also created the interfaces needed to support transformation connectors.  It occurred to
me that document filtering at transformation time could well be very useful, so I set up the
connector methods to operate in a "chained" way -- each connector in the chain must decide
what to pass on down the chain (and what to respond to the caller with).

The last link in the chain is the output connection.  So anything that wants to interact with
the chain should talk only to the first translation connection, and everything else should
be wired together.  And this raises another interesting issue: for many of the methods in
ITranslationConnector, there are equivalent methods in IOutputConnector, and there will be
code that will want to treat those methods the same.  This argues for a common lower-level
interface that contains those methods.  I will need to think about this some more.
 

> Add support for pipeline connector
> ----------------------------------
>
>                 Key: CONNECTORS-946
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-946
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> In the Amazon Search Connector, we finally found an example of an output connector that
> needed to do full document processing in order to work.  This ticket represents work in the
> framework to create a concept of "pipeline connector".  Pipeline connections would receive
> RepositoryDocument objects, and transform them to new RepositoryDocument objects.  There would
> be a single important method:
> {code}
> public void transformDocument(RepositoryDocument rd, ITransformationActivities activities)
>   throws ServiceInterruption, ManifoldCFException;
> {code}
> ... where ITransformationActivities would include a method that would send a RepositoryDocument
> object onward to either the output connection or to the next pipeline connection.
> Each pipeline connection would have:
> - A name
> - A description
> - Configuration data
> - An optional prerequisite pipeline connection
> Every output connection would have a new field, which is an optional prerequisite pipeline
> connection.
> This design is based loosely on how mapping connections and authority connections interrelate.
> An alternate design would involve having per-job specification information, but I think this
> would wind up being way too complex for very little benefit, since each pipeline connection/stage
> would be expected to do relatively simple/granular things, not usually involving interaction
> with an external system.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
