manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-946) Add support for pipeline connector
Date Wed, 28 May 2014 18:03:02 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011382#comment-14011382 ]

Karl Wright edited comment on CONNECTORS-946 at 5/28/14 6:03 PM:
-----------------------------------------------------------------

The tab issue is a little tricky.  Tabs from multiple connectors (and also the names of form
elements) must coexist together within a given job, and it is perfectly possible with a pipeline
for the same tab names to appear more than once.

One solution is to give every connector a chance to contextualize its specification
UI by passing in a "context string".  The context string *could* be just the name of the
connection, but that may be too unwieldy.  So, instead, I propose that the context string
be a simple integer.  The integer would be assigned in order, as follows:

- Repository connection would be "1_" (or "1:")
- Translation connections would be "2_" (or "2:") through "<N>_" (or "<N>:")
- Output connection would be "<N+1>_" (or "<N+1>:")
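A minimal sketch of the numbering scheme, assuming the framework knows how many translation connections a job's pipeline contains. ContextPrefixes and forPipeline are hypothetical names, not ManifoldCF API; the separator parameter exists only because the comment leaves "_" versus ":" open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the per-connection context prefixes for one job.
// Stage 1 is the repository connection, stages 2..N are the translation
// connections in pipeline order, and stage N+1 is the output connection.
class ContextPrefixes {

    /**
     * @param translationCount number of translation connections in the job's pipeline
     * @param separator        "_" or ":" (either form is allowed by the proposal)
     * @return prefixes in pipeline order: repository, translations..., output
     */
    static List<String> forPipeline(int translationCount, String separator) {
        if (translationCount < 0) {
            throw new IllegalArgumentException("translationCount must be >= 0");
        }
        List<String> prefixes = new ArrayList<>();
        // N = translationCount + 1, so the output connection is N + 1 = translationCount + 2.
        int outputSequence = translationCount + 2;
        for (int sequence = 1; sequence <= outputSequence; sequence++) {
            prefixes.add(sequence + separator);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Two translation connections: N = 3, so the output connection gets "4_".
        System.out.println(forPipeline(2, "_")); // prints [1_, 2_, 3_, 4_]
    }
}
```

With no translation connections the list degenerates to the repository connection ("1_") and the output connection ("2_").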

This proposal would also replace an earlier ticket (will look for it) which requests that
output connections prefix form variables and tab names with short unique prefix strings.
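To illustrate how the prefix would keep colliding names apart, here is a hypothetical SpecificationContext (not an existing ManifoldCF class) that a connector could route its tab names and form variable names through. The prefixes "1_" and "3_" correspond to a repository connection and an output connection with one translation connection between them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a connector emits tab names and form-variable names
// through a context that prepends its sequence prefix, so identical names from
// different pipeline stages stay distinct within a single job's HTML form.
class SpecificationContext {
    private final String prefix; // e.g. "1_" for the repository connection

    SpecificationContext(String prefix) {
        this.prefix = prefix;
    }

    /** Name to use for an HTML form element (input, select, ...). */
    String formVariable(String name) {
        return prefix + name;
    }

    /** Identifier to use for a specification tab. */
    String tabName(String name) {
        return prefix + name;
    }
}

class PrefixDemo {
    public static void main(String[] args) {
        SpecificationContext repository = new SpecificationContext("1_");
        SpecificationContext output = new SpecificationContext("3_");

        // Both connectors happen to use a "Filters" tab and a "maxsize" field.
        Map<String, String> form = new LinkedHashMap<>();
        form.put(repository.formVariable("maxsize"), "1000000");
        form.put(output.formVariable("maxsize"), "500000");

        System.out.println(repository.tabName("Filters")); // prints 1_Filters
        System.out.println(output.tabName("Filters"));     // prints 3_Filters
        System.out.println(form.size());                    // prints 2 (no collision)
    }
}
```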

A variant of this would pass the sequence number in as an explicit argument, and the UI code
would need to match both the tab name and the sequence number in order to display a non-hidden
tab.  This would have the advantage of making it possible for the framework to do most of
the work.  The SelectTab() JavaScript method would also need to know the sequence number in
order to work properly.  I'll need to work out an example.
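A rough server-side sketch of this variant, assuming the framework passes the selected tab as a (tab name, sequence number) pair and that a connector emits the fields of non-selected tabs as hidden inputs so their values are still posted. TabSelection and renderField are hypothetical names; the client-side SelectTab() change is not shown, and HTML escaping of values is omitted for brevity.

```java
// Hypothetical sketch of the explicit-sequence-number variant. The framework tells
// each connector which tab is selected as a (tab name, sequence number) pair; a
// connector's tab is shown only when BOTH the name and its own sequence number match.
final class TabSelection {
    private final String selectedTabName;
    private final int selectedSequenceNumber;

    TabSelection(String selectedTabName, int selectedSequenceNumber) {
        this.selectedTabName = selectedTabName;
        this.selectedSequenceNumber = selectedSequenceNumber;
    }

    /** True if this connector's tab should be rendered visibly rather than hidden. */
    boolean isVisible(String tabName, int sequenceNumber) {
        return selectedTabName.equals(tabName) && selectedSequenceNumber == sequenceNumber;
    }

    /**
     * Illustrative rendering of one field: a text input on the selected tab, otherwise
     * a hidden input carrying the same value. The name is prefixed with the sequence
     * number so equal field names from different stages do not clash.
     * (Value escaping is deliberately omitted in this sketch.)
     */
    String renderField(String tabName, int sequenceNumber, String fieldName, String value) {
        String name = sequenceNumber + "_" + fieldName;
        String type = isVisible(tabName, sequenceNumber) ? "text" : "hidden";
        return "<input type=\"" + type + "\" name=\"" + name + "\" value=\"" + value + "\"/>";
    }
}
```

For example, with the selection ("Filters", 3), the output connection's "Filters" tab (sequence 3) is shown, while a repository connection's tab of the same name (sequence 1) only contributes hidden inputs.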



> Add support for pipeline connector
> ----------------------------------
>
>                 Key: CONNECTORS-946
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-946
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> In the Amazon Search Connector, we finally found an example of an output connector that
> needed to do full document processing in order to work.  This ticket represents work in the
> framework to create a concept of "pipeline connector".  Pipeline connections would receive
> RepositoryDocument objects, and transform them into new RepositoryDocument objects.  There would
> be a single important method:
> {code}
> public void transformDocument(RepositoryDocument rd, ITransformationActivities activities)
>   throws ServiceInterruption, ManifoldCFException;
> {code}
> ... where ITransformationActivities would include a method that would send a RepositoryDocument
> object onward to either the output connection or to the next pipeline connection.
> Each pipeline connection would have:
> - A name
> - A description
> - Configuration data
> - An optional prerequisite pipeline connection
> Every output connection would have a new field, which is an optional prerequisite pipeline
> connection.
> This design is based loosely on how mapping connections and authority connections interrelate.
> An alternative design would involve per-job specification information, but I think this
> would wind up being way too complex for very little benefit, since each pipeline connection/stage
> would be expected to do relatively simple/granular things, not usually involving interaction
> with an external system.
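The proposed transformDocument() chain can be sketched end to end under loudly stated assumptions: RepositoryDocument is reduced to a content string plus a metadata map; the forwarding method on ITransformationActivities is given the hypothetical name sendDocument (the ticket does not name it); PipelineConnector and Pipeline are hypothetical names; and the ServiceInterruption/ManifoldCFException clauses are collapsed into a generic Exception so the block compiles on its own. Stages are ordered by following prerequisite links back from the output connection's optional prerequisite, and each stage receives an activities object that forwards either to the next stage or, after the last one, to the output connection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Simplified stand-in for ManifoldCF's RepositoryDocument (NOT the real class).
final class RepositoryDocument {
    final String content;
    final Map<String, String> metadata;

    RepositoryDocument(String content, Map<String, String> metadata) {
        this.content = content;
        this.metadata = metadata;
    }
}

// Hypothetical shape of the proposed interface: one method that forwards a document
// to the next stage (another pipeline connection, or the output connection).
interface ITransformationActivities {
    void sendDocument(RepositoryDocument rd) throws Exception;
}

// The proposed pipeline method; real code would declare ServiceInterruption and
// ManifoldCFException instead of the generic Exception used in this sketch.
interface PipelineConnector {
    void transformDocument(RepositoryDocument rd, ITransformationActivities activities) throws Exception;
}

final class Pipeline {
    /**
     * Walks prerequisite links backwards from the output connection's (optional)
     * prerequisite and returns pipeline connection names in execution order.
     */
    static List<String> resolveOrder(String outputPrerequisite, Map<String, String> prerequisiteOf) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name = outputPrerequisite; name != null; name = prerequisiteOf.get(name)) {
            if (!seen.add(name)) {
                throw new IllegalStateException("Prerequisite cycle at " + name);
            }
            order.add(name);
        }
        Collections.reverse(order); // the earliest prerequisite runs first
        return order;
    }

    /** Pushes a document through the ordered stages, then hands it to the output. */
    static void run(RepositoryDocument rd, List<PipelineConnector> stages, int index,
                    Consumer<RepositoryDocument> output) throws Exception {
        if (index == stages.size()) {
            output.accept(rd);
            return;
        }
        // The activities object for stage `index` forwards to stage `index + 1`.
        stages.get(index).transformDocument(rd, next -> run(next, stages, index + 1, output));
    }
}
```

Because each stage only ever sees "the next hop" through its activities object, a stage does not need to know whether it is followed by another pipeline connection or by the output connection.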



--
This message was sent by Atlassian JIRA
(v6.2#6252)
