manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CONNECTORS-946) Add support for pipeline connector
Date Tue, 27 May 2014 23:14:01 GMT
Karl Wright created CONNECTORS-946:
--------------------------------------

             Summary: Add support for pipeline connector
                 Key: CONNECTORS-946
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-946
             Project: ManifoldCF
          Issue Type: New Feature
          Components: Framework crawler agent
    Affects Versions: ManifoldCF 1.7
            Reporter: Karl Wright
            Assignee: Karl Wright
             Fix For: ManifoldCF 1.7


In the Amazon Search Connector, we finally found an example of an output connector that needs
to do full document processing in order to work.  This ticket covers the framework work needed
to introduce the concept of a "pipeline connector".  Pipeline connections would receive
RepositoryDocument objects and transform them into new RepositoryDocument objects.  There would
be a single important method:

{code}
public void transformDocument(RepositoryDocument rd, ITransformationActivities activities)
throws ServiceInterruption, ManifoldCFException;
{code}

... where ITransformationActivities would include a method that would send a RepositoryDocument
object onward to either the output connection or to the next pipeline connection.
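A minimal sketch of how one stage might look under this proposal. Everything here is an assumption made for illustration: the nested RepositoryDocument, ServiceInterruption and ManifoldCFException classes are stripped-down stand-ins rather than the real ManifoldCF classes, and the method name sendDocument is hypothetical, since the ticket only says that ITransformationActivities would have some method that sends a document onward.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PipelineSketch {
  // Stand-ins for the real ManifoldCF types, reduced to what this sketch needs.
  static class ManifoldCFException extends Exception {
    ManifoldCFException(String message) { super(message); }
  }
  static class ServiceInterruption extends Exception {
    ServiceInterruption(String message) { super(message); }
  }
  static class RepositoryDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
  }

  // Hypothetical method name: the ticket does not say what the "send onward" method is called.
  interface ITransformationActivities {
    void sendDocument(RepositoryDocument rd)
      throws ServiceInterruption, ManifoldCFException;
  }

  // The single important method from the ticket.
  interface PipelineConnector {
    void transformDocument(RepositoryDocument rd, ITransformationActivities activities)
      throws ServiceInterruption, ManifoldCFException;
  }

  // Example stage: copies the document, upper-cases its "title" field, and passes it on.
  static class UppercaseTitle implements PipelineConnector {
    @Override
    public void transformDocument(RepositoryDocument rd, ITransformationActivities activities)
      throws ServiceInterruption, ManifoldCFException {
      RepositoryDocument out = new RepositoryDocument();
      out.fields.putAll(rd.fields);
      String title = rd.fields.get("title");
      if (title != null) {
        out.fields.put("title", title.toUpperCase(Locale.ROOT));
      }
      // The stage does not know whether the receiver is the output
      // connection or the next pipeline connection.
      activities.sendDocument(out);
    }
  }

  // Runs one stage and collects whatever it sends onward.
  static List<RepositoryDocument> runOnce(PipelineConnector stage, RepositoryDocument in)
    throws ServiceInterruption, ManifoldCFException {
    List<RepositoryDocument> received = new ArrayList<>();
    stage.transformDocument(in, received::add);
    return received;
  }

  public static void main(String[] args) throws Exception {
    RepositoryDocument doc = new RepositoryDocument();
    doc.fields.put("title", "hello");
    // prints "HELLO"
    System.out.println(runOnce(new UppercaseTitle(), doc).get(0).fields.get("title"));
  }
}
```

Because the stage only talks to the activities object, the same connector code would work whether it sits directly in front of the output connection or earlier in a chain.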

Each pipeline connection would have:
- A name
- A description
- Configuration data
- An optional prerequisite pipeline connection

Every output connection would have a new field, which is an optional prerequisite pipeline
connection.
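The prerequisite links described above form a chain that ends at the output connection. The sketch below shows one way the framework might resolve that chain into an execution order. It assumes that a prerequisite runs before the connection that names it, which the ticket implies but does not spell out, and the PipelineConnection class, its field names, and executionOrder are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrerequisiteChain {
  // Hypothetical record holding the four properties listed in the ticket.
  static class PipelineConnection {
    final String name;
    final String description;
    final Map<String, String> configuration;
    final String prerequisite; // name of another pipeline connection, or null

    PipelineConnection(String name, String description,
                       Map<String, String> configuration, String prerequisite) {
      this.name = name;
      this.description = description;
      this.configuration = configuration;
      this.prerequisite = prerequisite;
    }
  }

  // Walks backward from the output connection's prerequisite and returns
  // the pipeline connection names in the order they would run.
  static List<String> executionOrder(Map<String, PipelineConnection> byName,
                                     String outputPrerequisite) {
    Deque<String> order = new ArrayDeque<>();
    Set<String> seen = new HashSet<>();
    String current = outputPrerequisite;
    while (current != null) {
      if (!seen.add(current)) {
        throw new IllegalStateException("Prerequisite cycle at " + current);
      }
      PipelineConnection pc = byName.get(current);
      if (pc == null) {
        throw new IllegalStateException("Unknown pipeline connection: " + current);
      }
      order.addFirst(current); // earlier prerequisites end up at the front
      current = pc.prerequisite;
    }
    return new ArrayList<>(order);
  }

  public static void main(String[] args) {
    Map<String, PipelineConnection> byName = new HashMap<>();
    byName.put("extract",
        new PipelineConnection("extract", "Extracts text", new HashMap<>(), null));
    byName.put("filter",
        new PipelineConnection("filter", "Drops fields", new HashMap<>(), "extract"));
    // The output connection names "filter" as its prerequisite.
    System.out.println(executionOrder(byName, "filter")); // prints [extract, filter]
  }
}
```

Since each connection names at most one prerequisite, the result is always a linear chain, and a cycle is the only structural error that needs checking.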

This design is based loosely on how mapping connections and authority connections interrelate.
An alternative design would carry per-job specification information, but I think that would
wind up being far too complex for very little benefit, since each pipeline connection/stage
is expected to do relatively simple, granular things that usually do not involve interaction
with an external system.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
