manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-954) Amazon Cloud Search connector's use of Tika should be revisited after pipelines are added
Date Thu, 19 Jun 2014 00:39:25 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036739#comment-14036739 ]

Karl Wright commented on CONNECTORS-954:
----------------------------------------

Added the field mapping tab: r1603687

Tomorrow I will revamp the Amazon connector to remove the Tika transformer inside it.
Still unanswered: (a) whether there's a good way to stream the extracted content to Amazon,
and (b) how to remove newline characters, as the connector currently does.  Ideally, we'd
construct the JSON on the fly, but I don't know how realistic that would be.  Quoting of
the content may also need to be addressed.
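
A rough sketch of what constructing the JSON on the fly might look like, handling
the newline removal and the quoting in the same pass.  This is not the connector's
code: the class name, the method names, and the "content" field name are all
hypothetical, and the document envelope is only assumed to follow the CloudSearch
batch "add" shape.  The extracted text is read from a Reader in chunks and written
straight to a Writer, so it is never held in memory as one String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, for illustration only; not part of ManifoldCF.
class StreamingJsonSketch {

    // Copies characters from 'in' to 'out' as one JSON string literal.
    // CR and LF become spaces, quotes and backslashes are escaped, and any
    // other control character is written as a six-character unicode escape.
    static void writeJsonString(Reader in, Writer out) throws IOException {
        out.write('"');
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                switch (c) {
                    case '\n':
                    case '\r':
                        out.write(' ');
                        break;
                    case '"':
                        out.write("\\\"");
                        break;
                    case '\\':
                        out.write("\\\\");
                        break;
                    default:
                        if (c < 0x20) {
                            out.write(String.format("\\u%04x", (int) c));
                        } else {
                            out.write(c);
                        }
                }
            }
        }
        out.write('"');
    }

    // Writes one "add" document whose single field is streamed from 'content'.
    // The envelope layout and the "content" field name are assumptions.
    static void writeAddDocument(String id, Reader content, Writer out) throws IOException {
        out.write("{\"type\":\"add\",\"id\":");
        writeJsonString(new StringReader(id), out);
        out.write(",\"fields\":{\"content\":");
        writeJsonString(content, out);
        out.write("}}");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeAddDocument("doc-1", new StringReader("first line\nsecond \"quoted\" line"), out);
        System.out.println(out);
    }
}
```

In practice the Writer would wrap the HTTP request body rather than a StringWriter.
Whether that is workable depends on whether the request can be sent without knowing
its length up front, which is the open part of question (a).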


> Amazon Cloud Search connector's use of Tika should be revisited after pipelines are added
> -----------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-954
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-954
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> Amazon Cloud Search connector uses Tika to extract content from binaries.
> When the pipeline support in CONNECTORS-946 is committed to trunk, we should do two things:
> (a) Create a Transformation Connection that extracts binary data into metadata, and
> (b) Remove the Tika dependency from the Amazon connector



--
This message was sent by Atlassian JIRA
(v6.2#6252)
