manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
Date Wed, 21 Jun 2017 08:22:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057167#comment-16057167 ]

Karl Wright commented on CONNECTORS-1433:
-----------------------------------------

We've seriously not had any issue with the Tika output format so far, and Tika has been
integrated for several years.  Tests on documents such as Excel spreadsheets, PDFs, text
files, and Word documents have until now yielded text output, not base64.  I don't know
what has changed, if anything, but if these formats now generate base64, it sounds like a
contract change of some kind that we missed?


> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1433
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>            Reporter: Steph van Schalkwyk
>
> Would love to have Tika spout TEXT, not BASE64.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
