manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem
Date Mon, 11 Mar 2019 15:42:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789712#comment-16789712 ]

Karl Wright commented on CONNECTORS-1591:
-----------------------------------------

[~zfarago] When you run a ManifoldCF job that fetches an RTF document and runs it through
the Tika extractor, what comes out is a stream of characters (the content stream) plus various
metadata fields.  All of these are sent to the output connector, which then does whatever
it wants with these.

You *cannot* see either the content stream or the metadata directly.  So I need to know where
result.txt came from.  There is a missing step that you aren't telling me about, and
it's a critical one.
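
To make the flow described above concrete, here is a minimal toy sketch in Python. It is not ManifoldCF or Tika code: every name in it (ExtractedDocument, extract, RecordingOutputConnector, run_job) is hypothetical, and the extraction step is a stand-in that does no real RTF parsing. It only illustrates that the content stream and metadata become observable solely through whatever the output connector does with them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExtractedDocument:
    """Hypothetical container for what an extraction step produces."""
    content: str              # the character stream (the "content stream")
    metadata: Dict[str, str]  # the metadata fields


def extract(raw: bytes) -> ExtractedDocument:
    """Stand-in for the extraction step (assumption: NOT Tika's API).

    It only decodes bytes to text; a real extractor would parse the RTF.
    """
    text = raw.decode("ascii", errors="replace")
    return ExtractedDocument(content=text, metadata={"source-length": str(len(raw))})


class RecordingOutputConnector:
    """Hypothetical output connector that keeps what it is given in memory."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, ExtractedDocument]] = []

    def add_or_replace(self, uri: str, doc: ExtractedDocument) -> None:
        # A real connector might index, write to disk, or discard the document.
        self.received.append((uri, doc))


def run_job(documents: Dict[str, bytes], connector: RecordingOutputConnector) -> None:
    """Toy job: fetch each document, extract it, hand the result to the connector."""
    for uri, raw in documents.items():
        connector.add_or_replace(uri, extract(raw))


if __name__ == "__main__":
    connector = RecordingOutputConnector()
    run_job({"file:///comment.rtf": b"example bytes"}, connector)
    # The only place the extracted content is visible is inside the connector.
    for uri, doc in connector.received:
        print(uri, repr(doc.content), doc.metadata)
```

In this sketch, a file like result.txt could only exist if some connector (or a step after it) wrote it out, which is why the origin of that file matters for diagnosing the problem.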


> RTF comment parsing problem
> ---------------------------
>
>                 Key: CONNECTORS-1591
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1591
>             Project: ManifoldCF
>          Issue Type: Bug
>            Reporter: Zoltan Farago
>            Priority: Major
>         Attachments: comment.rtf, result.txt
>
>
> We have a problem with Manifold/Tika. When a comment is parsed from an RTF file, the
> result has no separator. See attachments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
