manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1433) Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
Date Thu, 22 Jun 2017 21:57:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060051#comment-16060051 ]

Karl Wright edited comment on CONNECTORS-1433 at 6/22/17 9:56 PM:
------------------------------------------------------------------

This part of the connector code doesn't seem to be working:
(ElasticSearchIndex.java line 209):

{code}
        if (!useMapperAttachments && inputStream != null) {
          if (contentAttributeName != null)
          {
            Reader r = new InputStreamReader(inputStream, Consts.UTF_8);
            StringBuilder sb = new StringBuilder((int)document.getBinaryLength());
            char[] buffer = new char[65536];
            while (true)
            {
              int amt = r.read(buffer,0,buffer.length);
              if (amt == -1)
                break;
              sb.append(buffer,0,amt);
            }
            needComma = writeField(pw, needComma, contentAttributeName, new String[]{sb.toString()});
          }
        }
{code}

I have a value for contentAttributeName:
[image: Inline image 1]
and according to the code, it should be writing the document content as a
string to the field named by the "Content field name" setting.
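
To help rule out the read loop itself, here is a minimal standalone sketch of
the same logic. It is not connector code: the class ReadLoopSketch and the
method readAll are hypothetical names, StandardCharsets.UTF_8 stands in for
Consts.UTF_8, and the binaryLength parameter stands in for
document.getBinaryLength(). If this returns the expected text, the loop is
probably fine, and the guards around it (useMapperAttachments, inputStream,
contentAttributeName) may be the better place to look.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF: mirrors the read loop quoted above.
final class ReadLoopSketch {

  // Reads the whole stream as UTF-8 text; binaryLength only sizes the builder.
  static String readAll(InputStream inputStream, long binaryLength) throws IOException {
    // Assumption: StandardCharsets.UTF_8 replaces Consts.UTF_8 from the connector.
    Reader r = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
    StringBuilder sb = new StringBuilder((int) binaryLength);
    char[] buffer = new char[65536];
    while (true) {
      int amt = r.read(buffer, 0, buffer.length);
      if (amt == -1)
        break; // end of stream
      sb.append(buffer, 0, amt);
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = "plain text from Tika".getBytes(StandardCharsets.UTF_8);
    System.out.println(readAll(new ByteArrayInputStream(bytes), bytes.length));
  }
}
```

Feeding it a ByteArrayInputStream, as in main above, checks the loop without
needing a running job or an Elasticsearch instance.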



was (Author: svanschalkwyk):
This part of the connector code doesn't seem to be working:
(ElasticSearchIndex.java line 209):
"       if (!useMapperAttachments && inputStream != null) {
          if (contentAttributeName != null)
          {
            Reader r = new InputStreamReader(inputStream, Consts.UTF_8);
            StringBuilder sb = new StringBuilder((int)document.getBinaryLength());
            char[] buffer = new char[65536];
            while (true)
            {
              int amt = r.read(buffer,0,buffer.length);
              if (amt == -1)
                break;
              sb.append(buffer,0,amt);
            }
            needComma = writeField(pw, needComma, contentAttributeName, new String[]{sb.toString()});
          }
        }
I have a value for contentAttributeName:
[image: Inline image 1]
and according to the code, it should be writing a string to the Content
field name field.


> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1433
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>         Attachments: CONNECTORS-1433.patch, image.png, image.png
>
>
> Would love to have Tika spout TEXT, not BASE64.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
