manifoldcf-dev mailing list archives

From "Furkan KAMACI (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content
Date Sat, 15 Apr 2017 17:48:41 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970039#comment-15970039 ]

Furkan KAMACI commented on CONNECTORS-1410:
-------------------------------------------

[~kwright@metacarta.com] This is how _body_ is already set in ManifoldCF:

{code:java}
Object o = msg.getContent();
if (o instanceof Multipart) {
  Multipart mp = (Multipart) msg.getContent();
  for (int k = 0, n = mp.getCount(); k < n; k++) {
    Part part = mp.getBodyPart(k);
    String disposition = part.getDisposition();
    if ((disposition == null)) {
      MimeBodyPart mbp = (MimeBodyPart) part;
      if (mbp.isMimeType(EmailConfig.MIMETYPE_TEXT_PLAIN)) {
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
      } else if (mbp.isMimeType(EmailConfig.MIMETYPE_HTML)) {
        // Handle HTML accordingly; this returns content with HTML tags.
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
      }
    }
  }
} else if (o instanceof String) {
  rd.addField(EmailConfig.EMAIL_BODY, (String)o);
}
{code}

The entire body is already read, so this problem exists even without this improvement.
On the other hand, we now retrieve only the body, whereas previously we streamed both the body
and the attachments of the e-mail. That may be why the current code does not treat it as a problem.

In pseudocode, my patch amounts to:

{code:java}
rd.setContent(rd.getBody())
{code}
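
To make the pseudocode a little more concrete, here is a minimal sketch of the conversion step it implies, assuming the content setter wants a byte stream plus a length rather than a String. {{BodyContent}} is a hypothetical helper and not part of ManifoldCF, and the choice of UTF-8 is also an assumption:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a ManifoldCF class): wraps an already-extracted
// e-mail body so it can be handed over as a stream together with its length.
final class BodyContent {
  private final byte[] bytes;

  BodyContent(String body) {
    // Treat a missing body as empty instead of failing.
    this.bytes = (body == null ? "" : body).getBytes(StandardCharsets.UTF_8);
  }

  // Length in bytes; this can differ from String.length() for non-ASCII text.
  long length() {
    return bytes.length;
  }

  // A fresh stream over the body bytes only; no attachment data is included.
  ByteArrayInputStream stream() {
    return new ByteArrayInputStream(bytes);
  }
}
```

The point of the sketch is only that the content is built from the body alone, so attachment bytes never reach the content field.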

> Binary Attachment Data as Plain Text at Email Content
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1410
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Email connector
>    Affects Versions: ManifoldCF 2.6
>            Reporter: Furkan KAMACI
>            Assignee: Furkan KAMACI
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1410.patch
>
>
> Previously, we indexed e-mails and their attachments together. CONNECTORS-1375 changed this
> logic so that an e-mail and its attachments are indexed separately.
> However, there is a problem. The content field of an e-mail that has attachment(s) includes
> both the body and the attachments' binary content as plain text.
> As we index attachments separately, we can index just the body as content instead of appending
> the e-mail body and all attachments' binary data as plain text.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
