manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: extract email attachment
Date Tue, 07 Feb 2017 18:14:46 GMT
Hi Cihad,

The email connector is providing the attachment data unextracted to the
output connector as metadata attribute data.  There are no transformation
connectors that look at this metadata.  Solr Cell also probably does not
handle binary data in arbitrary metadata attributes properly.
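To illustrate the difference between raw attachment data and extracted content, here is a rough sketch. It uses Python's standard `email` package (not the connector's own code) to separate a message's readable body from its attachment parts. The message text is a trimmed reconstruction of the one in the quoted Solr dump further down. Its top-level headers are assumed, because the dump starts at the first MIME boundary, and the base64 attachment body is cut to its first line. `split_message` is a hypothetical helper name.

```python
from email import message_from_string, policy

# Trimmed reconstruction of the message from the quoted Solr dump. The outer
# headers are ASSUMED (the dump starts at the first boundary), and the base64
# attachment body is cut down to its first line.
RAW = (
    "MIME-Version: 1.0\r\n"
    "Content-Type: multipart/mixed; boundary=94eb2c1910841bc55f0547f43443\r\n"
    "\r\n"
    "--94eb2c1910841bc55f0547f43443\r\n"
    "Content-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n"
    "\r\n"
    "--94eb2c1910841bc5530547f43441\r\n"
    "Content-Type: text/plain; charset=UTF-8\r\n"
    "\r\n"
    "this is test mail for mfc.\r\n"
    "\r\n"
    "--94eb2c1910841bc5530547f43441\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    '<div dir="ltr">this is test mail for mfc.\r\n</div>\r\n'
    "\r\n"
    "--94eb2c1910841bc5530547f43441--\r\n"
    "--94eb2c1910841bc55f0547f43443\r\n"
    'Content-Type: application/pdf; name="pdf-test.pdf"\r\n'
    'Content-Disposition: attachment; filename="pdf-test.pdf"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "X-Attachment-Id: f_iyvt78qa0\r\n"
    "\r\n"
    "JVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\n"
    "--94eb2c1910841bc55f0547f43443--\r\n"
)


def split_message(raw):
    """Return (plain-text body, list of attachment dicts) for a MIME message."""
    msg = message_from_string(raw, policy=policy.default)
    # get_body() picks the best body candidate and skips attachment parts.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    attachments = []
    # iter_attachments() yields the top-level parts that are not the body.
    for part in msg.iter_attachments():
        attachments.append({
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            # The base64 transfer encoding is undone: these are the file's bytes.
            "data": part.get_payload(decode=True),
        })
    return text, attachments


if __name__ == "__main__":
    text, attachments = split_message(RAW)
    print(repr(text.strip()))
    for a in attachments:
        print(a["filename"], a["content_type"], a["data"][:8])
```

Each attachment comes back as a separate part with a declared content type and decoded binary bytes. Those bytes would still need to go through a text extractor before they could be indexed as searchable content.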

The connector's attachment code therefore seems to have been designed only
for textual attachments.  The right solution is to give each attachment its
own document ID.  But that would also require a URL that we could construct
for each attachment.  We could provide an additional URI template for
attachments, but I wonder whether your system can serve attachments at their
own URLs.  Please let me know if this would work, and if so I can create a
ticket and work on making these changes.
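Here is a minimal sketch of the per-attachment scheme proposed above. It assumes each attachment is identified by its parent message's ID plus a zero-based index, and that its URL comes from a separate attachment template. The template string, its placeholder names, and the `attachment_documents` helper are all hypothetical. None of them is an existing ManifoldCF setting or API.

```python
from urllib.parse import quote

# HYPOTHETICAL attachment URI template; the placeholder names are assumptions.
ATTACHMENT_URI_TEMPLATE = "http://mail.example.com/msg/{message_id}/att/{index}/{filename}"


def attachment_documents(message_id, filenames, template=ATTACHMENT_URI_TEMPLATE):
    """Return one (document ID, URL) pair per attachment of a message."""
    documents = []
    for index, filename in enumerate(filenames):
        # The index keeps IDs unique even when two attachments share a filename.
        doc_id = f"{message_id}#attachment-{index}"
        # Percent-encode each value so characters such as '<', '@' or '/' in a
        # Message-ID or filename cannot break the URL's path structure.
        url = template.format(
            message_id=quote(message_id, safe=""),
            index=index,
            filename=quote(filename, safe=""),
        )
        documents.append((doc_id, url))
    return documents
```

For example, `attachment_documents("<abc@mail.gmail.com>", ["pdf-test.pdf"])` returns the ID `<abc@mail.gmail.com>#attachment-0` and the URL `http://mail.example.com/msg/%3Cabc%40mail.gmail.com%3E/att/0/pdf-test.pdf`. This only works if the mail system can actually serve an attachment at such a URL, which is the open question here.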

Thanks,
Karl


On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguzelg@gmail.com> wrote:

> Hi,
>
> I tried the email connector with Gmail. I attached the file [1] to a new
> email and sent it to my test email address.
>
> The body of my email is: "this is test mail for mfc."
>
> Then I ran my email job, and the email was indexed into Solr successfully.
> However, Solr's content field does not contain the content of my
> attachment. The Solr content field looks like this:
>
> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n  --94eb2c1910841bc55f0547f43443\r\n
> Content-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n\r\n
> --94eb2c1910841bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\n
> this is test mail for mfc.\r\n\r\n
> --94eb2c1910841bc5530547f43441\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n
> <div dir=\"ltr\">this is test mail for mfc.\r\n</div>\r\n\r\n
> --94eb2c1910841bc5530547f43441--\r\n
> --94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf; name=\"pdf-test.pdf\"\r\n
> Content-Disposition: attachment; filename=\"pdf-test.pdf\"\r\n
> Content-Transfer-Encoding: base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\n
> JVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\n
> NDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2JqDSAgICAgICAgICAgICAgICAg\r\n
> DQp4cmVmDQozNyAzNA0KMDAwMDAwMDAxNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\n
> MDAwMDE1MjIgMDAwM ..."
>
> Does the MCF email connector know that the attachment's file type is PDF?
> Doesn't it extract the contents?
>
> [1] http://www.orimi.com/pdf-test.pdf
> --
> Regards
> Cihad Güzel
>
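On the question of whether the PDF type is known: the quoted part headers already declare `Content-Type: application/pdf`. In addition, the first base64 characters of the body decode to the PDF signature `%PDF-1.6`. The sketch below illustrates the second kind of check, which helps when an attachment arrives with the generic type `application/octet-stream`. It uses a hypothetical `sniff_base64` helper and a deliberately short signature table. It shows the idea only and is not how the email connector determines file types.

```python
import base64

# A few well-known file signatures ("magic numbers"); deliberately incomplete.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # also the container for DOCX, XLSX, ODT
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_base64(b64_text: str, declared: str = "application/octet-stream") -> str:
    """Guess a MIME type from the first decoded bytes of a base64 body."""
    # Drop the CRLF line breaks that split base64 bodies into 76-character lines.
    compact = "".join(b64_text.split())
    # Decode at most 16 characters (12 bytes), trimmed to a multiple of 4.
    n = min(16, len(compact) - len(compact) % 4)
    head = base64.b64decode(compact[:n])
    for magic, mime_type in SIGNATURES.items():
        if head.startswith(magic):
            return mime_type
    # No known signature: fall back to whatever the MIME header declared.
    return declared


if __name__ == "__main__":
    # First line of the attachment body from the quoted Solr dump.
    first_line = "JVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx"
    print(sniff_base64(first_line))  # application/pdf
```

Only the first few decoded bytes are needed, so the full attachment never has to be decoded just to identify its type.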
