manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: extract email attachment
Date Tue, 07 Feb 2017 18:19:57 GMT
Correction: the only metadata attribute we set holds the attachments'
mimetypes (as a multivalued field) -- the attachment data itself is not
currently included.

Karl
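
For illustration only, here is a minimal Python sketch of what a multivalued
attachment-mimetype field amounts to. It uses the standard library's email
package, not the ManifoldCF connector code (which is Java), and the sample
message and the name attachment_mimetypes are assumptions made up for this
example.

```python
from email import message_from_string

# Hypothetical sample message, loosely modelled on the one quoted in this thread.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="outer"',
    "",
    "--outer",
    "Content-Type: text/plain; charset=UTF-8",
    "",
    "this is test mail for mfc.",
    "--outer",
    'Content-Type: application/pdf; name="pdf-test.pdf"',
    'Content-Disposition: attachment; filename="pdf-test.pdf"',
    "Content-Transfer-Encoding: base64",
    "",
    "JVBERi0xLjY=",
    "--outer--",
    "",
])


def attachment_mimetypes(raw):
    """Return one MIME type per attachment part (a multivalued field)."""
    msg = message_from_string(raw)
    return [
        part.get_content_type()
        for part in msg.walk()
        if part.get_content_disposition() == "attachment"
    ]


mimetypes = attachment_mimetypes(SAMPLE)
```

The list carries only the types; the attachment bytes are not part of it,
which matches the correction above.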


On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Cihad,
>
> The email connector is providing the attachment data unextracted to the
> output connector as metadata attribute data.  There are no transformation
> connectors that look at this metadata.  Solr Cell probably also does not
> handle binary data in arbitrary metadata attributes properly.
>
> The connector's attachment code therefore seems to be designed only to
> deal with textual attachments.  The right solution is to have individual
> IDs for each attachment.  But that would also require there to be a URL we
> could construct for each attachment.  We could provide an additional URI
> template for attachments, but I'd wonder if your system has the ability to
> serve attachments by their own URLs.  Please let me know if this would work
> and if so I can create a ticket and work on making these changes.
>
> Thanks,
> Karl
>
>
> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying the email connector with Gmail. I attached the file [1] to a
>> new email and sent it to my test email address.
>>
>> The body of my mail is: "this is test mail for mfc"
>>
>> Then I ran my email job and the email was indexed to Solr successfully.
>> But Solr's content field does not contain the extracted content of my
>> attachment. The Solr content field looks like this:
>>
>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative;
>> boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841
>> bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis
>> is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>> text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for
>> mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."
>>
>> Does the MFC email connector know that the attachment's file type is PDF?
>> Does it not extract the contents?
>>
>> [1] http://www.orimi.com/pdf-test.pdf
>> --
>> Regards
>> Cihad Güzel
>>
>
>
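
The idea discussed in this thread, giving each attachment its own ID and its
own URL built from an attachment URI template, can be sketched roughly as
follows. This is a Python illustration using the standard library's email
package, not the ManifoldCF implementation; the ID scheme, the template
placeholders, the example.invalid host, and the name split_attachments are all
hypothetical.

```python
from email import message_from_string
from urllib.parse import quote

# Hypothetical sample message, loosely modelled on the one quoted in this thread.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="outer"',
    "",
    "--outer",
    "Content-Type: text/plain; charset=UTF-8",
    "",
    "this is test mail for mfc.",
    "--outer",
    'Content-Type: application/pdf; name="pdf-test.pdf"',
    'Content-Disposition: attachment; filename="pdf-test.pdf"',
    "Content-Transfer-Encoding: base64",
    "",
    "JVBERi0xLjY=",
    "--outer--",
    "",
])

# Hypothetical attachment URI template; the placeholders are an assumption.
TEMPLATE = "https://mail.example.invalid/{message_id}/attachments/{index}/{filename}"


def split_attachments(raw, message_id, uri_template):
    """Return one record per attachment, each with its own ID and URL."""
    msg = message_from_string(raw)
    docs = []
    index = 0
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        index += 1
        filename = part.get_filename() or f"attachment-{index}"
        docs.append({
            "id": f"{message_id}#attachment-{index}",
            "url": uri_template.format(
                message_id=message_id, index=index, filename=quote(filename)
            ),
            "mimetype": part.get_content_type(),
            # decode=True undoes the Content-Transfer-Encoding (base64 here),
            # so downstream extraction sees the raw bytes, not base64 text.
            "data": part.get_payload(decode=True),
        })
    return docs


docs = split_attachments(SAMPLE, "msg-1", TEMPLATE)
```

Each record could then be handed to an extractor as a separate document with
its real mimetype, instead of the base64 text ending up in the parent
message's content field.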
