manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: extract email attachment
Date Tue, 07 Feb 2017 19:36:05 GMT
I've created a ticket and attached a patch to it.  CONNECTORS-1375.  Please
let me know if it works for you; if not, I'll fix what doesn't work.

Karl


On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddywri@gmail.com> wrote:

> Correction: the only metadata attribute we set is the attachment(s)
> mimetype (as a multivalued field) -- this doesn't currently include the
> attachment data.
>
> Karl
>
>
> On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Cihad,
>>
>> The email connector is providing the attachment data unextracted to the
>> output connector as metadata attribute data.  There are no transformation
>> connectors that look at this metadata.  Solr cell also probably does not
>> handle binary in random metadata attributes the proper way.
>>
>> The connector's attachment code therefore seems to be designed only to
>> deal with textual attachments.  The right solution is to have individual
>> IDs for each attachment.  But that would also require there to be a URL we
>> could construct for each attachment.  We could provide an additional URI
>> template for attachments, but I'd wonder if your system has the ability to
>> serve attachments by their own URLs.  Please let me know if this would work
>> and if so I can create a ticket and work on making these changes.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried the email connector with Gmail. I attached the file [1] to a new
>>> email and sent it to my test email address.
>>>
>>> My mail body is: "this is test mail for mfc"
>>>
>>> Then I ran my email job and the email was indexed to Solr successfully.
>>> But Solr's content field does not contain my attachment's extracted
>>> content. The Solr content field looks like:
>>>
>>> "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n
>>>  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative;
>>> boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841
>>> bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis
>>> is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type:
>>> text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for
>>> mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--
>>> 94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf;
>>> name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment;
>>> filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding:
>>> base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjY
>>> NJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDA
>>> vRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2J
>>> qDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDA
>>> xNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."
>>>
>>> Does the MFC email connector know that the attachment's file type is
>>> PDF? Does it not extract the contents?
>>>
>>> [1] http://www.orimi.com/pdf-test.pdf
>>> --
>>> Regards
>>> Cihad Güzel
>>>
>>
>>
>
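
For readers following the thread: the Solr content field quoted above holds the raw MIME structure (a multipart/alternative body followed by a base64-encoded application/pdf part) rather than extracted text. The sketch below is only an illustration of what separating those parts involves, written with Python's standard `email` package. It is not the ManifoldCF email connector's code and not the CONNECTORS-1375 patch; the names `split_message` and `build_sample`, the example.com addresses, and the stand-in PDF bytes are all assumptions made for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_message(raw: bytes):
    """Split a raw message into its plain-text body and decoded attachments.

    Hypothetical helper for illustration; not part of ManifoldCF.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # Pick the text/plain alternative as the indexable body, if present.
    body = msg.get_body(preferencelist=("plain",))
    body_text = body.get_content() if body is not None else ""

    attachments = []
    for part in msg.iter_attachments():
        attachments.append({
            "filename": part.get_filename(),
            "mimetype": part.get_content_type(),
            # decode=True reverses the Content-Transfer-Encoding (base64 here),
            # yielding the original binary bytes of the attachment.
            "data": part.get_payload(decode=True),
        })
    return body_text, attachments


def build_sample() -> bytes:
    """Build a message shaped like the one in the thread (stand-in data)."""
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg["From"] = "sender@example.com"      # assumed address
    msg["To"] = "receiver@example.com"      # assumed address
    msg.set_content("this is test mail for mfc.")
    msg.add_alternative('<div dir="ltr">this is test mail for mfc.</div>',
                        subtype="html")
    # Stand-in bytes, not a real PDF; only the header mimics one.
    msg.add_attachment(b"%PDF-1.6\n% stand-in bytes\n",
                       maintype="application", subtype="pdf",
                       filename="pdf-test.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    text, files = split_message(build_sample())
    print(text.strip())
    for f in files:
        print(f["filename"], f["mimetype"], len(f["data"]), "bytes")
```

Once the attachment bytes are decoded like this, they would still need a content extractor (for example Tika or Solr Cell) to turn the PDF into text; decoding only recovers the binary file, which is why Karl's message discusses treating each attachment as its own document.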
