Here's the full code for this class:

https://svn.apache.org/repos/asf/manifoldcf/trunk/connectors/email/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/email/EmailConnector.java

Karl


On Tue, Feb 7, 2017 at 5:14 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Cihad,

The variable attachmentIndex is *supposed* to be null except when an attachment is being processed.  The code should look like this:

        if (attachmentIndex == null) {
          // It's an email
...
        } else {
          // It's an attachment
          attachmentNumber = attachmentIndex;
...
        }
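
To make the null-versus-attachment branch concrete, here is a minimal, self-contained sketch. It is not the connector's actual code: the identifier layout (`folder:messageId` for an email, with a hypothetical `:N` suffix for attachment N) and the names `DocumentIdSketch`, `ParsedId`, `parse`, and `describe` are assumptions made for illustration only. The point is that only a trailing all-digit suffix is ever handed to the integer parser, never the whole identifier.

```java
class DocumentIdSketch {
  /** Parsed identifier; attachmentIndex stays null for a plain email. */
  static final class ParsedId {
    final String folder;
    final String messageId;
    final Integer attachmentIndex;

    ParsedId(String folder, String messageId, Integer attachmentIndex) {
      this.folder = folder;
      this.messageId = messageId;
      this.attachmentIndex = attachmentIndex;
    }
  }

  /** Assumed layout: "folder:messageId" or "folder:messageId:N" (hypothetical). */
  static ParsedId parse(String documentIdentifier) {
    int folderEnd = documentIdentifier.indexOf(':');
    if (folderEnd < 0) {
      throw new IllegalArgumentException("No folder separator in: " + documentIdentifier);
    }
    String folder = documentIdentifier.substring(0, folderEnd);
    String rest = documentIdentifier.substring(folderEnd + 1);

    Integer attachmentIndex = null;
    int lastColon = rest.lastIndexOf(':');
    if (lastColon >= 0) {
      String suffix = rest.substring(lastColon + 1);
      // Only treat the suffix as an index if it is a short run of digits,
      // so Integer.valueOf cannot throw NumberFormatException here.
      boolean numeric = !suffix.isEmpty() && suffix.length() <= 9
          && suffix.chars().allMatch(Character::isDigit);
      if (numeric) {
        attachmentIndex = Integer.valueOf(suffix);
        rest = rest.substring(0, lastColon);
      }
    }
    return new ParsedId(folder, rest, attachmentIndex);
  }

  /** Mirrors the if/else above: null means email, non-null means attachment. */
  static String describe(String documentIdentifier) {
    ParsedId id = parse(documentIdentifier);
    if (id.attachmentIndex == null) {
      // It's an email
      return "email " + id.messageId + " in " + id.folder;
    } else {
      // It's an attachment
      int attachmentNumber = id.attachmentIndex;
      return "attachment " + attachmentNumber + " of " + id.messageId + " in " + id.folder;
    }
  }
}
```

With this layout, an identifier without a numeric suffix falls into the email branch, and the attachment branch is reached only when the suffix is present.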


Karl


On Tue, Feb 7, 2017 at 4:43 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

I added a LOG line for testing. It looks like attachmentIndex is null.

2017-02-08 0:11 GMT+03:00 Karl Wright <daddywri@gmail.com>:
I attached a second patch (to apply on top of the first patch).  Please let me know if that fixes the issue.

Karl


On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

I get the following error:

FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For input string: "myFolder/test:<CADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw@mail.gmail.com>"
java.lang.NumberFormatException: For input string: "myFolder/test:<CADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw@mail.gmail.com>"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.manifoldcf.crawler.connectors.email.EmailConnector.processDocuments(EmailConnector.java:705)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)


2017-02-07 22:50 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
Thanks Karl,

I will try it.

Regards
Cihad Guzel

2017-02-07 22:36 GMT+03:00 Karl Wright <daddywri@gmail.com>:
I've created a ticket and attached a patch to it.  CONNECTORS-1375.  Please let me know if it works for you; if not, I'll fix what doesn't work.

Karl


On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddywri@gmail.com> wrote:
Correction: the only metadata attribute we set is the attachment(s) mimetype (as a multivalued field) -- this doesn't currently include the attachment data.

Karl


On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Cihad,

The email connector is providing the attachment data unextracted to the output connector as metadata attribute data.  There are no transformation connectors that look at this metadata.  Solr Cell probably also does not handle binary data in arbitrary metadata attributes properly.

The connector's attachment code therefore seems to be designed only to deal with textual attachments.  The right solution is to have individual IDs for each attachment.  But that would also require there to be a URL we could construct for each attachment.  We could provide an additional URI template for attachments, but I'd wonder if your system has the ability to serve attachments by their own URLs.  Please let me know if this would work and if so I can create a ticket and work on making these changes.
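
As a rough illustration of what an additional URI template for attachments could look like, here is a minimal sketch. The placeholder names `$(FOLDER)`, `$(MESSAGEID)`, and `$(ATTACHMENT)`, the class `AttachmentUrlSketch`, and the example host are all hypothetical; they are not the connector's existing template syntax. The sketch only shows the idea of substituting URL-encoded values into a per-attachment template.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class AttachmentUrlSketch {
  /** URL-encodes a value as UTF-8; UTF-8 is always available, so failure is unexpected. */
  static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  /**
   * Fills a hypothetical attachment template. Encoded values never contain '$',
   * so a substituted value cannot be mistaken for a later placeholder.
   */
  static String buildUrl(String template, String folder, String messageId, int attachmentNumber) {
    return template
        .replace("$(FOLDER)", encode(folder))
        .replace("$(MESSAGEID)", encode(messageId))
        .replace("$(ATTACHMENT)", Integer.toString(attachmentNumber));
  }
}
```

For example, the template `http://mail.example.com/$(FOLDER)/$(MESSAGEID)/$(ATTACHMENT)` with folder `myFolder/test`, message ID `<abc@example.com>`, and attachment 0 would yield `http://mail.example.com/myFolder%2Ftest/%3Cabc%40example.com%3E/0`. Whether anything can actually serve that URL is the open question above.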

Thanks,
Karl


On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi,

I am trying the email connector with Gmail. I attached the file [1] to a new email and sent it to my test email address.

The body of my mail is: "this is test mail for mfc"

Then I ran my email job and the email was indexed into Solr successfully. However, Solr's content field does not contain the body of my attachment. The Solr content field looks like this:

"content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf; name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment; filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding: base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2JqDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDAxNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."

Does the MFC email connector know that the attachment's file type is PDF? Does it not extract the contents?
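
One way to see what the long base64 run in the content field is: decoding its first few characters gives the raw bytes of the file. The sketch below is only a diagnostic, not something the connector does; the class and method names (`AttachmentSniffSketch`, `decodeHead`, `looksLikePdf`) are made up for illustration. It uses the first 12 characters of the payload shown above, `JVBERi0xLjYN`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentSniffSketch {
  /** Decodes the start of a base64 payload; the length should be a multiple of 4. */
  static String decodeHead(String base64Head) {
    byte[] bytes = Base64.getDecoder().decode(base64Head);
    // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  /** A PDF file starts with the marker "%PDF-". */
  static boolean looksLikePdf(String base64Head) {
    return decodeHead(base64Head).startsWith("%PDF-");
  }

  public static void main(String[] args) {
    // First 12 characters of the base64 block in the Solr content field.
    System.out.println(looksLikePdf("JVBERi0xLjYN"));
  }
}
```

The decoded bytes begin with `%PDF-1.6`, so the content field holds the undecoded PDF itself, still in its MIME transfer encoding, rather than text extracted from it.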

--
Regards
Cihad Güzel






--
Thanks
Cihad Güzel


