Hi Cihad,
The comparison should have been:

mp.getCount() <= attachmentNumber

As for changing ":" to "/", the real problem is that these should all be ":"'s, including line 678.  My apologies.  I've committed the changes.
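
For illustration, here is a minimal sketch of how the corrected comparison behaves with the values reported earlier in this thread (a part count of 2 and attachment numbers 0 and 1). The class and method names are hypothetical and are not taken from the connector source. It assumes attachmentNumber is a zero-based index into the multipart body.

```java
// Sketch only: hypothetical names, not the connector's actual code.
final class AttachmentBoundsCheck {

  /**
   * Corrected check: true when the zero-based attachmentNumber does not
   * address any part of a multipart body that has partCount parts.
   */
  static boolean isOutOfRange(int partCount, int attachmentNumber) {
    return partCount <= attachmentNumber;
  }

  /** The earlier, inverted comparison, kept here only for contrast. */
  static boolean isOutOfRangeInverted(int partCount, int attachmentNumber) {
    return partCount >= attachmentNumber;
  }

  public static void main(String[] args) {
    int partCount = 2; // value of mp.getCount() reported in this thread
    for (int attachmentNumber = 0; attachmentNumber <= 2; attachmentNumber++) {
      System.out.println("attachmentNumber=" + attachmentNumber
          + " corrected=" + isOutOfRange(partCount, attachmentNumber)
          + " inverted=" + isOutOfRangeInverted(partCount, attachmentNumber));
    }
  }
}
```

With the inverted form, attachment numbers 0 and 1 are both treated as out of range, which would explain why every attachment document was deleted before indexing.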

Thanks,
Karl


On Thu, Feb 9, 2017 at 8:15 AM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

mp.getCount() is 2, and attachmentNumber is '0' or '1' in my case.

Regards,
Cihad Guzel

2017-02-09 16:07 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
Hi Karl,

I made some changes in the code and then the indexing was done successfully.

The changes are as follows:

I removed these lines (lines 772-775):

              if (mp.getCount() >= attachmentNumber) {
                activities.deleteDocument(documentIdentifier);
                continue;
              }

I updated these lines (lines 1485 and 1586):
      int index2 = di.indexOf("/", index1 + 1);
to:
      int index2 = di.indexOf(":", index1 + 1);
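
To illustrate why the separator matters: the folder name in this thread (myFolder/test) itself contains a "/", so searching for "/" after the first separator cannot find the boundary before the attachment number. The sketch below assumes a document identifier laid out as folder, message ID, and an optional attachment number, all separated by ":". That layout, the class name, and the sample message ID are assumptions for illustration, not the connector's confirmed format.

```java
// Sketch only: the identifier layout "folder:messageId[:attachmentNumber]"
// is an assumption, and the class name is hypothetical.
final class DocumentIdentifierSketch {

  /**
   * Splits an identifier into { folder, messageId, attachmentNumber }.
   * The third element is null when no attachment number is present.
   */
  static String[] split(String di) {
    int index1 = di.indexOf(":");
    if (index1 == -1) {
      throw new IllegalArgumentException("No ':' separator in: " + di);
    }
    // Look for the second ':' after the first one, as in the updated line.
    int index2 = di.indexOf(":", index1 + 1);
    String folder = di.substring(0, index1);
    if (index2 == -1) {
      return new String[] { folder, di.substring(index1 + 1), null };
    }
    return new String[] {
      folder, di.substring(index1 + 1, index2), di.substring(index2 + 1)
    };
  }

  public static void main(String[] args) {
    String[] parts = split("myFolder/test:<id@mail.example>:1");
    System.out.println("folder=" + parts[0]);
    System.out.println("messageId=" + parts[1]);
    System.out.println("attachmentNumber=" + parts[2]);
  }
}
```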

Regards,
Cihad Guzel




2017-02-08 2:10 GMT+03:00 Karl Wright <daddywri@gmail.com>:
Hi Cihad,

You need to set an attachment URL template for the attachments to be crawled.  Open your email connection and click the "URL" tab, and you will see the new field there.

Karl


On Tue, Feb 7, 2017 at 6:07 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

Doesn't the 'else' branch have to run when the email has an attachment?
Although the email has an attachment, only the first part was processed. Also, I don't see the attachment's content in the Solr index.

I edited the code for testing as follows:

        if (attachmentIndex == null) {
          // It's an email
          System.out.println("running if block");
...
        } else {
          System.out.println("running else block");
          // It's an attachment
          attachmentNumber = attachmentIndex;
...
        }

Then I ran my job. The processing code ran 3 times. The log looks like this:

...
running if block
running if block
running if block
...


The solr response:

{
        "subject":["pdf test page"],
        "from":["Cihad Guzel <cguzelg@gmail.com>"],
        "date":["Tue Feb 07 20:37:35 MSK 2017"],
        "mimetype":["",
          ""],
        "created_date":"2017-02-07T17:37:35.000Z",
        "indexed_date":"2017-02-07T21:18:05.382Z",
        "to":["Cihad Guzel <cguzelg@gmail.com>"],
        "modified_date":"2017-02-07T17:37:35.000Z",
        "encoding":["",
          ""],
        "mime_type":"text/plain",
        "stream_size":["null"],
        "x_parsed_by":["org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.txt.TXTParser"],
        "stream_content_type":["text/plain"],
        "content_encoding":["windows-1252"],
        "content_type":["text/plain; charset=windows-1252"],
        "content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf; name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment; filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding: base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9... ",
        "language":"en",
        "_version_":1558710621053124608}]
  }



2017-02-08 1:17 GMT+03:00 Karl Wright <daddywri@gmail.com>:

On Tue, Feb 7, 2017 at 5:14 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Cihad,

The variable attachmentIndex is *supposed* to be null except when an attachment is being processed.  The code should look like this:

        if (attachmentIndex == null) {
          // It's an email
...
        } else {
          // It's an attachment
          attachmentNumber = attachmentIndex;
...
        }
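
As a self-contained illustration of the branch above, the following sketch dispatches on a nullable Integer. The class and method names are hypothetical, and the returned strings stand in for the elided processing in each branch.

```java
// Sketch only: hypothetical names; the real branches do the actual
// email and attachment processing that is elided in the snippet above.
final class DocumentKindSketch {

  /** Describes which branch a document would take. */
  static String describe(Integer attachmentIndex) {
    if (attachmentIndex == null) {
      // It's an email
      return "email";
    } else {
      // It's an attachment; unboxing is safe because null was ruled out.
      int attachmentNumber = attachmentIndex;
      return "attachment " + attachmentNumber;
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(null));
    System.out.println(describe(1));
  }
}
```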


Karl


On Tue, Feb 7, 2017 at 4:43 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

I added a log line for testing. It looks like attachmentIndex is null.

2017-02-08 0:11 GMT+03:00 Karl Wright <daddywri@gmail.com>:
I attached a second patch (to apply on top of the first patch).  Please let me know if that fixes the issue.

Karl


On Tue, Feb 7, 2017 at 3:59 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi Karl,

I get the following error:

FATAL 2017-02-07 23:56:09,483 (Worker thread '29') - Error tossed: For input string: "myFolder/test:<CADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw@mail.gmail.com>"
java.lang.NumberFormatException: For input string: "myFolder/test:<CADNgPDgSXHeWo0GDnUL6S2sogUsXUa9mx2WxOT23Wi37Hog5Gw@mail.gmail.com>"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.manifoldcf.crawler.connectors.email.EmailConnector.processDocuments(EmailConnector.java:705)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
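
The exception itself can be reproduced in isolation: Integer.parseInt throws NumberFormatException whenever the whole string is not a valid integer, which is what happens when a folder and message ID reach it instead of an attachment number. The sketch below wraps the call so that a malformed value is reported as empty rather than thrown. The class name, method name, and sample message ID are hypothetical. Whether the connector should guard the parse this way or fix the identifier splitting instead is a separate design decision.

```java
import java.util.OptionalInt;

// Sketch only: hypothetical helper, not part of the connector.
final class AttachmentNumberParsing {

  /** Returns the parsed value, or empty when the text is not an integer. */
  static OptionalInt tryParse(String text) {
    try {
      return OptionalInt.of(Integer.parseInt(text));
    } catch (NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryParse("1").isPresent());
    System.out.println(tryParse("myFolder/test:<id@mail.example>").isPresent());
  }
}
```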


2017-02-07 22:50 GMT+03:00 Cihad Guzel <cguzelg@gmail.com>:
Thanks Karl,

I will try it.

Regards
Cihad Guzel

2017-02-07 22:36 GMT+03:00 Karl Wright <daddywri@gmail.com>:
I've created a ticket and attached a patch to it.  CONNECTORS-1375.  Please let me know if it works for you; if not, I'll fix what doesn't work.

Karl


On Tue, Feb 7, 2017 at 1:19 PM, Karl Wright <daddywri@gmail.com> wrote:
Correction: the only metadata attribute we set is the attachment(s) mimetype (as a multivalued field) -- this doesn't currently include the attachment data.

Karl


On Tue, Feb 7, 2017 at 1:14 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Cihad,

The email connector is providing the attachment data unextracted to the output connector as metadata attribute data.  There are no transformation connectors that look at this metadata.  Solr Cell probably also does not handle binary data in arbitrary metadata attributes properly.

The connector's attachment code therefore seems to be designed only to deal with textual attachments.  The right solution is to have individual IDs for each attachment.  But that would also require there to be a URL we could construct for each attachment.  We could provide an additional URI template for attachments, but I'd wonder if your system has the ability to serve attachments by their own URLs.  Please let me know if this would work and if so I can create a ticket and work on making these changes.

Thanks,
Karl


On Tue, Feb 7, 2017 at 12:56 PM, Cihad Guzel <cguzelg@gmail.com> wrote:
Hi,

I am trying the email connector with Gmail. I attached the file [1] to a new email and sent it to my test email address.

My mail content body is like: "this is test mail for mfc"

Then I ran my email job and the email was indexed to Solr successfully. However, Solr's content field does not contain the extracted content of my attachment. The Solr content field looks like this:

"content":" \n \n  \n  \n  \n  \n  \n  \n  \n \n  --94eb2c1910841bc55f0547f43443\r\nContent-Type: multipart/alternative; boundary=94eb2c1910841bc5530547f43441\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nthis is test mail for mfc.\r\n\r\n--94eb2c1910841bc5530547f43441\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<div dir=\"ltr\">this is test mail for mfc.\r\n</div>\r\n\r\n--94eb2c1910841bc5530547f43441--\r\n--94eb2c1910841bc55f0547f43443\r\nContent-Type: application/pdf; name=\"pdf-test.pdf\"\r\nContent-Disposition: attachment; filename=\"pdf-test.pdf\"\r\nContent-Transfer-Encoding: base64\r\nX-Attachment-Id: f_iyvt78qa0\r\n\r\nJVBERi0xLjYNJeLjz9MNCjM3IDAgb2JqIDw8L0xpbmVhcml6ZWQgMS9MIDIwNTk3L08gNDAvRSAx\r\nNDExNS9OIDEvVCAxOTc5NS9IIFsgMTAwNSAyMTVdPj4NZW5kb2JqDSAgICAgICAgICAgICAgICAg\r\nDQp4cmVmDQozNyAzNA0KMDAwMDAwMDAxNiAwMDAwMCBuDQowMDAwMDAxMzg2IDAwMDAwIG4NCjAw\r\nMDAwMDE1MjIgMDAwM ..."

Does the MFC email connector know that the attachment's file type is PDF? Does it not extract the contents?
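
One way to confirm that the content field holds the raw, still-encoded attachment rather than extracted text is to decode the first few base64 characters shown above and check for the PDF header. This is a minimal sketch using only the JDK. The class and method names are hypothetical, and the input is the first 12 characters of the base64 payload from the Solr output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: hypothetical helper for inspecting an indexed payload by hand.
final class AttachmentSniffSketch {

  /** True when the decoded bytes begin with the PDF header "%PDF-". */
  static boolean looksLikePdf(String base64Prefix) {
    // The prefix length must be a valid base64 unit, e.g. a multiple of 4.
    byte[] decoded = Base64.getDecoder().decode(base64Prefix);
    String head = new String(decoded, StandardCharsets.ISO_8859_1);
    return head.startsWith("%PDF-");
  }

  public static void main(String[] args) {
    // First 12 characters of the base64 body in the Solr content field.
    System.out.println(looksLikePdf("JVBERi0xLjYN"));
  }
}
```

If this reports a PDF header, the attachment reached Solr as an undecoded MIME part inside the message body, so no PDF text extraction took place.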

--
Regards
Cihad Güzel






--
Thanks
Cihad Güzel


