From: Karl Wright
Date: Mon, 4 Sep 2017 10:06:04 -0400
Subject: Re: Question about ManifoldCF 2.8
To: "user@manifoldcf.apache.org"

Hi Othman,

I won't be able to look at this today; it is a holiday here. But the "socket write" error is coming from ElasticSearch. If ES is configured not to accept documents greater than a certain size, that might explain it. Maybe the ES logs would help?

I'm afraid you're going to need to do the work to find out what is going wrong in those cases now.

Thanks,
Karl

On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki wrote:

> Hi Karl,
>
> This morning, I tried the zookeeper-based example and it worked really well. However, I still have one error which is bugging me. It is a socket write error. You will find the simple history report attached. Surprisingly, I didn't have any stack trace in the ManifoldCF log file.
>
> Best regards,
>
> Othman.
>
> On Fri, 1 Sep 2017 at 19:39, Karl Wright wrote:
>
>> This is from file locking yet again.
>>
>> I have uploaded a new RC. Please download and try out the zookeeper locking.
>>
>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>
>> Karl
>>
>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki wrote:
>>
>>> There is another issue as well that gives the following stack trace.
>>>
>>> Othman.
>>>
>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had version 3.9 of POI, and when I changed the version to 3.15 it worked fine. I really want to try the zookeeper example if, as you told me, its performance is better than the file-based example.
>>>> For the time being, I'm using the file-based example because it is the only part that works for me, but I actually need a stable version for my production environment. That is one point.
>>>>
>>>> Another point is that the Paths tab is still an issue for me, because I exclude some files and it still crawls them. I want to exclude some specific file extensions and some specific directories. For instance, I don't want to index .exe files or paths that contain a specific word. I proceed as follows: I make the first exclude with *.exe and the second one with *word*. Only the second one doesn't work. How can I solve this issue, please?
>>>>
>>>> Thank you very much, have a nice weekend,
>>>>
>>>> Othman
>>>>
>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>>>>>
>>>>> The failure you are seeing is "NoSuchMethodError". Therefore, the class is being found, but it is the *wrong* class. When you deployed the new release, did you deploy it in a new directory, or did you overwrite the previous deployment? If you overwrote it, you probably have multiple versions of the POI jars.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> I have just tried the new release of ManifoldCF. The first job ended normally, but in the second I got a new stack trace concerning POI. Moreover, runzookeeper.bat doesn't run properly. It shows me the attached stack trace.
>>>>>>
>>>>>> PS: The second attached file contains the POI stack trace.
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright wrote:
>>>>>>
>>>>>>> Hi Othman,
>>>>>>>
>>>>>>> You do not need a new database instance.
>>>>>>>
>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>
>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> Thank you very much for your help. I'm going to try out the zookeeper example. Should I initialize a new database? And how can I run the zookeeper start-agent?
>>>>>>>>
>>>>>>>> Othman.
>>>>>>>>
>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright wrote:
>>>>>>>>
>>>>>>>>> Hi Othman,
>>>>>>>>>
>>>>>>>>> These exceptions are now coming from file locking and are due to permissions problems. I suggest you go to Zookeeper for locking.
>>>>>>>>>
>>>>>>>>> I am building a 2.8.1 release candidate. When it is available for download, I'll send you the URL.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>>
>>>>>>>>>> This morning, I followed the steps you told me to do and I still got stack traces. I have attached the stack traces as well as the content of my lib directory and options.env.
>>>>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper example. Could you guide me through it? I don't know whether following the same steps as in the file-based example will avoid the stack traces.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Othman
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright wrote:
>>>>>>>>>>
>>>>>>>>>>> Please do the following:
>>>>>>>>>>>
>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>
>>>>>>>>>>>> By 'other place', do you mean the \lib directory? If so, then I have already tried it and it didn't work.
>>>>>>>>>>>>
>>>>>>>>>>>> Othman.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I used the java dependency inspector to see what the issue is, and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm using the file-based example, and I reproduced all the changes you told me to make there. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All the dependencies you mentioned have already been added in the options.env.win file in the multiprocess-file-example directory.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the one in the multiprocess-zk-example directory or in multiprocess-file-example?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Could it be a problem with elasticsearch's version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing. This time, it concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> You will find the stack trace attached. My apologies for the bad quality of the image; I'm doing my best to send you the stack trace, as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected, the crawling resumed. How about the regular expressions? How can I make complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you could try. Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful. Meanwhile, I have read the documentation about how to restrict the crawling and I don't think the '|' works in the specified pattern. For instance, I would like to restrict the crawling for the documents that contain the word 'sound'. I proceed as follows: *(SON)*. The document is in capital letters and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is end-user documentation both online and distributed with all binary distributions that describes how to do this. Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using zookeeper and I will let you know if it works. I have another question to ask. Actually, I need to apply some filters while crawling. I don't want to crawl some files and some folders. Could you give me an example of how to use the regex? Does the regex allow using /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into memory, so you do not need huge amounts of memory. The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "ES (lowercase) synapses" is the elasticsearch output connection. Moreover, the job uses Tika to extract metadata and a file system as a repository connection. During the job, I don't extract the content of the documents. I was wondering if the issue comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not. It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of what went wrong. Without further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer from société générale in France. I'm actually using your recent version of ManifoldCF, 2.8. I'm working on an internal search engine. For this reason, I'm using ManifoldCF to index documents on Windows shares. I encountered a serious problem while crawling 35K documents. Most of the time, when ManifoldCF starts crawling a large document (19 MB, for example), it ends the job with the following error: repeated service interruptions - failure processing document : software caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >> --001a11451814995aa205585d9cfb Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Othman,

I won't be able to look at this today; it is a holiday here. But the "socket write" error is coming from ElasticSearch. If ES is configured to not accept documents greater than a certain size, that might explain it. Maybe the ES logs would help?
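One way to test the size hypothesis, sketched here under assumptions: list the files in the crawled tree that exceed a chosen threshold and compare them with the documents that fail in the Simple History. The function name `find_large_files` and the threshold are illustrative only; they are not ManifoldCF or Elasticsearch settings. The real limit, if there is one, would come from the ES configuration (for HTTP requests, Elasticsearch has an `http.max_content_length` setting).

```python
import os

def find_large_files(root, limit_bytes):
    """Return (path, size) pairs for files under root larger than limit_bytes."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if size > limit_bytes:
                large.append((path, size))
    # Largest files first, since those are the most likely suspects.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, `find_large_files(r"\\server\share", 10 * 1024 * 1024)` would list everything above 10 MB; both the path and the 10 MB figure are placeholders.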

I'm afraid you're going to need to do the work to find out what is going wrong in those cases now.
Thanks,
Karl


On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

This morning, I tried the zookeeper-based example and it worked really well. However, I still have one error which is bugging me. It is a socket write error. You will find the Simple History report attached. Surprisingly, I didn't have any stack trace in the ManifoldCF log file.

Best regards,

Othman.

On Fri, 1 Sep 2017 at 19:39, Karl Wright <daddywri@gmail.com> wrote:
This is from file locking yet again.

I have uploaded a new RC. Please download and try out the zookeeper locking.


Karl


On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
There is another issue as well that gives the following stack trace.

Othman.

On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I took the binary from the ManifoldCF 2.8.1 RC0. It had version 3.9 of POI, and when I changed the version to 3.15 it worked fine. I really want to try the zookeeper example if, as you told me, its performance is better than the file-based example. For the time being, I'm using the file-based one because it is the only part that works for me, but I actually need a stable version for my production environment. That is one point.
Another point is that the Paths tab is still an issue for me, because I exclude some files and it still crawls them. I want to exclude some specific file extensions and some specific directories. For instance, I don't want to index .exe files or files that contain a specific word. I proceed as follows: I make the first exclude with *.exe and the second one with *word*. Only the second one doesn't work. How can I solve this issue, please?
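To reason about why a rule does not fire, it can help to model the rule list outside ManifoldCF. The sketch below assumes that rules are evaluated top to bottom with the first match winning, and that patterns are simple case-sensitive wildcards. Both are assumptions made for illustration, and the function `decide` is hypothetical, not part of ManifoldCF.

```python
from fnmatch import fnmatchcase

def decide(path, rules, default="exclude"):
    """Return the action of the first rule whose pattern matches path."""
    for action, pattern in rules:
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return action
    return default

# Hypothetical rule list mirroring the two excludes described above.
rules = [
    ("exclude", "*.exe"),
    ("exclude", "*word*"),
    ("include", "*"),
]

for candidate in ["tools/setup.exe", "docs/password.txt", "docs/WORD_list.txt"]:
    print(candidate, "->", decide(candidate, rules))
```

Under these assumptions, a name containing `WORD` in capitals falls through to the final include rule, which is one possible reason a `*word*` exclude appears not to work.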

Thank you very much, and have a nice weekend,

Othman
On Fri, 1 Sep 2017 at 16:46, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,
I will respin a new 2.8.1 (RC1) to address the zookeeper issue.

The failure you are seeing is "NoSuchMethodError". Therefore, the class is being found, but it is the *wrong* class. When you deployed the new release, did you deploy it in a new directory, or did you overwrite the previous deployment? If you overwrote it, you probably have multiple versions of the POI jars.
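A small sketch for spotting the situation described above, where more than one version of the same jar ends up in a deployment. It assumes jar files are named `artifact-version.jar`; the function name and the directory names in the usage note are illustrative.

```python
import os
import re
from collections import defaultdict

# Splits a name such as "poi-ooxml-3.15.jar" into ("poi-ooxml", "3.15").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")

def find_duplicate_jars(directories):
    """Return {artifact: set of versions} for artifacts found in several versions."""
    versions = defaultdict(set)
    for directory in directories:
        for _dirpath, _dirs, files in os.walk(directory):
            for name in files:
                match = JAR_NAME.match(name)
                if match:
                    versions[match.group("artifact")].add(match.group("version"))
    return {artifact: found for artifact, found in versions.items() if len(found) > 1}
```

Calling `find_duplicate_jars(["lib", "connector-common-lib"])` from the installation directory would report, for instance, a `poi` entry with both 3.9 and 3.15 if an older deployment was overwritten.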

Karl


On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I have just tried the new release of ManifoldCF. At first, the first job ended normally, but in the second I got a new stack trace concerning POI. Moreover, runzookeeper.bat doesn't run properly. It shows me the attached stack trace.
PS: The second attached file contains the POI stack trace.

Othman.

On Fri, 1 Sep 2017 at 12:21, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

You do not need a new database instance.

You can download MCF 2.8.1 RC0 from here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1

Karl


On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

Thank you very much for your help. I'm going to try out the zookeeper example. Should I initialize a new database? And how can I run the zookeeper start-agent?

Othman.

On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

These exceptions are now coming from file locking and are due to permissions problems. I suggest you go to Zookeeper for file locking.

I am building a 2.8.1 release candidate. When it is available for download, I'll send you the URL.

Thanks,
Karl


On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

This morning, I followed the steps you gave me and I still got stack traces. I have attached the stack traces as well as the contents of my lib directory and options.env.
I have installed zookeeper and I'm ready to use the zookeeper example. Could you guide me through it? I don't know whether following the same steps as in the file-based example would avoid the stack traces.

Thanks,
Othman

On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:
Please do the following:
(0) Shut down all ManifoldCF processes.
(1) Move poi*.jar from connector-common-lib to lib.
(2) Move dom4j*.jar from connector-common-lib to lib.
(3) Move commons-collections4*.jar from connector-common-lib to lib.
(4) Move xmlbeans*.jar from connector-common-lib to lib.
(5) Move curvesapi*.jar from connector-common-lib to lib.
(6) Modify your options.env to include all of the jars you moved.
(7) Start up all ManifoldCF processes.
(8) If you still get stack traces, please send them to me.
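Steps (1) to (5) can be sketched as a small script. This is only an illustration: the helper names are hypothetical, the xmlbeans pattern is assumed to end in `.jar`, and step (6), editing options.env, is deliberately left manual because the file's layout is not shown in this thread.

```python
import glob
import os
import shutil

# Patterns taken from steps (1)-(5); xmlbeans is assumed to be a .jar file.
PATTERNS = ["poi*.jar", "dom4j*.jar", "commons-collections4*.jar",
            "xmlbeans*.jar", "curvesapi*.jar"]

def plan_moves(src_dir, dst_dir, patterns=PATTERNS):
    """Return sorted (source, destination) pairs for jars matching the patterns."""
    moves = set()
    for pattern in patterns:
        for src in glob.glob(os.path.join(src_dir, pattern)):
            moves.add((src, os.path.join(dst_dir, os.path.basename(src))))
    return sorted(moves)

def apply_moves(moves):
    """Move each jar; run this only after all ManifoldCF processes are stopped."""
    for src, dst in moves:
        shutil.move(src, dst)
```

Printing the result of `plan_moves("connector-common-lib", "lib")` before calling `apply_moves` gives the list of jar names that then has to be added to options.env by hand.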

Karl


On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

By 'other place', do you mean the \lib directory? If so, I have already tried it and it didn't work.

Othman.

On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

I used the java dependency inspector to see what the issue is, and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.
Let's hope that finally fixes this issue.

I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.

Thanks,
Karl


On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm using the file-based example, and I reproduced all the changes you told me to make there. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?

Othman.


On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com> wrote:
Are you using the zookeeper example, or the file-based example?

If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.

Karl


On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
All the dependencies you mentioned have already been added to the options.env.win file in the multiprocess-file-example directory.

On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Yes, I added it to the options.env.win file. Should it be the one in the multiprocess-zk-example directory or in multiprocess-file-example?

On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com> wrote:
It's not related at all to elasticsearch.
Karl


On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Could it be a problem with elasticsearch's version? I'm currently using 2.1.0, which is pretty old for this new version of ManifoldCF.

Othman.

On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.

Thanks,
Othman

On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com> wrote:
I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.

You *will* need to move curvesapi-1.04.jar though.

Thanks,
Karl


On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <daddywri@gmail.com> wrote:
If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.

Karl

On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I added the two jars that you mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing. This time, it concerns Excel files. You will find the stack trace attached.


Othman.

On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.

The list of jars is thus extended to include:

poi-ooxml-3.15.jar
dom4j-1.6.1.jar

Karl


On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
You will find the stack trace attached. My apologies for the bad quality of the image; I'm doing my best to send you the stack trace, as I don't have the right to send documents outside the company.

Thank you for your time,

Othman

On Thu, 31 Aug 2017 at 15:16, Karl Wright <daddywri@gmail.com> wrote:
Once again, I need a stack trace to diagnose what the problem is.

Thanks,
Karl


On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:

Error tossed : org/apache/poi/POIXMLTypeLoader
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.

Maybe another jar is missing?
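To answer the "which jar is missing" question without guessing, one option is to search the jar files for the class named in the error. The sketch below relies only on the fact that a jar is a zip archive; the function name and the directories passed to it are illustrative.

```python
import os
import zipfile

def jars_containing(class_name, directories):
    """Return paths of jars under the directories that contain class_name.

    class_name may use dots or slashes, e.g. org/apache/poi/POIXMLTypeLoader.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in directories:
        for dirpath, _dirs, files in os.walk(directory):
            for name in sorted(files):
                if not name.endswith(".jar"):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with zipfile.ZipFile(path) as jar:
                        if entry in jar.namelist():
                            hits.append(path)
                except zipfile.BadZipFile:
                    continue  # not a readable jar; skip it
    return hits
```

Running `jars_containing("org/apache/poi/POIXMLTypeLoader", ["lib", "connector-common-lib"])` shows which directory currently holds the class, and therefore whether the loader that needs it can see it.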

Othman.

On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I have tried what you told me to do and, as you expected, the crawl resumed. What about regular expressions? How can I write complex regular expressions in the job's Paths tab?

Thank you very much for your help.

Othman.


On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Ok, I will try it right away and let you know if it works.

Othman.

On Thu, 31 Aug 2017 at 14:15, Karl Wright <daddywri@gmail.com> wrote:
Oh, and you also may need to edit your options.env files to include them in the classpath for startup.
Karl



On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <daddywri@gmail.com> wrote:
If you are amenable, there is another workaround you could try. Specifically:

(1) Shut down all MCF processes.
(2) Move the following two files from connector-common-lib to lib:

xmlbeans-2.6.0.jar
poi-ooxml-schemas-3.15.jar

(3) Restart everything and see if your crawl resumes.

Please let me know what happens.

Karl




On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <daddywri@gmail.com> wrote:
I created a ticket for this: CONNECTORS-1450.

One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.

Karl


On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.

Thanks,

Othman

On Thu, 31 Aug 2017 at 13:04, Karl Wright <daddywri@gmail.com> wrote:
It looks like a dependency of Apache POI is missing.
I think we will need a ticket to address this, if you are indeed using the binary distribution.

Thanks!
Karl

On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm actually using the binary version. For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful. Meanwhile, I have read the documentation about how to restrict the crawl, and I don't think the '|' works in the specified pattern. For instance, I would like to restrict the crawl for documents that contain the word 'sound'. I proceed as follows: *(SON)*. The document is in capital letters and I noticed that it wasn't taken into consideration.
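A quick way to see how such a pattern behaves if it is treated as a plain wildcard rather than a regular expression. That is an assumption; this sketch uses Python's `fnmatch` rules and does not reproduce ManifoldCF's matcher. Under those rules, parentheses and `|` are literal characters, and matching is case-sensitive unless both sides are normalized. The helper `matches` is hypothetical.

```python
from fnmatch import fnmatchcase

def matches(name, pattern, ignore_case=False):
    """Wildcard match using fnmatch rules: *, ? and [...] are special,
    while ( ) and | are ordinary characters."""
    if ignore_case:
        name, pattern = name.lower(), pattern.lower()
    return fnmatchcase(name, pattern)

# The pattern from the message only matches names with literal parentheses.
print(matches("RAPPORT SON.doc", "*(SON)*"))
print(matches("RAPPORT (SON).doc", "*(SON)*"))
# Without parentheses the capitalised name matches; lower case needs ignore_case.
print(matches("RAPPORT SON.doc", "*SON*"))
print(matches("rapport son.doc", "*SON*", ignore_case=True))
```

If the Paths tab does treat patterns this way, `*SON*` would be the pattern to try, with a separate rule for each alternative instead of `|`.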

Thanks,
Othman




On Thu, 31 Aug 2017 at 12:40, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares. There is end-user documentation, both online and distributed with all binary distributions, that describes how to do this. Have you found it?

Karl


On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hello Karl,

Thank you for your response. I will start using zookeeper and I will let you know if it works. I have another question to ask. I need to apply some filters while crawling: I don't want to crawl some files and some folders. Could you give me an example of how to use the regex? Does the regex allow /i to ignore case?

Thanks,
Othman

On Wed, 30 Aug 2017 at 19:53, Karl Wright <daddywri@gmail.com> wrote:
Hi Beelz,

File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that. I highly recommend using zookeeper sync.

ManifoldCF is engineered to not put files into memory so you do not need huge amounts of memory. The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.

Thanks,
Karl


On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?

Othman.

On Wed, 30 Aug 2017 at 16:11, Karl Wright <daddywri@gmail.com> wrote:
Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.

I would suggest two things:

(1) Use Zookeeper for sync instead of file-based sync.
(2) Have a look if you still get failures after that.

Thanks,
Karl


On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Mr Karl,

Thank you, Mr Karl, for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings:

- Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.


- Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling. You must cleanup before restarting.
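Both warnings point at the directory that holds the lock files. A minimal sketch for checking whether the account running the agents can create files there; the function name and the temporary-file prefix are illustrative, and the check should be run as the same user that runs ManifoldCF.

```python
import os
import tempfile

def can_write(directory):
    """Return True if a temporary file can be created and removed in directory."""
    try:
        fd, path = tempfile.mkstemp(prefix="mcf-lock-test-", dir=directory)
    except OSError:
        return False  # missing directory, permission denied, disk full, ...
    os.close(fd)
    os.remove(path)
    return True
```

Calling it on the synch area directory named in the first warning gives a quick yes or no before restarting the processes.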

ES (lowercase) synapses being the elasticsearch output connection. Moreover, the job uses Tika to extract metadata and a file system as a repository connection. During the job, I don't extract the content of the documents. I was wondering if the issue comes from elasticsearch?

Othman.




On Wed, 30 Aug 2017 at 14:08, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,
ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not. It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of what went wrong. Without further information, I can't say any more.
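The behaviour described above, retrying an error that might be transient and giving up when it persists, can be illustrated with a generic retry loop. This is a sketch of the general idea only, not ManifoldCF's implementation; `TransientError`, `process_with_retries` and the attempt count are all hypothetical.

```python
import time

class TransientError(Exception):
    """An error that might go away if the operation is retried."""

def process_with_retries(operation, max_attempts=3, delay_seconds=0.0):
    """Run operation, retrying on TransientError; re-raise after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # still failing after every retry: give up
            time.sleep(delay_seconds)
```

A "socket write error" that occurs on every attempt for the same document would exhaust the retries in a loop like this, which matches the "repeated service interruptions" wording of the abort message.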

Thanks,
Karl


On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hello,

I'm Othman Belhaj, a software engineer from Société Générale in France. I'm currently using your recent version of ManifoldCF, 2.8. I'm working on an internal search engine. For this reason, I'm using ManifoldCF to index documents on Windows shares. I encountered a serious problem while crawling 35K documents. Most of the time, when ManifoldCF starts crawling a large document (19 MB, for example), it ends the job with the following error: repeated service interruptions - failure processing document : software caused connection abort: socket write error.
Can you give me some tips on how to solve this problem, please?

I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
I'm looking forward to your response.

Best regards,

Othman BELHAJ





















--001a11451814995aa205585d9cfb--