From: Karl Wright
Date: Tue, 5 Sep 2017 06:42:17 -0400
Subject: Re: Question about ManifoldCF 2.8
To: "user@manifoldcf.apache.org"

Hi Othman,

Thanks for doing the evaluation of the problem.

Generally, the ManifoldCF project does not have the expertise to diagnose problems with external systems like Solr or Elasticsearch. So going to another newsgroup for those kinds of issues would be a good idea.

Thanks!
Karl

On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki wrote:

> Hi Karl,
>
> I have analyzed the error and found out that it was mainly an elasticsearch problem. I saw in some forums that one of the adopted solutions is to modify elasticsearch.yml and set http.max_content_length to a greater value. However, the job got stuck on the last two indexable files (two pptx files of 22 MB and 2 MB respectively). The job eventually ended, but a stack trace showed that elasticsearch ran out of memory. For your information, I have allocated 4 GB for elasticsearch execution. Is that enough for good performance? You will find attached the stack traces of elasticsearch.
>
> Best regards,
>
> Othman BELHAJ.
>
> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki wrote:
>
>> Hi Karl,
>>
>> I'm sorry to bother you on your holiday. I will try to analyze it today and let you know what I have found. Enjoy your day!
>>
>> Best regards,
>>
>> Othman BELHAJ.
>>
>> On Mon, 4 Sep 2017 at 16:06, Karl Wright wrote:
>>
>>> Hi Othman,
>>>
>>> I won't be able to look at this today; it is a holiday here. But, the "socket write" error is coming from ElasticSearch. If ES is configured to not accept documents greater than a certain size, that might explain it. Maybe the ES logs would help?
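One way to follow up on the size theory above is to list the files in the crawled tree that are larger than whatever limit Elasticsearch is configured with. The following is a minimal Python sketch and is not part of ManifoldCF or Elasticsearch; the function name `files_over_limit`, the root directory, and the limit value are assumptions chosen for illustration.

```python
import os

def files_over_limit(root, limit_bytes):
    """Return (path, size) pairs for files under root larger than limit_bytes."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            if size > limit_bytes:
                oversized.append((path, size))
    # Largest first, so the most likely offenders are at the top.
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

The limit to pass in is whatever http.max_content_length is set to in elasticsearch.yml, converted to bytes. The size of an indexing request is not necessarily the size of the file on disk, so this is only a rough first filter.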
>>>
>>> I'm afraid you're going to need to do the work to find out what is going wrong in those cases now.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> This morning, I have tried the zookeeper based file and it worked really well. However, I still have one error which is bugging me. It is a socket write error. You will find attached the simple history report. Surprisingly, I didn't have any stack trace in the ManifoldCF log file.
>>>>
>>>> Best regards,
>>>>
>>>> Othman.
>>>>
>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright wrote:
>>>>
>>>>> This is from file locking yet again.
>>>>>
>>>>> I have uploaded a new RC. Please download and try out the zookeeper locking.
>>>>>
>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>
>>>>> Karl
>>>>>
>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki wrote:
>>>>>
>>>>>> There is another issue as well that gives the following stack trace.
>>>>>>
>>>>>> Othman.
>>>>>>
>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It had version 3.9 of POI, and when I changed the version to 3.15 it worked fine. I really want to try the zookeeper example if, as you told me, its performance is better than the file-based example. For the time being, I'm using the file-based one because it is the only part that works for me, but I actually need a stable version for my production environment. That is one point.
>>>>>>>
>>>>>>> Another point is that the Paths tab is still an issue for me, because I exclude some files and it still crawls them. I want to exclude some specific file extensions and some specific directories. For instance, I don't want to index .exe files or anything that contains a specific word.
>>>>>>> I proceed as follows: I make the first exclude with *.exe and the second one with *word*. Only the second one doesn't work. How can I solve this issue, please?
>>>>>>>
>>>>>>> Thank you very much, have a nice weekend,
>>>>>>>
>>>>>>> Othman
>>>>>>>
>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright wrote:
>>>>>>>
>>>>>>>> Hi Othman,
>>>>>>>>
>>>>>>>> I will respin a new 2.8.1 (RC1) to address the zookeeper issue.
>>>>>>>>
>>>>>>>> The failure you are seeing is "NoSuchMethodError". Therefore, the class is being found, but it is the *wrong* class. When you deployed the new release, did you deploy it in a new directory, or did you overwrite the previous deployment? If you overwrote it, you probably have multiple versions of the POI jars.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki wrote:
>>>>>>>>
>>>>>>>>> Hi Karl,
>>>>>>>>>
>>>>>>>>> I have just tried the new release of ManifoldCF. At first, the first job ended normally, but in the second I got a new stack trace concerning POI. Moreover, runzookeeper.bat doesn't run properly. It shows me the attached stack trace.
>>>>>>>>>
>>>>>>>>> PS: The second attached file contains the POI stack trace.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright wrote:
>>>>>>>>>
>>>>>>>>>> Hi Othman,
>>>>>>>>>>
>>>>>>>>>> You do not need a new database instance.
>>>>>>>>>>
>>>>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>>>>
>>>>>>>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much for your help. I'm going to try out the zookeeper example.
>>>>>>>>>>> Should I initialize a new database? And how can I run the zookeeper start-agent?
>>>>>>>>>>>
>>>>>>>>>>> Othman.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>
>>>>>>>>>>>> These exceptions are now coming from file locking and are due to permissions problems. I suggest you go to Zookeeper for file locking.
>>>>>>>>>>>>
>>>>>>>>>>>> I am building a 2.8.1 release candidate. When it is available for download, I'll send you the URL.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This morning, I followed the steps you gave me and I still got stack traces. I have attached the stack traces as well as the content of my lib repo and options.env.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use the zookeeper example. Could you guide me through it? I don't know whether I should follow the same steps as in the file-based example; perhaps I will not get stack traces.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please do the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>> (4) Move xmlbeans*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>> (5) Move curvesapi*.jar from connector-common-lib to lib.
>>>>>>>>>>>>>> (6) Modify your options.env to include all of the jars you moved.
>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>>>>> (8) If you still get stack traces, please send them to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By 'other place', do you mean the \lib repository? If so, then I have already tried it and it didn't work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I used the java dependency inspector to see what the issue is and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm using the file-based example, and I reproduced all the changes you told me to make there. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have already been added in the options.env.win file in the multiprocess-file-example repository.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win file. Should it be the one in the multiprocess-zk-example document or multiprocess-file-example?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Could it be a problem with elasticsearch's version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing.
>>>>>>>>>>>>>>>>>>>>>>>>>> This time, it concerns excel files. You will find attached the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack trace. My apologies for the bad quality of the image, I'm doing my best to send you the stack trace as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me to do and, as you expected, the crawling resumed. How about the regular expressions? How can I make complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you could try. Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful. Meanwhile, I have read the documentation about how to restrict the crawling and I don't think the '|' works in the specified. For instance, I would like to restrict the crawling to the documents that contain the word 'sound'. I proceed as follows: *(SON)* . The document is in capital letters and I noticed that it didn't take it into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There is end-user documentation both online and distributed with all binary distributions that describes how to do this. Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using zookeeper and I will let you know if it works. I have another question to ask. Actually, I need to apply some filters while crawling. I don't want to crawl some files and some folders. Could you give me an example of how to use the regex? Does the regex allow the use of /i to ignore case?
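Whether the Paths tab accepts regular expressions, plain wildcards, or a case-insensitive flag is a question for the ManifoldCF end-user documentation; nothing in this thread settles it. As a general illustration of how plain wildcard matching treats letter case and characters such as parentheses, here is a small Python sketch built on the standard fnmatch module. It does not reproduce ManifoldCF's matcher; the helper name `matches` and the file names are made up for the example.

```python
from fnmatch import fnmatchcase

def matches(name, pattern, ignore_case=False):
    """Wildcard match in which only *, ? and [...] are special."""
    if ignore_case:
        # Fold both sides so that letter case no longer matters.
        name, pattern = name.lower(), pattern.lower()
    return fnmatchcase(name, pattern)

print(matches("setup.EXE", "*.exe"))                    # False: case differs
print(matches("setup.EXE", "*.exe", ignore_case=True))  # True
print(matches("rapport_SON.docx", "*(SON)*"))           # False: parentheses are literal
print(matches("rapport_SON.docx", "*SON*"))             # True
```

In this style of matching there is no alternation operator, so a pattern containing '|' or grouping parentheses is compared character by character rather than interpreted.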
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that. I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into memory so you do not need huge amounts of memory. The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures after that.
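The "disk is not writable" diagnosis above can be checked directly by trying to create a file in the synchronization directory, run under the same user account as the ManifoldCF processes. A minimal Python sketch follows; the function name `is_writable` is hypothetical and the directory to pass in is whatever path the lock warnings in manifoldcf.log point at.

```python
import os
import tempfile

def is_writable(directory):
    """Return True if a file can be created and then removed in directory."""
    try:
        fd, path = tempfile.mkstemp(prefix="write-test-", dir=directory)
    except OSError:
        # Covers a missing directory as well as "access is denied".
        return False
    os.close(fd)
    os.remove(path)
    return True
```

A False result for the synchronization directory would be consistent with the "Access is denied" lock warnings; a True result suggests looking at other causes, such as another process holding the lock files.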
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling. You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the elasticsearch output connection.
Moreover, the job uses Tika to extract >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as = a repository connection. During the job, I >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of t= he documents. I was wandering if the issue >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> there's an error that looks l= ike it might go away on retry, but does not. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It can be either on the repos= itory side or on the output side. If you look >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the Simple History in the = UI, or at the manifoldcf.log file, you should >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be able to get a better sense= of what went wrong. Without further >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information, I can't say any = more. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Beelz Ryuzaki < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a softwar= e >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineer from soci=C3=A9t=C3= =A9 g=C3=A9n=C3=A9rale in France. I'm actually using your recent >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version of manifoldCF 2.8 . = I'm working on an internal search engine. For >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this reason, I'm using manif= oldcf in order to index documents on windows >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> shares. I encountered a seri= ous problem while crawling 35K documents. Most >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the time, when manifoldcf= start crawling a big sized documents (19Mo for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job wi= th the following error: repeated service >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure proc= essing document : software caused connection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how to solve this problem, p= lease ? >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch 2.1.0 . 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward for your >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>> >>> --001a1146ea90a0833805586ee1ba Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Othman,

Thanks for doing the evaluation of the problem.

Generally, the ManifoldCF project does not have the expertise to diagnose problems with external systems like Solr or Elasticsearch. So going to another newsgroup for those kinds of issues would be a good idea.

Thanks!
Karl


On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I have analyzed the error and found out that it was mainly an elasticsearch problem. I saw in some forums that one of the adopted solutions is to modify elasticsearch.yml and set http.max_content_length to a greater value. However, the job got stuck on the last two indexable files (two pptx files of 22 MB and 2 MB respectively). The job eventually ended, but a stack trace showed that elasticsearch ran out of memory. For your information, I have allocated 4 GB to elasticsearch. Is that enough for good performance? You will find attached the elasticsearch stack traces.
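
A minimal Python sketch, not taken from the thread, for listing the documents that might run into an HTTP size limit such as `http.max_content_length`. The function name `files_larger_than`, the root directory, and the threshold are all illustrative assumptions. The size of the request actually sent to elasticsearch may differ from the file size on disk, so this is only a rough first pass.

```python
from pathlib import Path


def files_larger_than(root, limit_bytes):
    """Return (size_in_bytes, path) pairs for files under root above limit_bytes, largest first."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((size, str(path)))
    return sorted(hits, reverse=True)
```

Calling `files_larger_than(share_root, 20 * 1024 * 1024)`, with `share_root` standing in for the crawled directory, would list everything above 20 MB, which is roughly the size at which the failures in this thread were reported.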

Best regards,

Othman BELHAJ.

On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I'm sorry to bother you on your holiday. I will try to analyze it today and let you know what I have found. Enjoy your day!

Best regards,

Othman BELHAJ.

On Mon, 4 Sep 2017 at 16:06, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

I won't be able to look at this today; it is a holiday here. But, the "socket write" error is coming from ElasticSearch. If ES is configured to not accept documents greater than a certain size, that might explain it. Maybe the ES logs would help?

I'm afraid you're going to need to do the work to find out what is going wrong in those cases now.

Thanks,
Karl


On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

This morning, I tried the zookeeper-based example and it worked really well. However, I still have one error which is bugging me. It is a socket write error. You will find attached the simple history report. Surprisingly, I didn't have any stack trace in the ManifoldCF log file.

Best regards,

Othman.

On Fri, 1 Sep 2017 at 19:39, Karl Wright <daddywri@gmail.com> wrote:
This is from file locking yet again.

I have uploaded a new RC. Please download and try out the zookeeper locking.


Karl


On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
There is another issue as well that gives the following stack trace.

Othman.

On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I took the binary from the ManifoldCF 2.8.1 RC0. It had version 3.9 of POI, and when I changed the version to 3.15 it worked fine. I really want to try the zookeeper example if, as you told me, it performs better than the file-based example. For the time being, I'm using the file-based example because it is the only one that works for me, but I need a stable version for my production environment. That is one point.
Another point: the Paths tab is still an issue for me, because I exclude some files and they still get crawled. I want to exclude certain file extensions and certain directories. For instance, I don't want to index .exe files or paths that contain a specific word. I proceed as follows: the first exclude rule is *.exe and the second is *word*. Only the second one doesn't work. How can I solve this issue, please?
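
The two exclude rules can be tried outside the crawler with a short Python sketch. This uses Python's `fnmatch` module, not ManifoldCF's own matcher, so it only illustrates how `*.exe` and `*word*` behave as plain wildcards and how case sensitivity alone can make a rule miss. The helper name `is_excluded` and the sample paths are hypothetical, and whether the Paths tab matches case-sensitively is not established in this thread.

```python
from fnmatch import fnmatchcase


def is_excluded(path, patterns, ignore_case=False):
    """Return True if path matches any wildcard pattern (always treated as a plain glob)."""
    candidate = path.lower() if ignore_case else path
    for pattern in patterns:
        wanted = pattern.lower() if ignore_case else pattern
        # fnmatchcase never normalizes case, so behavior is the same on every OS.
        if fnmatchcase(candidate, wanted):
            return True
    return False
```

With `["*.exe", "*word*"]`, a path such as `docs/PASSWORD list.txt` is only excluded when `ignore_case=True`, which is one possible reason for a `*word*` rule appearing to do nothing.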

Thank you very much, have a nice week-end,

Othman

On Fri, 1 Sep 2017 at 16:46, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

I will respin a new 2.8.1 (RC1) to address the zookeeper issue.

The failure you are seeing is "NoSuchMethodError". Therefore, the class is being found, but it is the *wrong* class. When you deployed the new release, did you deploy it in a new directory, or did you overwrite the previous deployment? If you overwrote it, you probably have multiple versions of the POI jars.
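
One way to check for multiple versions of the same jar is a small Python sketch like the one below. It is not part of ManifoldCF; the function name `find_duplicate_jars` is hypothetical, and it assumes jars are named `<artifact>-<version>.jar`, which holds for names such as poi-3.15.jar but may not hold for every file in a deployment.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: "poi-ooxml-3.15.jar" -> artifact "poi-ooxml", version "3.15".
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_duplicate_jars(root):
    """Return {artifact: sorted version strings} for artifacts found in more than one version."""
    versions = defaultdict(set)
    for jar in Path(root).rglob("*.jar"):
        match = JAR_NAME.match(jar.name)
        if match:
            versions[match["artifact"]].add(match["version"])
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Run against the deployment directory, an entry such as `poi` listed with both 3.9 and 3.15 would point to the overwritten-deployment situation described above.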

Karl


On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I have just tried the new release of ManifoldCF. The first job ended normally, but in the second one I got a new stack trace concerning POI. Moreover, runzookeeper.bat doesn't run properly. It shows me the attached stack trace.

PS:
The second attached file contains the POI stack trace.

Othman.

On Fri, 1 Sep 2017 at 12:21, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

You do not need a new database instance.

You can download MCF 2.8.1 RC0 from here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.8.1

Karl


On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

Thank you very much for your help, I'm going to try out the zookeeper example. Should I initialize a new database? And how can I run the zookeeper start-agent?

Othman.

On Fri, 1 Sep 2017 at 11:37, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

These exceptions are now coming from file locking and are due to permissions problems. I suggest you go to Zookeeper for file locking.

I am building a 2.8.1 release candidate. When it is available for download, I'll send you the URL.

Thanks,
Karl


On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

This morning, I followed the steps you gave me and I still got stack traces. I have attached the stack traces as well as the contents of my lib directory and options.env.
I have installed zookeeper and I'm ready to use the zookeeper example. Could you guide me through it? I don't know whether, if I follow the same steps as in the file-based example, I will still get stack traces.

Thanks,
Othman

On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:
Please do the following:

(0) Shut down all ManifoldCF processes.
(1) Move poi*.jar from connector-common-lib to lib.
(2) Move dom4j*.jar from connector-common-lib to lib.
(3) Move commons-collections4*.jar from connector-common-lib to lib.
(4) Move xmlbeans*.jar from connector-common-lib to lib.
(5) Move curvesapi*.jar from connector-common-lib to lib.
(6) Modify your options.env to include all of the jars you moved.
(7) Start up all ManifoldCF processes.
(8) If you still get stack traces, please send them to me.
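
Steps (1) to (5) above can be sketched as a small Python helper. This is an illustration under assumptions, not a ManifoldCF tool: the function name `move_jars` and the `dry_run` flag are hypothetical, the pattern list simply mirrors the steps, and steps (0), (6) and (7) still have to be done by hand.

```python
import shutil
from pathlib import Path

# Patterns mirror steps (1)-(5) of the list above.
PATTERNS = [
    "poi*.jar",
    "dom4j*.jar",
    "commons-collections4*.jar",
    "xmlbeans*.jar",
    "curvesapi*.jar",
]


def move_jars(src, dst, patterns=PATTERNS, dry_run=True):
    """Move jars matching the patterns from src to dst and return the file names involved.

    With dry_run=True nothing is moved; the names are only reported.
    """
    src, dst = Path(src), Path(dst)
    moved = []
    for pattern in patterns:
        for jar in sorted(src.glob(pattern)):
            if not dry_run:
                dst.mkdir(parents=True, exist_ok=True)
                shutil.move(str(jar), str(dst / jar.name))
            moved.append(jar.name)
    return moved
```

Running it first with the default `dry_run=True` gives the list of jar names that then need to be added to options.env in step (6).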

Karl


On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

By 'other place', do you mean the \lib directory? If so, then I have already tried it and it didn't work.

Othman.

On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

I used the java dependency inspector to see what the issue is and it turns out that poi-ooxml.jar does refer back to poi.jar in the class that is failing. So you will need to move poi-3.15.jar and commons-collections4-4.1.jar to the other place as well.

Let's hope that finally fixes this issue.

I'm very unhappy about the quality of the POI project code; it is definitely not using reasonable engineering practices, and I will be opening a ticket with them.

Thanks,
Karl


On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm using the file-based example, and I made all the changes you told me to make there. I'll try to install zookeeper and use the zookeeper example. Will I need to do any configuration in order to run the zookeeper example?

Othman.

On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com> wrote:
Are you using the zookeeper example, or the file-based example?

If these jars have all been moved, and the options.env includes them, then I have to conclude that Apache POI's pom.xml is incorrect too. It will take a while to figure out what's missing that poi-ooxml.jar needs that is not listed.

Karl


On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
All the dependencies you mentioned have already been added to the options.env.win file in the multiprocess-file-example directory.

On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com> wrote:

Yes, I added it to the options.env.win file. Should it be the one in the multiprocess-zk-example directory or the one in multiprocess-file-example?

On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com> wrote:
It's not related at all to elasticsearch.

Karl


On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Could it be a problem with the elasticsearch version? I'm actually using 2.1.0, which is pretty old for this new version of ManifoldCF.

Othman.

On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I moved back both the jars you mentioned and a different error is showing. You will find the stack trace attached.

Thanks,
Othman

On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com> wrote:

I've looked at the dependencies; you should not have moved poi-3.15.jar. Please move that back, and commons-collections4-4.1.jar too.

You *will* need to move curvesapi-1.04.jar though.

Thanks,
Karl


On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <daddywri@gmail.com> wrote:
If you include poi.jar, then all dependencies of poi.jar must also be included. This would mean that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be included.

Karl

On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Karl,

I added the two jars that you mentioned and another one: poi-3.15.jar. Unfortunately, there is another error showing. This time, it concerns Excel files. You will find attached the stack trace.

Othman.

On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

Yes, this shows that the jar we moved calls back into another jar, which will also need to be moved. *That* jar has yet another dependency too.

The list of jars is thus extended to include:

poi-ooxml-3.15.jar
dom4j-1.6.1.jar

Karl


On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
You will find attached the stack trace. My apologies for the bad quality of the image, I'm doing my best to send you the stack trace as I don't have the right to send documents outside the company.

Thank you for your time,

Othman

On Thu, 31 Aug 2017 at 15:16, Karl Wright <daddywri@gmail.com> wrote:
Once again, I need a stack trace to diagnose what the problem is.

Thanks,
Karl


On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Oh, actually it didn't solve the problem. I looked into the log file and saw the following error:

Error tossed : org/apache/poi/POIXMLTypeLoader
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader.

Maybe another jar is missing?
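
To see which jar, if any, actually ships a class named in a NoClassDefFoundError, the jars can be scanned as zip archives. The following Python sketch is an illustration only; the function name `jars_containing` is hypothetical, and which directory to scan (for example lib or connector-common-lib) is left to the reader.

```python
import zipfile
from pathlib import Path


def jars_containing(root, class_name):
    """Return the names of jars under root that contain the given class.

    class_name may use dots or slashes, e.g. "org/apache/poi/POIXMLTypeLoader".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits
```

An empty result for one directory and a hit in another would suggest the class exists in the deployment but is not on the classpath of the code that needs it.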

Othman.

On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I have tried what you told me to do and, as you expected, the crawling resumed. How about the regular expressions? How can I write complex regular expressions in the job's Paths tab?

Thank you very much for your help.

Othman.


On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Ok, I will try it right away and let you know if it works.

Othman.

On Thu, 31 Aug 2017 at 14:15, Karl Wright <daddywri@gmail.com> wrote:
Oh, and you also may need to edit your options.env files to include them in the classpath for startup.

Karl


On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <daddywri@gmail.com> wrote:
If you are amenable, there is another workaround you could try. Specifically:

(1) Shut down all MCF processes.
(2) Move the following two files from connector-common-lib to lib:

xmlbeans-2.6.0.jar
poi-ooxml-schemas-3.15.jar

(3) Restart everything and see if your crawl resumes.

Please let me know what happens.

Karl



On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <daddywri@gmail.com> wrote:
I created a ticket for this: CONNECTORS-1450.

One simple workaround is to use the external Tika server transformer rather than the embedded Tika Extractor. I'm still looking into why the jar is not being found.

Karl


On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Yes, I'm actually using the latest binary version, and my job got stuck on that specific file.
The job status is still Running. You can see it in the attached file. For your information, the job started yesterday.

Thanks,

Othman

It looks like a dependency of Apache POI is missing.
I think we will need a ticket to address this, if you are indeed using the binary distribution.

Thanks!
Karl

On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm actually using the binary version. For security reasons, I can't send any files from my computer. I have copied the stack trace and scanned it with my cellphone. I hope it will be helpful. Meanwhile, I have read the documentation about how to restrict the crawling, and I don't think the '|' works as specified. For instance, I would like to restrict the crawl for documents that contain the word 'sound'. I proceed as follows: *(SON)*. The document name is in capital letters, and I noticed that the rule didn't take it into account.

Thanks,
Othman



On Thu, 31 Aug 2017 at 12:40, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

The way you restrict documents with the windows share connector is by specifying information on the "Paths" tab in jobs that crawl windows shares. There is end-user documentation both online and distributed with all binary distributions that describe how to do this. Have you found it?

Karl


On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hello Karl,

Thank you for your response. I will start using zookeeper and let you know if it works. I have another question: I need to apply some filters while crawling, because I don't want to crawl certain files and folders. Could you give me an example of how to use the regex? Does the regex allow /i to ignore case?

Thanks,
Othman

On Wed, 30 Aug 2017 at 19:53, Karl Wright <daddywri@gmail.com> wrote:
Hi Beelz,

File-based sync is deprecated because people often have problems with getting file permissions right, and they do not understand how to shut processes down cleanly, and zookeeper is resilient against that. I highly recommend using zookeeper sync.

ManifoldCF is engineered to not put files into memory so you do not need huge amounts of memory. The default values are more than enough for 35,000 files, which is a pretty small job for ManifoldCF.

Thanks,
Karl


On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
I'm actually not using zookeeper. I want to know how zookeeper is different from file-based sync. I also need guidance on how to manage my PC's memory. How many GB should I allocate for the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?

Othman.

On Wed, 30 Aug 2017 at 16:11, Karl Wright <daddywri@gmail.com> wrote:

Your disk is not writable for some reason, and that's interfering with ManifoldCF 2.8 locking.

I would suggest two things:

(1) Use Zookeeper for sync instead of file-based sync.
(2) Have a look if you still get failures after that.


Thanks,
Karl


On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hi Mr Karl,

Thank you Mr Karl for your quick response. I have looked into the ManifoldCF log file and extracted the following warnings:

- Attempt to set file lock 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase) Synapses.lock' failed : Access is denied.


- Couldn't write to lock file; disk may be full. Shutting down process; locks may be left dangling. You must cleanup before restarting.
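
Both warnings point at the process being unable to create files in the synch area. A quick, hedged way to confirm that from the same user account is a Python sketch like the one below; it is not a ManifoldCF utility, the function name `can_write_lock_files` is hypothetical, and it only checks that a temporary file can be created and removed in the given directory, not ManifoldCF's actual locking behavior.

```python
import tempfile


def can_write_lock_files(directory):
    """Return True if a temporary file can be created (and removed) in directory."""
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".lock"):
            pass
        return True
    except OSError:
        # Covers missing directories, permission errors and full disks.
        return False
```

Calling it on the synch area directory while logged in as the account that runs the ManifoldCF processes gives a first indication of whether "Access is denied" is a permissions issue on that folder.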

"ES (Lowercase) Synapses" is the elasticsearch output connection. Moreover, the job uses Tika to extract metadata and a file system as a repository connection. During the job, I don't extract the content of the documents. I was wondering if the issue comes from elasticsearch?

Othman.



On Wed, 30 Aug 2017 at 14:08, Karl Wright <daddywri@gmail.com> wrote:
Hi Othman,

ManifoldCF aborts a job if there's an error that looks like it might go away on retry, but does not. It can be either on the repository side or on the output side. If you look at the Simple History in the UI, or at the manifoldcf.log file, you should be able to get a better sense of what went wrong. Without further information, I can't say any more.

Thanks,
Karl


On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:
Hello,

I'm Othman Belhaj, a software engineer from Société Générale in France. I'm currently using the recent 2.8 version of ManifoldCF. I'm working on an internal search engine. For this reason, I'm using ManifoldCF to index documents on Windows shares. I encountered a serious problem while crawling 35K documents. Most of the time, when ManifoldCF starts crawling a large document (19 MB for example), it ends the job with the following error: repeated service interruptions - failure processing document : software caused connection abort: socket write error.
Can you give me some tips on how to solve this problem, please?

I use PostgreSQL 9.3.x and elasticsearch 2.1.0.
I'm looking forward to your response.

Best regards,

Othman BELHAJ






















--001a1146ea90a0833805586ee1ba--