manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Mon, 18 Sep 2017 11:42:56 GMT
Unfortunately, I'll be offline all day today, starting in about 20 minutes,
so I don't have enough time to clarify the manual enough to help you out
here.  The UI, though, is pretty self-explanatory I would think.

Kalr


On Mon, Sep 18, 2017 at 6:22 AM, Beelz Ryuzaki <i93othman@gmail.com> wrote:

> Hi Karl,
>
> Thank you for this brief explanation, I will use adjuster eventually to
> tag the documents. Can you give an example of how to use the adjuster, say
> to tag documents that contains the word 'delivery'? I didn't understand
> well the example that was included in the documentation.
>
>
> Many thanks,
>
> Othman BELHAJ
>
> On Mon, 18 Sep 2017 at 12:15, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Othman,
>>
>> What you do is add an attribute through the adjuster.  Then, in Solr or
>> Elastic Search, you add that attribute to your schema.  There is no concept
>> of "tagging" in these search engines.
>>
>> Karl
>>
>>
>> On Mon, Sep 18, 2017 at 5:53 AM, Beelz Ryuzaki <i93othman@gmail.com>
>> wrote:
>>
>>> I saw the documentation about "Metadata Adjuster" and I believe it
>>> doesn't add a new field "Tag". I actually have some documents that have the
>>> word "delivery" on either their title or subtitle or content, and I want to
>>> tag them as "delivery" in elasticsearch. Is it possible to do it with
>>> ManifoldCF? If so, how can I proceed ?
>>>
>>> Many thanks,
>>>
>>> Othman BELHAJ
>>>
>>> On Mon, 18 Sep 2017 at 11:32, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Othman,
>>>>
>>>> You can add an attribute to all documents from any specific MCF job or
>>>> jobs by including a "Metadata Adjuster" in the pipeline for the job.  Hope
>>>> that answers your question?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Sep 18, 2017 at 5:28 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Karl,
>>>>>
>>>>> I'm interested in knowing if there is a way to tag the indexed
>>>>> documents with ManifoldCF ?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Othman BELHAJ
>>>>>
>>>>> On Fri, 8 Sep 2017 at 21:43, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Othman,
>>>>>>
>>>>>> There are two properties files for zookeeper: the global properties,
>>>>>> and the local (zookeeper managed) properties.  The database configuration
>>>>>> is in the zookeeper managed properties.
>>>>>>
>>>>>> Please examine the following page for setting up Postgresql
>>>>>> properties:
>>>>>>
>>>>>> https://manifoldcf.apache.org/release/release-2.8.1/en_US/
>>>>>> how-to-build-and-deploy.html
>>>>>>
>>>>>> Indexable files are files that those that the output connector says
>>>>>> can be indexed.  It's a function of the output connector and its
>>>>>> configuration.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 8, 2017 at 2:07 PM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry to bother you again, but what is the difference between
>>>>>>> indexable files and files in the path tab of a job ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Othman BELHAJ
>>>>>>>
>>>>>>> On Fri, 8 Sep 2017 at 19:27, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>>
>>>>>>>> My zookeeper is still pointing to the HSQL database. What should I
>>>>>>>> do in order to change it so that it points to my PostgreSQL database ?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Othman Belhaj .
>>>>>>>>
>>>>>>>> On Wed, 6 Sep 2017 at 15:34, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you, Karl. I will try to combine Postgresql with zookeeper
>>>>>>>>> and let you know.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Wed, 6 Sep 2017 at 13:18, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> No, you can use whatever supported database you like.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 6, 2017 at 6:58 AM, Beelz Ryuzaki <
>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> As far as I know, when you use zookeeper , you obligatory need
>>>>>>>>>>> to use HSQLDB to go with it, right?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Othman
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 6 Sep 2017 at 12:56, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>
>>>>>>>>>>>> HSQLDB stores all tables in memory so you need to size it
>>>>>>>>>>>> accordingly.  That is one reason we prefer Postgresql for production
>>>>>>>>>>>> deployments.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 6, 2017 at 6:21 AM, Beelz Ryuzaki <
>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I resolved the elasticsearch problem however the application
>>>>>>>>>>>>> doesn't seem to work after I have run a job to crawl over 500k documents. I
>>>>>>>>>>>>> get an GC overhead limit exceeded in the hsql database. How many should I
>>>>>>>>>>>>> allocate for it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 5 Sep 2017 at 12:43, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for doing the evaluation of the problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Generally, the ManifoldCF project does not have the expertise
>>>>>>>>>>>>>> to diagnose problems with external systems like Solr or Elasticsearch.  So
>>>>>>>>>>>>>> going to another newsgroup for those kinds of issues would be a good idea.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 5, 2017 at 4:33 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have analyzed the error and found out that it was mainly
>>>>>>>>>>>>>>> an elasticsearch problem. I saw in some forums that one of the adopted
>>>>>>>>>>>>>>> solution is to modify elasticsearch.yml and set the http.max_content_length
>>>>>>>>>>>>>>> to a greater value. However, the job got stuck in the last two indexable
>>>>>>>>>>>>>>> files ( two pptx files with 22Mo and 2Mo respectively). The job eventually
>>>>>>>>>>>>>>> ended but a stack trace showed that elasticsearch ran out of memory. For
>>>>>>>>>>>>>>> your information, I have allocated 4Go for elasticsearch execution. Is it
>>>>>>>>>>>>>>> enough in order to have a good performance. You will find attached the
>>>>>>>>>>>>>>> stack traces of elasticsearch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman BELHAJ.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 4 Sep 2017 at 16:40, Beelz Ryuzaki <
>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm sorry to bother on your holiday. I will try to analyze
>>>>>>>>>>>>>>>> it today and let it you know what I have found. Enjoy your day !
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Othman BELHAJ.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, 4 Sep 2017 at 16:06, Karl Wright <
>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I won't be able to look at this today; it is a holiday
>>>>>>>>>>>>>>>>> here.  But, the "socket write" error is coming from ElasticSearch.  If ES
>>>>>>>>>>>>>>>>> is configured to not accept documents greater than a certain size, that
>>>>>>>>>>>>>>>>> might explain it.  Maybe the ES logs would help?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm afraid you're going to need to do the work to find out
>>>>>>>>>>>>>>>>> what is going wrong in those cases now.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Sep 4, 2017 at 4:53 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This morning, I have tried the zookeeper based file and
>>>>>>>>>>>>>>>>>> it worked really good. However, I still have one error which is bugging me.
>>>>>>>>>>>>>>>>>> It is a socket write error. You will find attached the simple history
>>>>>>>>>>>>>>>>>> report. Surprisingly, I didn't have any stack trace in the ManifoldCF log
>>>>>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 19:39, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is from file locking yet again.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have uploaded a new RC.  Please download and try out
>>>>>>>>>>>>>>>>>>> the zookeeper locking.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/
>>>>>>>>>>>>>>>>>>> dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 1:11 PM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There is another issue as well that gives the following
>>>>>>>>>>>>>>>>>>>> stack trace.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 18:05, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I took the binary from the ManifoldCF 2.8.1 RC0. It
>>>>>>>>>>>>>>>>>>>>> had the version 3.9 of POI and when I changed the version to 3.15 it worked
>>>>>>>>>>>>>>>>>>>>> fine. I really want to try the zookeeper if as you told me its performance
>>>>>>>>>>>>>>>>>>>>> is better than the file-based example. For the time being, I'm using the
>>>>>>>>>>>>>>>>>>>>> file-based because it is the only part that works for me but I actually
>>>>>>>>>>>>>>>>>>>>> need a stable version for my production environment. That is one point.
>>>>>>>>>>>>>>>>>>>>> Another point is, the path's tab is still an issue for
>>>>>>>>>>>>>>>>>>>>> me because I exclude some files and it still crawls them. I want to exclude
>>>>>>>>>>>>>>>>>>>>> some specific extensions of files and some specific directories. For
>>>>>>>>>>>>>>>>>>>>> instance, i don't want to index .exe files and contains a specific word. I
>>>>>>>>>>>>>>>>>>>>> do as follows I make the first exclude with *.exe and the second one with
>>>>>>>>>>>>>>>>>>>>> *word*. Only the second one which doesn't work. How can I solve this issue,
>>>>>>>>>>>>>>>>>>>>> please?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thank you very much, have a nice week-end,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 16:46, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I will respin a new 2.8.1 (RC1) to address the
>>>>>>>>>>>>>>>>>>>>>> zookeeper issue.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The failure you are seeing is "NoSuchMethodError".
>>>>>>>>>>>>>>>>>>>>>> Therefore, the class is being found, but it is the *wrong* class.  When you
>>>>>>>>>>>>>>>>>>>>>> deployed the new release, did you deploy it in a new directory, or did you
>>>>>>>>>>>>>>>>>>>>>> overwrite the previous deployment?  If you overwrote it, you probably have
>>>>>>>>>>>>>>>>>>>>>> multiple versions of the POI jars.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 9:59 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have just tried the new release of ManifoldCF. At
>>>>>>>>>>>>>>>>>>>>>>> first, the first job ended normally, but in the second I got a new stack
>>>>>>>>>>>>>>>>>>>>>>> trace concerning the POI. Moreover, the runzookeeper.bat doesn't run
>>>>>>>>>>>>>>>>>>>>>>> properly. It shows me the stack trace attached.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Ps:
>>>>>>>>>>>>>>>>>>>>>>> The second attached file contains the POI stack
>>>>>>>>>>>>>>>>>>>>>>> trace.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 12:21, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You do not need a new database instance.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You can download MCF 2.8.1 RC0 from here:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/
>>>>>>>>>>>>>>>>>>>>>>>> dist/dev/manifoldcf/apache-manifoldcf-2.8.1
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:42 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your help, I'm going to
>>>>>>>>>>>>>>>>>>>>>>>>> try out the zookeeper example. Should I initialize a new database? And how
>>>>>>>>>>>>>>>>>>>>>>>>> can I run the zookeeper start-agent ?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, 1 Sep 2017 at 11:37, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> These exceptions are now coming from file locking
>>>>>>>>>>>>>>>>>>>>>>>>>> and are due to permissions problems.  I suggest you go to Zookeeper for
>>>>>>>>>>>>>>>>>>>>>>>>>> file locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I am building a 2.8.1 release candidate.  When it
>>>>>>>>>>>>>>>>>>>>>>>>>> available for download, I'll send you the URL.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Sep 1, 2017 at 5:27 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This morning, I have followed the steps you told
>>>>>>>>>>>>>>>>>>>>>>>>>>> me to do and I still got stack traces. I have attached the stack traces as
>>>>>>>>>>>>>>>>>>>>>>>>>>> well as the content of my lib repo and option.env.
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have installed zookeeper and I'm ready to use
>>>>>>>>>>>>>>>>>>>>>>>>>>> the zookeeper example. Could you guide through it? I don't know if I follow
>>>>>>>>>>>>>>>>>>>>>>>>>>> the same steps in the file based example, I may not get stack traces.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:19, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please do the following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (0) Shut down all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Move poi*.jar from connector-common-lib to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> lib.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move dom4j*.jar from connector-common-lib
>>>>>>>>>>>>>>>>>>>>>>>>>>>> to lib.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Move commons-collections4*.jar from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (4) Move xmlbeans*.java from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (5) Move curvesapi*.jar from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (6) Modify your options.env to include all of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the jars you moved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (7) Start up all ManifoldCF processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> (8) If you still get stack traces, please send
>>>>>>>>>>>>>>>>>>>>>>>>>>>> them to me.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> By 'other place', do you mean the \lib
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repository? If that so, then I have already tried it and it didn't work.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I used the java dependency inspector to see
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what the issue is and it turns out that poi-ooxml.jar does refer back to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi.jar in the class that is failing.  So you will need to move
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-3.15.jar and commons-collections4-1.4.jar to the other place as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let's hope that finally fixes this issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very unhappy about the quality of the POI
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> project code; it is definitely not using reasonable engineering practices,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and I will be opening a ticket with them.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using the file based example and all the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changes you told me to do. I reproduced them in the file based example.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'll try to install zookeeper and use the zookeeper example. Will I need a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to do in order to run the zookeeper example ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you using the zookeeper example, or the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based example?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If these jars have all been moved, and the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> options.env includes them, then I have to conclude that Apache POI's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pom.xml is incorrect too.  It will take a while to figure out what's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> missing that poi-ooxml.jar needs that is not listed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> All the dependencies you mentioned have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already been added in the options.env.win file in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiprocess-file-example repository.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I added it in the options.env.win
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file. Should it be the one in the multiprocess-zk-example document or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> multiprocess-file-example ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Could it be a problem of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch's version ? I'm actually using 2.1.0 which is pretty old for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this new version of ManifoldCF?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I moved back both the jars you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned and a different is showing. You will find the stack trace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've looked at the dependencies; you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should not have moved poi-3.15.jar.  Please move that back, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commons-collections4-4.1.jar too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You *will* need to move
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> curvesapi-1.04.jar though.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you include poi.jar, then all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependencies of poi.jar must also be included.  This would mean
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that curvesapi-1.04.jar and commons-collections4-4.1.jar should also be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I added the two jars that you have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned and another one : poi-3.15.jar . Unfortunately, there is another
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error showing. This time, it concerns excel files. You will find attached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, this shows that the jar we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> moved calls back into another jar, which will also need to be moved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  *That* jar has yet another dependency too.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The list of jars is thus extended
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to include:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You will find attached the stack
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> trace. My apologies for the bad quality of the image, I'm doing my best to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> send you the stack trace as I don't have the right to send documents
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the company.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once again, I need a stack trace
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to diagnose what the problem is.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, actually it didn't solve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the problem. I looked into the log file and saw the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POIXMLTypeLoader
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org/apache/poi/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe another jar is missing ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried what you told me
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to do, and you expected the crawling resumed. How about the regular
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expressions? How can I make complex regular expressions in the job's paths
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tab ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, I will try it right away
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and let you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> edit your options.env files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you are amenable, there
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is another workaround you could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Move the following two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files from connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I created a ticket for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One simple workaround is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to use the external Tika server transformer rather than the embedded Tika
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Extractor.  I'm still looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest binary version, and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job status is still
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Running. You can see it in the attached file. For your information, the job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It looks like a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ticket to address this, if you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> binary version. For security reasons, I can't send any files from my
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> computer. I have copied the stack trace and scanned it with my cellphone. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hope it will be helpful. Meanwhile, I have read the documentation about how
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to restrict the crawling and I don't think the '|' works in the specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For instance, I would like to restrict the crawling for the documents that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> counts the 'sound' word . I proceed as follows: *(SON)* . the document is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with capital letters and I noticed that it didn't take it into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents with the windows share connector is by specifying information on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the "Paths" tab in jobs that crawl windows shares.  There is end-user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documentation both online and distributed with all binary distributions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that describe how to do this.  Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at 5:25 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> response, I will start using zookeeper and I will let you know if it works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have another question to ask. Actually, I need to make some filters while
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawling. I don't want to crawl some files and some folders. Could you give
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me an example of how to use the regex. Does the regex allow to use /i to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore cases ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated because people often have problems with getting file permissions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> right, and they do not understand how to shut processes down cleanly, and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper is resilient against that.  I highly recommend using zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engineered to not put files into memory so you do not need huge amounts of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory.  The default values are more than enough for 35,000 files, which is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a pretty small job for ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at 11:58 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using zookeeper. i want to know how is zookeeper different from file based
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sync? I also need a guidance on how to manage my pc's memory. How many Go
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should I allocate for the start-agent of ManifoldCF? Is 4Go enough in order
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to crawler 35K files ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> writable for some reason, and that's interfering with ManifoldCF 2.8
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for sync instead of file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you still get failures after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2017 at 9:37 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you Mr Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for your quick response. I have looked into the ManifoldCF log file and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extracted the following warnings :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file lock 'D:\xxxx\apache_manifoldcf-2.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Lowercase) Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to lock file; disk may be full. Shutting down process; locks may be left
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dangling. You must cleanup before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> synapses being the elasticsearch output connection. Moreover, the job uses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tika to extract metadata and a file system as a repository connection.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> During the job, I don't extract the content of the documents. I was
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wandering if the issue comes from elasticsearch ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aborts a job if there's an error that looks like it might go away on retry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but does not.  It can be either on the repository side or on the output
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> side.  If you look at the Simple History in the UI, or at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log file, you should be able to get a better sense of what went
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrong.  Without further information, I can't say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2017 at 5:33 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Belhaj, a software engineer from société générale in France. I'm actually
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> using your recent version of manifoldCF 2.8 . I'm working on an internal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> search engine. For this reason, I'm using manifoldcf in order to index
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents on windows shares. I encountered a serious problem while crawling
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 35K documents. Most of the time, when manifoldcf start crawling a big sized
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> documents (19Mo for example), it ends the job with the following error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repeated service interruptions - failure processing document : software
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> caused connection abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some tips on how to solve this problem, please ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PostgreSQL 9.3.x and elasticsearch 2.1.0 .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward for your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>
>>

Mime
View raw message