manifoldcf-user mailing list archives

From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Thu, 31 Aug 2017 09:25:13 GMT
Hello Karl,

Thank you for your response. I will start using ZooKeeper and let you know
if it works. I have another question: I need to apply some filters while
crawling, because I want to exclude certain files and folders. Could you
give me an example of how to use the regex? Does the regex support /i to
ignore case?

Thanks,
Othman
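
[Editor's sketch, not part of the original message.] ManifoldCF is written in Java, and java.util.regex has no /i delimiter syntax; the equivalent is the inline flag (?i) or the Pattern.CASE_INSENSITIVE option. The thread does not say whether the connector in use evaluates its include/exclude rules as regular expressions at all, so that is an assumption here. The class name, the patterns, and the sample paths below are hypothetical and only illustrate case-insensitive exclusion in plain Java:

```java
import java.util.regex.Pattern;

public class CrawlFilterSketch {
    // Hypothetical exclusion rules, not taken from the thread.
    private static final Pattern[] EXCLUDES = {
        // (?i) is the inline flag: the whole pattern ignores case.
        Pattern.compile("(?i).*\\.(tmp|bak)$"),
        // The same effect, requested through a compile-time option.
        Pattern.compile(".*/archive/.*", Pattern.CASE_INSENSITIVE)
    };

    // Returns true if the full path matches any exclusion pattern.
    public static boolean isExcluded(String path) {
        for (Pattern p : EXCLUDES) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("/share/docs/REPORT.TMP"));  // true
        System.out.println(isExcluded("/share/Archive/old.doc"));  // true
        System.out.println(isExcluded("/share/docs/report.pdf"));  // false
    }
}
```

If the connector turns out to use wildcard file specifications rather than regular expressions, the (?i) flag will not apply and the connector's own documentation should be checked.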

On Wed, 30 Aug 2017 at 19:53, Karl Wright <daddywri@gmail.com> wrote:

> Hi Beelz,
>
> File-based sync is deprecated because people often have problems with
> getting file permissions right, and they do not understand how to shut
> processes down cleanly, and zookeeper is resilient against that.  I highly
> recommend using zookeeper sync.
>
> ManifoldCF is engineered to not put files into memory so you do not need
> huge amounts of memory.  The default values are more than enough for 35,000
> files, which is a pretty small job for ManifoldCF.
>
> Thanks,
> Karl
>
>
> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki <i93othman@gmail.com>
> wrote:
>
>> I'm actually not using ZooKeeper. I want to know how ZooKeeper differs
>> from file-based sync. I also need guidance on how to manage my PC's
>> memory. How many GB should I allocate for the start-agent of
>> ManifoldCF? Is 4 GB enough to crawl 35K files?
>>
>> Othman.
>>
>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Your disk is not writable for some reason, and that's interfering with
>>> ManifoldCF 2.8 locking.
>>>
>>> I would suggest two things:
>>>
>>> (1) Use Zookeeper for sync instead of file-based sync.
>>> (2) Have a look if you still get failures after that.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>> wrote:
>>>
>>>> Hi Mr Karl,
>>>>
>>>> Thank you Mr Karl for your quick response. I have looked into the
>>>> ManifoldCF log file and extracted the following warnings :
>>>>
>>>> - Attempt to set file lock
>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>> Synapses.lock' failed : Access is denied.
>>>>
>>>>
>>>> - Couldn't write to lock file; disk may be full. Shutting down process;
>>>> locks may be left dangling. You must cleanup before restarting.
>>>>
>>>> ES (Lowercase) Synapses is the Elasticsearch output connection.
>>>> Moreover, the job uses Tika to extract metadata and a file system as a
>>>> repository connection. During the job, I don't extract the content of the
>>>> documents. I was wondering whether the issue comes from Elasticsearch?
>>>>
>>>> Othman.
>>>>
>>>>
>>>>
>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Othman,
>>>>>
>>>>> ManifoldCF aborts a job if there's an error that looks like it might
>>>>> go away on retry, but does not.  It can be either on the repository side
>>>>> or on the output side.  If you look at the Simple History in the UI, or
>>>>> at the manifoldcf.log file, you should be able to get a better sense of
>>>>> what went wrong.  Without further information, I can't say any more.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm Othman Belhaj, a software engineer at Société Générale in
>>>>>> France. I'm currently using your recent version of ManifoldCF, 2.8,
>>>>>> while working on an internal search engine, where ManifoldCF indexes
>>>>>> documents on Windows shares. I encountered a serious problem while
>>>>>> crawling 35K documents. Most of the time, when ManifoldCF starts
>>>>>> crawling a large document (19 MB, for example), it ends the job
>>>>>> with the following error: repeated service interruptions - failure
>>>>>> processing document : software caused connection abort: socket write
>>>>>> error.
>>>>>> Can you give me some tips on how to solve this problem, please?
>>>>>>
>>>>>> I use PostgreSQL 9.3.x and Elasticsearch 2.1.0.
>>>>>> I look forward to your response.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Othman BELHAJ
>>>>>>
>>>>>
>>>>>
>>>
>
