manifoldcf-user mailing list archives

From Beelz Ryuzaki <i93oth...@gmail.com>
Subject Re: Question about ManifoldCF 2.8
Date Fri, 01 Sep 2017 09:27:00 GMT
Hi Karl,

This morning, I followed the steps you told me to do, and I still got
stack traces. I have attached the stack traces as well as the contents of my
lib directory and options.env.
I have installed zookeeper and I'm ready to use the zookeeper example.
Could you guide me through it? I don't know whether I would stop getting
stack traces if I followed the same steps as in the file-based example.

Thanks,
Othman

On Thu, 31 Aug 2017 at 18:19, Karl Wright <daddywri@gmail.com> wrote:

> Please do the following:
>
> (0) Shut down all ManifoldCF processes.
> (1) Move poi*.jar from connector-common-lib to lib.
> (2) Move dom4j*.jar from connector-common-lib to lib.
> (3) Move commons-collections4*.jar from connector-common-lib to lib.
> (4) Move xmlbeans*.jar from connector-common-lib to lib.
> (5) Move curvesapi*.jar from connector-common-lib to lib.
> (6) Modify your options.env to include all of the jars you moved.
> (7) Start up all ManifoldCF processes.
> (8) If you still get stack traces, please send them to me.
>
> Karl
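Steps (1) to (6) above can be checked before restarting. What follows is a minimal Python sketch, not part of ManifoldCF: it assumes `lib` and `connector-common-lib` sit side by side under one distribution root, and that options.env (options.env.win on Windows) names each jar by file name on its classpath line. `check_jar_moves` and `check_install` are hypothetical helpers written for this illustration.

```python
import fnmatch
from pathlib import Path

# Jar name patterns from steps (1)-(5). "poi*.jar" deliberately also
# matches poi-ooxml-*.jar and poi-ooxml-schemas-*.jar.
PATTERNS = ["poi*.jar", "dom4j*.jar", "commons-collections4*.jar",
            "xmlbeans*.jar", "curvesapi*.jar"]


def check_jar_moves(lib_names, common_names, options_text):
    """Return a list of problems; an empty list means every check passed.

    lib_names / common_names: jar file names found in lib and in
    connector-common-lib. options_text: contents of options.env(.win).
    """
    problems = []
    for pattern in PATTERNS:
        # A match still in connector-common-lib means the move did not happen.
        for name in sorted(fnmatch.filter(common_names, pattern)):
            problems.append(f"still in connector-common-lib: {name}")
        moved = sorted(fnmatch.filter(lib_names, pattern))
        if not moved:
            problems.append(f"nothing in lib matches {pattern}")
        # Step (6): every moved jar should be named on the startup classpath.
        # This is a plain substring test, not a real classpath parser.
        for name in moved:
            if name not in options_text:
                problems.append(f"not in options.env: {name}")
    return problems


def check_install(root, options_file):
    """Hypothetical wrapper that reads the real directories and file."""
    root = Path(root)
    lib = [p.name for p in (root / "lib").glob("*.jar")]
    common = [p.name for p in (root / "connector-common-lib").glob("*.jar")]
    text = Path(options_file).read_text(encoding="utf-8", errors="replace")
    return check_jar_moves(lib, common, text)
```

Called with the distribution root and the path to the example's options.env.win, an empty list means the moves and the classpath agree. The core check takes plain lists of names, so it can be tried without touching the install.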
>
>
> On Thu, Aug 31, 2017 at 12:12 PM, Beelz Ryuzaki <i93othman@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> By 'other place', do you mean the lib directory? If so, then I
>> have already tried it and it didn't work.
>>
>> Othman.
>>
>> On Thu, 31 Aug 2017 at 18:07, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Othman,
>>>
>>> I used the java dependency inspector to see what the issue is and it
>>> turns out that poi-ooxml.jar does refer back to poi.jar in the class that
>>> is failing.  So you will need to move poi-3.15.jar and
>>> commons-collections4-4.1.jar to the other place as well.
>>>
>>> Let's hope that finally fixes this issue.
>>>
>>> I'm very unhappy about the quality of the POI project code; it is
>>> definitely not using reasonable engineering practices, and I will be
>>> opening a ticket with them.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Thu, Aug 31, 2017 at 11:57 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>> wrote:
>>>
>>>> I'm using the file-based example, and I reproduced all the changes you
>>>> told me to make in it. I'll try to install zookeeper and use the
>>>> zookeeper example. Will I need to do any configuration in order to run
>>>> the zookeeper example?
>>>>
>>>> Othman.
>>>>
>>>> On Thu, 31 Aug 2017 at 17:46, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Are you using the zookeeper example, or the file-based example?
>>>>>
>>>>> If these jars have all been moved, and the options.env includes them,
>>>>> then I have to conclude that Apache POI's pom.xml is incorrect too.  It
>>>>> will take a while to figure out what's missing that poi-ooxml.jar needs
>>>>> that is not listed.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 11:39 AM, Beelz Ryuzaki <i93othman@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> All the dependencies you mentioned have already been added in the
>>>>>> options.env.win file in the multiprocess-file-example directory.
>>>>>>
>>>>>> On Thu, 31 Aug 2017 at 17:33, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, I added it in the options.env.win file. Should it be the one in
>>>>>>> the multiprocess-zk-example directory or the multiprocess-file-example
>>>>>>> one?
>>>>>>>
>>>>>>> On Thu, 31 Aug 2017 at 17:30, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It's not related at all to elasticsearch.
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 11:26 AM, Beelz Ryuzaki <
>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Could it be a problem with elasticsearch's version? I'm actually
>>>>>>>>> using 2.1.0, which is pretty old for this new version of ManifoldCF.
>>>>>>>>>
>>>>>>>>> Othman.
>>>>>>>>>
>>>>>>>>> On Thu, 31 Aug 2017 at 17:23, Beelz Ryuzaki <i93othman@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I moved both the jars you mentioned back, and a different error is
>>>>>>>>>> showing. You will find the stack trace attached.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Othman
>>>>>>>>>>
>>>>>>>>>> On Thu, 31 Aug 2017 at 17:09, Karl Wright <daddywri@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've looked at the dependencies; you should not have moved
>>>>>>>>>>> poi-3.15.jar.  Please move that back, and commons-collections4-4.1.jar too.
>>>>>>>>>>>
>>>>>>>>>>> You *will* need to move curvesapi-1.04.jar though.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 11:04 AM, Karl Wright <
>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If you include poi.jar, then all dependencies of poi.jar must
>>>>>>>>>>>> also be included.  This would mean that curvesapi-1.04.jar and
>>>>>>>>>>>> commons-collections4-4.1.jar should also be included.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
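Karl's rule above (if a jar is moved into lib, every jar it references must follow) amounts to a transitive closure over a dependency graph. A plausible reason, assuming connector-common-lib is loaded by a child class loader of the one that loads lib, is Java's parent-first delegation: classes loaded from lib cannot see classes that exist only in the child. The Python sketch below encodes only the edges described in this thread; `THREAD_DEPS` is an assumption, not Apache POI's pom.xml (which Karl suspects is incomplete), and real edges could come from a class-level analyser such as the JDK's jdeps.

```python
# Edges as described in this thread ("jar -> jars it calls into"),
# using short artifact names. NOT read from Apache POI's pom.xml.
THREAD_DEPS = {
    "poi-ooxml-schemas": ["poi-ooxml"],      # "calls back into another jar"
    "poi-ooxml": ["dom4j", "poi"],           # "yet another dependency", refers back to poi
    "poi": ["curvesapi", "commons-collections4"],
}


def closure(start, deps):
    """Return every jar reachable from `start`, including `start` itself."""
    seen = set()
    stack = list(start)
    while stack:  # iterative depth-first search; safe if the graph has cycles
        jar = stack.pop()
        if jar in seen:
            continue
        seen.add(jar)
        stack.extend(deps.get(jar, []))
    return seen


def also_move(moved, deps):
    """Jars that must follow `moved` into lib so every reference resolves."""
    return sorted(closure(moved, deps) - set(moved))
```

Seeded with the two jars from the first workaround (xmlbeans and poi-ooxml-schemas), `also_move` returns commons-collections4, curvesapi, dom4j, poi and poi-ooxml, which together with those two is the final list of jars to move.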
>>>>>>>>>>>> On Thu, Aug 31, 2017 at 10:23 AM, Beelz Ryuzaki <
>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I added the two jars that you mentioned and another one:
>>>>>>>>>>>>> poi-3.15.jar. Unfortunately, another error is showing. This time, it
>>>>>>>>>>>>> concerns Excel files. You will find the stack trace attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:32, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this shows that the jar we moved calls back into another
>>>>>>>>>>>>>> jar, which will also need to be moved.  *That* jar has yet another
>>>>>>>>>>>>>> dependency too.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The list of jars is thus extended to include:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> poi-ooxml-3.15.jar
>>>>>>>>>>>>>> dom4j-1.6.1.jar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You will find the stack trace attached. My apologies for the
>>>>>>>>>>>>>>> bad quality of the image; I'm doing my best to send you the stack trace,
>>>>>>>>>>>>>>> as I don't have the right to send documents outside the company.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for your time,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:16, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Once again, I need a stack trace to diagnose what the problem is.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 9:14 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Oh, actually it didn't solve the problem. I looked into
>>>>>>>>>>>>>>>>> the log file and saw the following error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Error tossed : org/apache/poi/POIXMLTypeLoader
>>>>>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>>>>>> org/apache/poi/POIXMLTypeLoader.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe another jar is missing?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>
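The error above names the missing class in slash form. Since jars are zip archives, one way to see which jar provides a class, and so which directory it currently sits in, is to look for the matching `.class` entry. A minimal Python sketch; `jars_providing`, `scan_dirs` and `make_jar` are hypothetical helpers. It only shows where a class lives on disk, not which class loader will see it.

```python
import io
import zipfile
from pathlib import Path


def jars_providing(class_name, jars):
    """Return the sorted labels of jars that contain `class_name`.

    class_name may be in slash form (as in the log, trailing dot allowed)
    or dotted form. jars maps a label to a path or a binary file object.
    """
    entry = class_name.rstrip(".").replace(".", "/") + ".class"
    hits = []
    for label, source in jars.items():
        try:
            with zipfile.ZipFile(source) as jar:
                if entry in jar.namelist():
                    hits.append(label)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid jars
    return sorted(hits)


def scan_dirs(class_name, *dirs):
    """Check every *.jar directly under the given directories."""
    jars = {str(p): p for d in dirs for p in Path(d).glob("*.jar")}
    return jars_providing(class_name, jars)


def make_jar(*entries):
    """Build an in-memory jar with empty entries, for trying the above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name in entries:
            jar.writestr(name, b"")
    buf.seek(0)
    return buf
```

Run from the distribution root, `scan_dirs("org/apache/poi/POIXMLTypeLoader.", "lib", "connector-common-lib")` would list the jar or jars on disk that hold the class from the log.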
>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 15:01, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have tried what you told me to do, and as you expected, the
>>>>>>>>>>>>>>>>>> crawling resumed. How about the regular expressions? How can I write
>>>>>>>>>>>>>>>>>> complex regular expressions in the job's Paths tab?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you very much
for your help.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:47, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ok, I will try it right away and let you know if it works.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 14:15, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Oh, and you also may need to edit your options.env
>>>>>>>>>>>>>>>>>>>> files to include them in the classpath for startup.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:53 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If you are amenable, there is another workaround you
>>>>>>>>>>>>>>>>>>>>> could try.  Specifically:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (1) Shut down all MCF processes.
>>>>>>>>>>>>>>>>>>>>> (2) Move the following two files from
>>>>>>>>>>>>>>>>>>>>> connector-common-lib to lib:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> xmlbeans-2.6.0.jar
>>>>>>>>>>>>>>>>>>>>> poi-ooxml-schemas-3.15.jar
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (3) Restart everything and see if your crawl resumes.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please let me know what happens.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:33 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I created a ticket for this: CONNECTORS-1450.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One simple workaround is to use the external Tika
>>>>>>>>>>>>>>>>>>>>>> server transformer rather than the embedded Tika Extractor.  I'm still
>>>>>>>>>>>>>>>>>>>>>> looking into why the jar is not being found.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 7:08 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, I'm actually using the latest binary version,
>>>>>>>>>>>>>>>>>>>>>>> and my job got stuck on that specific file.
>>>>>>>>>>>>>>>>>>>>>>> The job status is still Running. You can see it in
>>>>>>>>>>>>>>>>>>>>>>> the attached file. For your information, the job started yesterday.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 13:04, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like a dependency of Apache POI is missing.
>>>>>>>>>>>>>>>>>>>>>>>> I think we will need a ticket to address this, if
>>>>>>>>>>>>>>>>>>>>>>>> you are indeed using the binary distribution.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 6:57 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually using the binary version. For
>>>>>>>>>>>>>>>>>>>>>>>>> security reasons, I can't send any files from my computer. I have copied
>>>>>>>>>>>>>>>>>>>>>>>>> the stack trace and scanned it with my cellphone. I hope it will be
>>>>>>>>>>>>>>>>>>>>>>>>> helpful. Meanwhile, I have read the documentation about how to restrict the
>>>>>>>>>>>>>>>>>>>>>>>>> crawling, and I don't think '|' works in the path specification. For instance, I
>>>>>>>>>>>>>>>>>>>>>>>>> would like to restrict the crawling to the documents that contain the
>>>>>>>>>>>>>>>>>>>>>>>>> word 'sound'. I proceed as follows: *(SON)*. The document name is in capital
>>>>>>>>>>>>>>>>>>>>>>>>> letters, and I noticed that it wasn't taken into consideration.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>
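The sketch below does not model ManifoldCF; whether the Paths tab takes wildcards or regular expressions is settled by the end-user documentation Karl points to. It only contrasts the two syntaxes in Python, using made-up file names. With shell-style wildcards, '(', ')' and '|' are ordinary characters, so `*(SON)*` only matches names that literally contain "(SON)", and '|' does not mean "or". In a regular expression, '|' is alternation and the inline flag `(?i)`, which Java's java.util.regex also accepts, plays the role of /i.

```python
import fnmatch
import re

# Made-up file names for illustration.
SAMPLE = ["Rapport SON 2017.docx", "rapport son 2017.docx",
          "Rapport (SON).xlsx", "budget.xlsx"]


def wildcard_matches(pattern, names, ignore_case=False):
    """Shell-style matching: '*' and '?' are special; '(', ')', '|' are not.

    fnmatchcase is used so the result does not depend on the OS's
    own case rules; ignore_case lower-cases both sides explicitly.
    """
    if ignore_case:
        pattern = pattern.lower()
        return [n for n in names if fnmatch.fnmatchcase(n.lower(), pattern)]
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_matches(pattern, names):
    """Regular-expression matching anywhere in the name (re.search)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]
```

Here `wildcard_matches("*(SON)*", SAMPLE)` returns only "Rapport (SON).xlsx", `wildcard_matches("*SON*|*son*", SAMPLE)` returns an empty list, and `regex_matches(r"(?i)\bson\b", SAMPLE)` returns the first three names.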
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 31 Aug 2017 at 12:40, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The way you restrict documents with the windows
>>>>>>>>>>>>>>>>>>>>>>>>>> share connector is by specifying information on the "Paths" tab in jobs
>>>>>>>>>>>>>>>>>>>>>>>>>> that crawl windows shares.  There is end-user documentation both online and
>>>>>>>>>>>>>>>>>>>>>>>>>> distributed with all binary distributions that describes how to do this.
>>>>>>>>>>>>>>>>>>>>>>>>>> Have you found it?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:25 AM, Beelz Ryuzaki <
>>>>>>>>>>>>>>>>>>>>>>>>>> i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your response. I will start using
>>>>>>>>>>>>>>>>>>>>>>>>>>> zookeeper and I will let you know if it works. I have another question to
>>>>>>>>>>>>>>>>>>>>>>>>>>> ask. I need to apply some filters while crawling: I don't want to
>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl certain files and folders. Could you give me an example of how to
>>>>>>>>>>>>>>>>>>>>>>>>>>> use the regex? Does the regex allow using /i to ignore case?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 19:53, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Beelz,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> File-based sync is deprecated because people
>>>>>>>>>>>>>>>>>>>>>>>>>>>> often have problems with getting file permissions right, and they do not
>>>>>>>>>>>>>>>>>>>>>>>>>>>> understand how to shut processes down cleanly, and zookeeper is resilient
>>>>>>>>>>>>>>>>>>>>>>>>>>>> against that.  I highly recommend using zookeeper sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF is engineered to not put files into
>>>>>>>>>>>>>>>>>>>>>>>>>>>> memory so you do not need huge amounts of memory.  The default values are
>>>>>>>>>>>>>>>>>>>>>>>>>>>> more than enough for 35,000 files, which is a pretty small job for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 11:58 AM, Beelz Ryuzaki
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm actually not using zookeeper. I want to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know how zookeeper is different from file-based sync. I also need some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> guidance on how to manage my PC's memory. How many GB should I allocate for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the start-agent of ManifoldCF? Is 4 GB enough to crawl 35K files?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 16:11, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Your disk is not writable for some reason,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and that's interfering with ManifoldCF 2.8 locking.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would suggest two things:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1) Use Zookeeper for sync instead of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> file-based sync.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (2) Have a look if you still get failures
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> after that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
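Karl's diagnosis (the disk is not writable, which breaks file-based locking) can be confirmed outside ManifoldCF by creating a file in the synch area directory from the account the processes run under. A minimal Python sketch; `can_create_files` is a hypothetical helper, and the directory in the warning below is partly redacted, so it has to be supplied locally. Creating a real file is deliberate: on Windows, `os.access` may not reflect the access-control lists behind an "Access is denied" error.

```python
import os
import tempfile


def can_create_files(directory):
    """Try to create and delete a scratch file in `directory`.

    Returns (True, "") on success, or (False, reason) on failure, for
    example a PermissionError ("Access is denied" on Windows) or a
    FileNotFoundError when the directory does not exist.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory, suffix=".lock-test")
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    os.close(fd)
    os.remove(path)
    return True, ""
```

A `(False, "PermissionError: ...")` result for the synch area directory would match the "Access is denied" warning.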
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 9:37 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Mr Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you, Mr Karl, for your quick response. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have looked into the ManifoldCF log file and extracted the following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> warnings:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Attempt to set file lock
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 'D:\xxxx\apache_manifoldcf-2.8\multiprocess-file-example\.\.\synch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area\569\352\lock-_POOLTARGET_OUTPUTCONNECTORPOOL_ES (Lowercase)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Synapses.lock' failed : Access is denied.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Couldn't write to lock file; disk may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full. Shutting down process; locks may be left dangling. You must cleanup
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> before restarting.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ES (lowercase) synapses being the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> elasticsearch output connection. Moreover, the job uses Tika to extract
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata and a file system as a repository connection. During the job, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't extract the content of the documents. I was wondering if the issue
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comes from elasticsearch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 Aug 2017 at 14:08, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Othman,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF aborts a job if there's an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that looks like it might go away on retry, but does not.  It can be either
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the repository side or on the output side.  If you look at the Simple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> History in the UI, or at the manifoldcf.log file, you should be able to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a better sense of what went wrong.  Without further information, I can't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> say any more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
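The behaviour Karl describes, where an error that looks transient is retried and the job is aborted if it keeps recurring, follows a common pattern. The Python sketch below is a generic illustration, not ManifoldCF's implementation; `TransientError`, `JobAborted`, `run_with_retries` and the attempt limit are assumptions made for the example.

```python
class TransientError(Exception):
    """An error that looks like it might go away on retry (e.g. a socket abort)."""


class JobAborted(Exception):
    """Raised once a transient-looking error has recurred too often."""


def run_with_retries(operation, max_attempts=3, on_retry=None):
    """Call operation(), retrying TransientError up to max_attempts calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # The error did not go away on retry: give up on the job.
                raise JobAborted(f"repeated service interruptions: {exc}") from exc
            if on_retry is not None:
                on_retry(attempt, exc)  # hook for logging or a backoff sleep
```

A production version would normally wait between attempts (for example with exponential backoff in the `on_retry` hook); that is left out to keep the control flow visible.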
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 30, 2017 at 5:33 AM, Beelz
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryuzaki <i93othman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm Othman Belhaj, a software engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from Société Générale in France. I'm currently using your recent version,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF 2.8. I'm working on an internal search engine, and for this reason
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm using ManifoldCF to index documents on Windows shares. I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> encountered a serious problem while crawling 35K documents. Most of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time, when ManifoldCF starts crawling big documents (19 MB, for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example), it ends the job with the following error: repeated service
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interruptions - failure processing document : software caused connection
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort: socket write error.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you give me some tips on how to solve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem, please?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I use PostgreSQL 9.3.x and elasticsearch
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.1.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm looking forward to your response.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Othman BELHAJ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>
