manifoldcf-user mailing list archives

From Roman Šitina <ro...@sitina.cz>
Subject Re: Detailed monitoring of jobs / job stuck
Date Mon, 17 Aug 2015 21:09:03 GMT
Thank you very much, that helped!

Is it OK to put a LockClean call in our startup script, just to make
sure? And is it worth moving to the ZooKeeper version?

Thanks again
Roman
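
If the lock clean step were folded into a restart script, the ordering Karl
describes below (stop everything, run LockClean, start again) would have to
be preserved. The following is only a minimal sketch of that ordering. The
script names `stop-agents.sh`, `lock-clean.sh` and `start-agents.sh` are
assumptions about what a multiprocess file-based installation provides, and
stopping and starting the web applications is not covered here, so those
steps would need to be added for a real setup.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed script names; substitute the ones from your own installation.
RESTART_STEPS: List[List[str]] = [
    ["./stop-agents.sh"],   # assumption: stops the agents process
    ["./lock-clean.sh"],    # assumption: runs the LockClean procedure
    ["./start-agents.sh"],  # assumption: starts the agents process again
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[List[str]]:
    """Run each step in order and return the steps that completed.

    The first non-zero exit code raises RuntimeError, so the lock clean
    step is never reached if the shutdown step failed.
    """
    completed: List[List[str]] = []
    for step in steps:
        exit_code = runner(list(step))
        if exit_code != 0:
            raise RuntimeError(
                f"step {list(step)!r} failed with exit code {exit_code}; "
                "later steps were skipped"
            )
        completed.append(list(step))
    return completed
```

Calling `run_steps(RESTART_STEPS)` from the installation directory would run
the three scripts in sequence. Aborting on the first failure is deliberate:
cleaning locks while a process is still running is exactly what the
procedure is meant to avoid.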

On 17 August 2015 at 22:55, Karl Wright <daddywri@gmail.com> wrote:
> I would try executing the lock clean procedure.  Shut down all ManifoldCF
> processes and web applications, then run the LockClean script, then start
> them back up again.  If you have shut any processes down with kill -9, then
> you may have locks hanging around.
>
> Karl
>
>
> On Mon, Aug 17, 2015 at 4:34 PM, Roman Šitina <roman@sitina.cz> wrote:
>>
>> It is a multiprocess setup with file synchronisation.
>>
>> I can see reprioritisation in the logs, and after a while all I can see
>> are these entries cycling:
>>
>> DEBUG 2015-08-17 20:27:19,980 (Expire stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke
>> up
>>
>> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf - Beginning query to look for documents to
>> expire
>>
>> DEBUG 2015-08-17 20:27:19,981 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf -  Attempt 1 to expire documents, after 0
>> ms
>>
>> DEBUG 2015-08-17 20:27:19,983 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf -  Expiring 0 documents
>>
>> DEBUG 2015-08-17 20:27:19,984 (Expire stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread:
>> Found 0 documents to expire
>>
>> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread woke
>> up
>>
>> DEBUG 2015-08-17 20:27:19,996 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf - Beginning query to look for documents to
>> expire
>>
>> DEBUG 2015-08-17 20:27:19,997 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf -  Attempt 1 to expire documents, after 1
>> ms
>>
>> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) -
>> org.apache.manifoldcf.perf -  Expiring 0 documents
>>
>> DEBUG 2015-08-17 20:27:19,999 (Expire stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Expiration stuffer thread:
>> Found 0 documents to expire
>>
>> DEBUG 2015-08-17 20:27:20,077 (Document cleanup stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread
>> woke up
>>
>> DEBUG 2015-08-17 20:27:20,077 (Document delete stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread
>> woke up
>>
>> DEBUG 2015-08-17 20:27:20,078 (Document cleanup stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread
>> found nothing to do
>>
>> DEBUG 2015-08-17 20:27:20,078 (Document delete stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread
>> found nothing to do
>>
>> DEBUG 2015-08-17 20:27:20,083 (Document delete stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread
>> woke up
>>
>> DEBUG 2015-08-17 20:27:20,083 (Document cleanup stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread
>> woke up
>>
>> DEBUG 2015-08-17 20:27:20,084 (Document delete stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document delete stuffer thread
>> found nothing to do
>>
>> DEBUG 2015-08-17 20:27:20,084 (Document cleanup stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread
>> found nothing to do
>>
>> DEBUG 2015-08-17 20:27:21,078 (Document cleanup stuffer thread) -
>> org.apache.manifoldcf.crawlerthreads - Document cleanup stuffer thread
>> woke up
>>
>>
>>
>> On 17 August 2015 at 21:29, Karl Wright <daddywri@gmail.com> wrote:
>> > 2.1 does do background reprioritization.  If you want to see that
>> > occurring in the log, you would need to add the following in your
>> > properties.xml file:
>> >
>> > <property name="org.apache.manifoldcf.scheduling" value="DEBUG"/>
>> >
>> > Can I have more information?  Specifically, is this a multiprocess
>> > setup?  And if so, is this ZooKeeper or file system synchronization?
>> >
>> > Karl
>> >
>> >
>> > On Mon, Aug 17, 2015 at 2:57 PM, Roman Šitina <roman@sitina.cz> wrote:
>> >>
>> >> Hello Karl,
>> >>
>> >> thanks for your quick reply!
>> >>
>> >> The version is 2.1. I tried to get detailed logging by setting
>> >> log4j.rootLogger=INFO, MAIN in logging.ini, but that did not help:
>> >> only WARN-level messages were logged after the restart.
>> >>
>> >> Roman
>> >>
>> >> On 17 August 2015 at 20:35, Karl Wright <daddywri@gmail.com> wrote:
>> >> > Hi Roman,
>> >> >
>> >> > ManifoldCF needs to reprioritize documents whenever you pause or
>> >> > restart jobs.  For jobs with large numbers of documents, the total
>> >> > amount of work involved in this is significant.  But, depending on
>> >> > the precise ManifoldCF version you are using, the reprioritization
>> >> > typically continues in background while MCF runs your job.
>> >> >
>> >> > Can you tell me more about what version of MCF you are trying here?
>> >> >
>> >> > Karl
>> >> >
>> >> >
>> >> > On Mon, Aug 17, 2015 at 2:13 PM, Roman Šitina <sitina@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hello,
>> >> >>
>> >> >> I have a ManifoldCF setup based on multiprocess-file-example,
>> >> >> which is backed by PostgreSQL.
>> >> >>
>> >> >> I have created a connection from Documentum to ElasticSearch with
>> >> >> about 300 000 documents. I was able to crawl several thousand
>> >> >> documents so the connection is working properly.
>> >> >>
>> >> >> What I'm not sure about is that when I pause or stop the job and
>> >> >> then run it again it takes a while and it looks like ManifoldCF is
>> >> >> doing nothing (30 minutes). After that time I usually try to
>> >> >> restart all processes.
>> >> >>
>> >> >> I looked at all logs - manifoldcf.log, documentum-registry,
>> >> >> documentum-server and DFC itself but I can't find any relevant
>> >> >> information.
>> >> >>
>> >> >> Can you help me figure out the best way to monitor the progress
>> >> >> of jobs that appear to be stalled?
>> >> >>
>> >> >> Thank you very much
>> >> >> Roman
>> >> >
>> >> >
>> >
>> >
>
>
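
The cycling DEBUG entries quoted in this thread all come from "stuffer"
threads. As a rough aid for spotting that pattern, here is a small sketch
that counts log entries per thread name. It assumes each entry sits on one
line in the form `LEVEL date time (thread name) - logger - message`, which is
inferred from the excerpt above (where the mail client wrapped the lines).
The function names are hypothetical, and "only stuffer threads are logging"
is a heuristic hint, not proof that a job is stuck.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern inferred from the log excerpt in this thread (an assumption):
# LEVEL yyyy-mm-dd hh:mm:ss,SSS (thread name) - logger - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)$"
)


def summarize_threads(lines: Iterable[str]) -> Counter:
    """Count parsed log entries per thread name; other lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("thread")] += 1
    return counts


def only_stuffer_activity(lines: Iterable[str]) -> bool:
    """True if at least one entry parsed and all came from stuffer threads."""
    counts = summarize_threads(lines)
    return bool(counts) and all("stuffer" in name.lower() for name in counts)
```

Feeding it the tail of manifoldcf.log, for example
`only_stuffer_activity(open("manifoldcf.log"))`, would report whether any
other thread wrote an entry in that window.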
