manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jetnet <jet...@gmail.com>
Subject Re: 2 seconds delay when checking documents
Date Tue, 19 Jul 2016 11:29:24 GMT
Hi Karl,

the documentation states: "it [Document Stuffer Thread] stuffs some number
that is a
multiple of the number of Worker Threads at one time, up to some limit
(which is also
related to the number of Worker Threads)."

here is the worker threads config:

  <property name="org.apache.manifoldcf.database.maxhandles" value="200"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="100"/>


So, I'd expect, that the stuffer thread would put about 200 docs in the
queue, but it puts only 2...

DEBUG 2016-07-19 13:24:02,916 (Stuffer thread) - Document stuffer thread
woke up
DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document
'smb://...' that has priority 10.284489143231797 onto active list
DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document
'smb://...' that has priority 10.28520622168017 onto active list
DEBUG 2016-07-19 13:24:02,963 (Set priority thread) - Reprioritizing 1
documents
DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1
documents
DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Setting new minimum depth
value to 29273.99867577113
DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Stuffer thread: Found 2
documents to queue
DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1
documents
DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document
'smb://...' with bins [SERVER ] onto active queue
DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document
'smb://...' with bins [SERVER ] onto active queue
DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Document stuffer thread
woke up

Or did I miss something?

Thanks!


2016-07-19 12:51 GMT+02:00 Karl Wright <daddywri@gmail.com>:

> Hi Konstantin,
>
> The stuffer thread operates independently of the worker threads.  It wakes
> up, and if anything is available, stuffs as much as it can, and when done
> sleeps for a short time.  The queue is maintained at at least twice the
> number of active worker threads.  See ManifoldCF in Action.
>
> Karl
>
>
> On Tue, Jul 19, 2016 at 6:33 AM, jetnet <jetnet@gmail.com> wrote:
>
>> hi All,
>>
>> I've encountered recently an issue with the crawler (JCIFS connector):
>> when a jobs gets started, all it's documents are being checked, and this
>> process is taking too long. After tuning DEBUG on, I found, that there are
>> ~2 seconds delay when processing the document queue:
>>
>> e.g.:
>>
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - JCIFS: Leaving
>> wouldFileBeIncluded for 'smb://...
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Worker thread done
>> processing 1 documents
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding
>> 1453999232278 to finishList
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding
>> 1453999232278 to ingesterCheckList
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Finishing documents
>> {1453999232278 }
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeueing
>> documents due to carrydown {}
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeuing
>> {1453999232278 }
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Deleting {}
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Hopcount removal {}
>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning documents
>> {}
>> DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: Found 0
>> documents to queue
>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) -
>> Document cleanup stuffer thread woke up
>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document
>> delete stuffer thread woke up
>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) -
>> Document cleanup stuffer thread found nothing to do
>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document
>> delete stuffer thread found nothing to do
>> DEBUG 2016-07-19 10:31:07,375 (Set priority thread) - Done reprioritizing
>> because no more documents to reprioritize
>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) -
>> Document cleanup stuffer thread woke up
>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document
>> delete stuffer thread woke up
>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document
>> delete stuffer thread found nothing to do
>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) -
>> Document cleanup stuffer thread found nothing to do
>> DEBUG 2016-07-19 10:31:08,500 (Stuffer thread) - Document stuffer thread
>> woke up
>> DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: Found 2
>> documents to queue
>> DEBUG 2016-07-19 10:31:08,531 (Stuffer thread) - Document stuffer thread
>> woke up
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread
>> processing documents: 1453999191642
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread
>> processing documents: 1453999188326
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread
>> starting document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread
>> starting document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-relationship
>> document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-relationship
>> document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') -  Post-hopcount
>> pruned document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') -  Post-hopcount
>> pruned document count is 1
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread about
>> to process {1453999191642 }
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread about
>> to process {1453999188326 }
>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - JCIFS: Processing
>> 'smb://...
>>
>>
>> As one can see, there is a delay between the thread-61 has left the
>> checks and next 2 threads have started.
>> Any idea why that happens? Why the document stuffer thread does not wake
>> up immediately? Or - why the "document queue batch size" is only 2?
>>
>> P.S. MCF version 2.3
>>
>> Thanks!
>>
>> --
>> Konstantin
>>
>
>

Mime
View raw message