manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: 2 seconds delay when checking documents
Date Wed, 20 Jul 2016 20:08:46 GMT
Hi,

Can you provide more details about your setup, etc.?  Specifically:

- what database is this?
- how many jobs are there?
- are there any jobs that have lots of documents that are in an unusual
state?

The query that the stuffer thread runs is not magic, and if it returns only
a couple of documents every now and then there must be a reason.  I do not
see the kind of behavior you are describing here.

Karl


On Wed, Jul 20, 2016 at 3:35 PM, jetnet <jetnet@gmail.com> wrote:

> hi Karl,
>
> actually, there are no failed documents at all. All documents (without
> exception) have status "Waiting for Processing" and the job is hanging
> around with status "Running", but nothing happens.
> When the job gets restarted, then all docs become status "About to
> Process", and then slowly (2 docs / 2 seconds) moving to "Waiting for
> Processing" again.
> I could delete the job and re-create it again, but I'd like to understand
> the root cause of the issue.
>
> Any hints where I could look else very much appreciated!
> Thanks!
>
> 2016-07-19 15:31 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>
>> Hi Konstantin,
>>
>> It's not possible to figure out from the log why the stuffer thread
>> queued only two documents at that time.  Probably it's because there were
>> no other eligible documents.  Documents in the queue have a priority and
>> they also have a minimum time they can be acted upon.  If you have
>> documents that failed earlier, they get requeued depending on the
>> particular kind of failure (up to the connector) with specific times.
>> There's no way to make the queue resolve "faster" if it's full of documents
>> that were requeued in this way due to errors.
>>
>> You can figure all this out by using the Document Status and Queue Status
>> reports.
>>
>> Karl
>>
>>
>> On Tue, Jul 19, 2016 at 7:29 AM, jetnet <jetnet@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> the documentation states: "it [Document Stuffer Thread] stuffs some
>>> number that is a
>>> multiple of the number of Worker Threads at one time, up to some limit
>>> (which is also
>>> related to the number of Worker Threads)."
>>>
>>> here is the worker threads config:
>>>
>>>   <property name="org.apache.manifoldcf.database.maxhandles"
>>> value="200"/>
>>>   <property name="org.apache.manifoldcf.crawler.threads" value="100"/>
>>>
>>>
>>> So, I'd expect, that the stuffer thread would put about 200 docs in the
>>> queue, but it puts only 2...
>>>
>>> DEBUG 2016-07-19 13:24:02,916 (Stuffer thread) - Document stuffer thread
>>> woke up
>>> DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document
>>> 'smb://...' that has priority 10.284489143231797 onto active list
>>> DEBUG 2016-07-19 13:24:02,963 (Stuffer thread) - Stuffing document
>>> 'smb://...' that has priority 10.28520622168017 onto active list
>>> DEBUG 2016-07-19 13:24:02,963 (Set priority thread) - Reprioritizing 1
>>> documents
>>> DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1
>>> documents
>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Setting new minimum
>>> depth value to 29273.99867577113
>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Stuffer thread: Found 2
>>> documents to queue
>>> DEBUG 2016-07-19 13:24:02,979 (Set priority thread) - Reprioritizing 1
>>> documents
>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document
>>> 'smb://...' with bins [SERVER ] onto active queue
>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Putting document
>>> 'smb://...' with bins [SERVER ] onto active queue
>>> DEBUG 2016-07-19 13:24:02,979 (Stuffer thread) - Document stuffer thread
>>> woke up
>>>
>>> Or did I miss something?
>>>
>>> Thanks!
>>>
>>>
>>> 2016-07-19 12:51 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>
>>>> Hi Konstantin,
>>>>
>>>> The stuffer thread operates independently of the worker threads.  It
>>>> wakes up, and if anything is available, stuffs as much as it can, and when
>>>> done sleeps for a short time.  The queue is maintained at at least twice
>>>> the number of active worker threads.  See ManifoldCF in Action.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Tue, Jul 19, 2016 at 6:33 AM, jetnet <jetnet@gmail.com> wrote:
>>>>
>>>>> hi All,
>>>>>
>>>>> I've encountered recently an issue with the crawler (JCIFS connector):
>>>>> when a jobs gets started, all it's documents are being checked, and this
>>>>> process is taking too long. After tuning DEBUG on, I found, that there
are
>>>>> ~2 seconds delay when processing the document queue:
>>>>>
>>>>> e.g.:
>>>>>
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - JCIFS: Leaving
>>>>> wouldFileBeIncluded for 'smb://...
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Worker thread
>>>>> done processing 1 documents
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding
>>>>> 1453999232278 to finishList
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding
>>>>> 1453999232278 to ingesterCheckList
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Finishing
>>>>> documents {1453999232278 }
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeueing
>>>>> documents due to carrydown {}
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeuing
>>>>> {1453999232278 }
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Deleting {}
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Hopcount removal
>>>>> {}
>>>>> DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning
>>>>> documents {}
>>>>> DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: Found
>>>>> 0 documents to queue
>>>>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) -
>>>>> Document cleanup stuffer thread woke up
>>>>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) -
>>>>> Document delete stuffer thread woke up
>>>>> DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) -
>>>>> Document cleanup stuffer thread found nothing to do
>>>>> DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) -
>>>>> Document delete stuffer thread found nothing to do
>>>>> DEBUG 2016-07-19 10:31:07,375 (Set priority thread) - Done
>>>>> reprioritizing because no more documents to reprioritize
>>>>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) -
>>>>> Document cleanup stuffer thread woke up
>>>>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) -
>>>>> Document delete stuffer thread woke up
>>>>> DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) -
>>>>> Document delete stuffer thread found nothing to do
>>>>> DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) -
>>>>> Document cleanup stuffer thread found nothing to do
>>>>> DEBUG 2016-07-19 10:31:08,500 (Stuffer thread) - Document stuffer
>>>>> thread woke up
>>>>> DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: Found
>>>>> 2 documents to queue
>>>>> DEBUG 2016-07-19 10:31:08,531 (Stuffer thread) - Document stuffer
>>>>> thread woke up
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread
>>>>> processing documents: 1453999191642
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread
>>>>> processing documents: 1453999188326
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread
>>>>> starting document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread
>>>>> starting document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-relationship
>>>>> document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-relationship
>>>>> document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') -  Post-hopcount
>>>>> pruned document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') -  Post-hopcount
>>>>> pruned document count is 1
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread
>>>>> about to process {1453999191642 }
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread
>>>>> about to process {1453999188326 }
>>>>> DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - JCIFS: Processing
>>>>> 'smb://...
>>>>>
>>>>>
>>>>> As one can see, there is a delay between the thread-61 has left the
>>>>> checks and next 2 threads have started.
>>>>> Any idea why that happens? Why the document stuffer thread does not
>>>>> wake up immediately? Or - why the "document queue batch size" is only
2?
>>>>>
>>>>> P.S. MCF version 2.3
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Konstantin
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message