manifoldcf-user mailing list archives

From jetnet <jet...@gmail.com>
Subject 2 seconds delay when checking documents
Date Tue, 19 Jul 2016 10:33:02 GMT
Hi all,

I recently encountered an issue with the crawler (JCIFS connector): when a
job is started, all of its documents are checked, and this process takes
too long. After turning DEBUG logging on, I found that there is a delay of
~2 seconds when processing the document queue:

For example:

DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - JCIFS: Leaving wouldFileBeIncluded for 'smb://...
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Worker thread done processing 1 documents
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding 1453999232278 to finishList
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Adding 1453999232278 to ingesterCheckList
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Finishing documents {1453999232278 }
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeueing documents due to carrydown {}
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') -  Requeuing {1453999232278 }
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Deleting {}
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Hopcount removal {}
DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning documents {}
DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: Found 0 documents to queue
DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - Document cleanup stuffer thread woke up
DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document delete stuffer thread woke up
DEBUG 2016-07-19 10:31:06,750 (Document cleanup stuffer thread) - Document cleanup stuffer thread found nothing to do
DEBUG 2016-07-19 10:31:06,750 (Document delete stuffer thread) - Document delete stuffer thread found nothing to do
DEBUG 2016-07-19 10:31:07,375 (Set priority thread) - Done reprioritizing because no more documents to reprioritize
DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - Document cleanup stuffer thread woke up
DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document delete stuffer thread woke up
DEBUG 2016-07-19 10:31:07,750 (Document delete stuffer thread) - Document delete stuffer thread found nothing to do
DEBUG 2016-07-19 10:31:07,750 (Document cleanup stuffer thread) - Document cleanup stuffer thread found nothing to do
DEBUG 2016-07-19 10:31:08,500 (Stuffer thread) - Document stuffer thread woke up
DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: Found 2 documents to queue
DEBUG 2016-07-19 10:31:08,531 (Stuffer thread) - Document stuffer thread woke up
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread processing documents: 1453999191642
DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread processing documents: 1453999188326
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread starting document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread starting document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Post-relationship document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Post-relationship document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') -  Post-hopcount pruned document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') -  Post-hopcount pruned document count is 1
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread about to process {1453999191642 }
DEBUG 2016-07-19 10:31:08,531 (Worker thread '57') - Worker thread about to process {1453999188326 }
DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - JCIFS: Processing 'smb://...


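As you can see, there is a delay of about 2 seconds between worker thread
'61' finishing its checks (10:31:06,484) and the next two worker threads,
'62' and '57', starting (10:31:08,531). In between, the stuffer thread found
0 documents to queue at 10:31:06,500 and did not wake up again until
10:31:08,500.
Any idea why that happens? Why doesn't the document stuffer thread wake up
immediately? Or, why is the "document queue batch size" only 2?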
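To find pauses like this in a longer DEBUG log without reading timestamps by eye, a small script can help. The following is a minimal Python sketch, assuming every entry starts with the "LEVEL yyyy-MM-dd HH:mm:ss,SSS (thread name) - message" prefix seen in the excerpt; parse_entries and worker_gaps are hypothetical helper names (not part of ManifoldCF), and the 1-second threshold is an arbitrary choice. It reports every pause longer than the threshold between consecutive worker-thread entries and ignores the stuffer, cleanup and priority threads.

```python
import re
from datetime import datetime

# Assumed entry prefix, taken from the excerpt above:
# "DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - message"
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<thread>[^)]*)\) - (?P<msg>.*)$"
)


def parse_entries(lines):
    """Yield (timestamp, thread, message) for each line with the log prefix.

    Lines without the prefix (e.g. mail-wrapped continuations) are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # %f accepts the 3-digit millisecond field ("484" -> 0.484 s)
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, m.group("thread"), m.group("msg")


def worker_gaps(lines, threshold_seconds=1.0):
    """Return (prev_entry, next_entry, gap_seconds) for each pause longer than
    threshold_seconds between consecutive worker-thread entries."""
    gaps = []
    prev = None
    for entry in parse_entries(lines):
        ts, thread, _ = entry
        if not thread.startswith("Worker thread"):
            continue  # ignore stuffer, cleanup and priority threads
        if prev is not None:
            gap = (ts - prev[0]).total_seconds()
            if gap > threshold_seconds:
                gaps.append((prev, entry, gap))
        prev = entry
    return gaps


# Usage on a few (partly mail-wrapped) entries from the excerpt above
sample = [
    "DEBUG 2016-07-19 10:31:06,484 (Worker thread '61') - Rescanning documents {}",
    "DEBUG 2016-07-19 10:31:06,500 (Stuffer thread) - Stuffer thread: Found 0",
    "documents to queue",
    "DEBUG 2016-07-19 10:31:08,516 (Stuffer thread) - Stuffer thread: Found 2",
    "DEBUG 2016-07-19 10:31:08,531 (Worker thread '62') - Worker thread",
]

# Reports the single ~2.047 s gap between worker threads '61' and '62'
for (t0, th0, _), (t1, th1, _), gap in worker_gaps(sample):
    print(f"{gap:.3f} s gap: {th0} at {t0.time()} -> {th1} at {t1.time()}")
```

Because continuation lines produced by mail wrapping do not match the prefix, the script still works on a log pasted as above; only the message text of wrapped entries is truncated at the wrap.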

P.S. MCF version 2.3

Thanks!

--
Konstantin
