manifoldcf-user mailing list archives

From Aeham Abushwashi <aeham.abushwa...@exonar.com>
Subject stuffamountfactor and getting more work done
Date Fri, 12 Dec 2014 12:11:29 GMT
Hi,

Are there any gotchas one should be aware of when configuring property
"org.apache.manifoldcf.crawler.stuffamountfactor"?

At times, I see the ManifoldCF nodes in my cluster (and the PostgreSQL box)
not utilising all the resources they have. I have configured 30 worker
threads, which tend to sit idle waiting for documents (continuous crawl).
This led me to tweak the batch size of the Stuffer thread indirectly using
"org.apache.manifoldcf.crawler.stuffamountfactor" and setting it to 20 (I
believe the default is 2).
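
As a rough sketch, the setup described above would look something like this
in properties.xml. Only the stuffamountfactor property name and the values 30
and 20 come from this message; the thread-count property name
"org.apache.manifoldcf.crawler.threads" and the surrounding file layout are
assumed from the standard ManifoldCF configuration and should be checked
against your own file.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- Assumed property name for the worker thread count (30 in this setup). -->
  <property name="org.apache.manifoldcf.crawler.threads" value="30"/>
  <!-- Raised from what is believed to be the default of 2, so the Stuffer
       thread queues a larger batch of documents per pass. -->
  <property name="org.apache.manifoldcf.crawler.stuffamountfactor" value="20"/>
</configuration>
```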

I understand that increasing the batch size results in a bigger result set
coming back from the database. If the size is in the 1000s, I doubt it would
cause problems. My hope is that a bigger stuffer batch would keep the worker
threads supplied with documents, so they spend less time idle and handle
more documents where possible.

Please let me know if there are any particular concerns/guidelines over
tweaking this config property or if there are better ways for increasing
the width of the processing pipeline for each manifold instance.

Thanks,
Aeham
