manifoldcf-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Connectors Framework > FAQ
Date Tue, 12 Oct 2010 21:32:00 GMT
Space: Apache Connectors Framework (https://cwiki.apache.org/confluence/display/CONNECTORS)
Page: FAQ (https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ)
Comment: https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ?focusedCommentId=24182792#comment-24182792

Comment added by Karl Wright:
---------------------------------------------------------------------

The sample I used was some 30,000 documents.

Several effects come into play for larger, more extended crawls.  First, PostgreSQL accumulates
"dead tuples" over time, which degrade performance.  There is a procedure for cleaning
these up, which I believe is documented in the "Build and Deploy" page, involving a VACUUM
FULL operation.
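
As a rough illustration of that cleanup step, the sketch below picks tables with a
high share of dead tuples and builds a VACUUM FULL statement for each.  The row shape
mirrors the relname, n_live_tup and n_dead_tup columns of PostgreSQL's
pg_stat_user_tables view; the table names, the 20% threshold and the helper name are
assumptions made for this example, not something taken from the "Build and Deploy" page.

```python
def vacuum_statements(stats, min_dead_fraction=0.2):
    """Return VACUUM FULL statements for tables with many dead tuples.

    stats: iterable of (relname, n_live_tup, n_dead_tup) tuples, e.g. rows
    selected from pg_stat_user_tables. The threshold is an arbitrary
    illustrative value, not a recommendation.
    """
    statements = []
    for relname, live, dead in stats:
        total = live + dead
        if total == 0:
            continue  # empty table, nothing to reclaim
        if dead / total >= min_dead_fraction:
            # Double any embedded quotes so the identifier is safely quoted.
            quoted = '"' + relname.replace('"', '""') + '"'
            statements.append(f"VACUUM FULL {quoted};")
    return statements


# Hypothetical table names and counts, for illustration only.
sample = [("jobqueue", 70000, 30000), ("ingeststatus", 99000, 1000)]
for stmt in vacuum_statements(sample):
    print(stmt)
```

Note that VACUUM FULL takes an exclusive lock on each table it processes, so statements
like these are best run while no crawl is active.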

Second, if you use PostgreSQL's configuration out of the box, a background VACUUM
operation (autovacuum) is likely to start at some point during your crawl.  This
background-process vacuum cannot keep up with dead tuple accumulation and only serves
to slow things down.  So set "autovacuum" to off.  This is also mentioned in the
"Build and Deploy" page.
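
One way to double-check that setting is to read it back from postgresql.conf.  The
sketch below is a simplified parser written for this example: the function name is
hypothetical, it handles only the common "name = value" form with "#" comments, and it
assumes autovacuum is on when the setting is absent (the default in current PostgreSQL
releases).

```python
def autovacuum_enabled(conf_text):
    """Return the effective autovacuum setting from postgresql.conf text.

    Simplified: recognizes on/true/yes/1 as enabled and treats anything
    else as disabled. If the setting appears more than once, the last
    occurrence wins.
    """
    enabled = True  # assumed default when the setting is not present
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # The "=" between name and value is optional in postgresql.conf.
        name, _, value = line.replace("=", " ", 1).partition(" ")
        if name.strip().lower() == "autovacuum":
            value = value.strip().strip("'").lower()
            enabled = value in ("on", "true", "yes", "1")
    return enabled


print(autovacuum_enabled("autovacuum = off\n"))
```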

Third, ManifoldCF itself periodically asks PostgreSQL to reindex data, which can affect
overall performance.  It does this after every 100,000 inserts/modifications to the
queue, which is obviously more than the size of the crawl I ran.
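
To make the threshold concrete, here is a minimal sketch of a counter that fires a
callback after every 100,000 queue writes.  It is not ManifoldCF's actual code: the
class and method names are invented for illustration, and it assumes one queue write
per document.

```python
class ReindexCounter:
    """Invoke a callback after every `threshold` queue inserts/modifications."""

    def __init__(self, on_reindex, threshold=100_000):
        self.on_reindex = on_reindex
        self.threshold = threshold
        self.pending = 0

    def record(self, changes=1):
        """Note `changes` queue writes; trigger reindexing when due."""
        self.pending += changes
        while self.pending >= self.threshold:
            self.pending -= self.threshold
            self.on_reindex()


events = []
counter = ReindexCounter(lambda: events.append("reindex"))
for _ in range(30_000):  # a 30,000-document crawl, one write each (assumed)
    counter.record()
print(len(events))  # no reindex is triggered at this size
```

Under these assumptions a 30,000-document crawl never reaches the threshold, while a
280,000-file crawl would cross it at least twice.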

Hope this answers your question.

In reply to a comment by Farzad:
How big was the file share you crawled?  I have 280,000 files spread across a lot of directories.
It starts out at 29-31 docs/sec, but as it crawls it gets slower.  For example, at 98,000
it was doing 31 docs a second; at 203,000 it is doing 16 docs a second.  So I'm just curious
how long you have been able to sustain the ~30 docs/sec.
