incubator-connectors-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Connectors Framework > How to Build and Deploy ManifoldCF
Date Sat, 13 Nov 2010 15:45:00 GMT
Space: Apache Connectors Framework (https://cwiki.apache.org/confluence/display/CONNECTORS)
Page: How to Build and Deploy ManifoldCF (https://cwiki.apache.org/confluence/display/CONNECTORS/How+to+Build+and+Deploy+ManifoldCF)
Comment: https://cwiki.apache.org/confluence/display/CONNECTORS/How+to+Build+and+Deploy+ManifoldCF?focusedCommentId=24186268#comment-24186268

Comment added by Karl Wright:
---------------------------------------------------------------------

It is clear then that your attempt to set up PostgreSQL with 400 database handles did not
actually succeed, or my recommendation would not have helped.

The performance is still very poor compared with my very cheap system, but your disks now
look reasonably quick.  So let's try to figure out the problem.

(1) The default of 30 threads sounds low for your system.  I'd up this to 100.

(2) You don't want the maximum connections to be a bottleneck, either on the repository connection
side or on the output connection side.  Set the max connections for both to 105.

(3) Configure your PostgreSQL to have at least 200 database handles available.  I
know you tried to do this already, but for some reason your configuration did not work.

(4) Set the maximum database connections parameter in your properties.xml to 105, so
that's not a bottleneck either.

(5) You may want to give the JVM more memory than the default.  Perhaps you are garbage
collecting too much.  If you are still using the quick-start, just add appropriate
-Xmx and -Xms options to the java command line.  I'd start with 1024MB.
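
As a rough illustration of the sizing in (1) through (4): as I read the numbers, every
connection limit should sit at or above the worker thread count, and PostgreSQL should
offer at least as many handles as the crawler's database pool may request.  The Python
sketch below checks only that arithmetic.  The function and parameter names are made up
for illustration and are not ManifoldCF or PostgreSQL setting names; the numbers are the
ones suggested above.

```python
def find_bottlenecks(worker_threads, repository_max, output_max,
                     db_pool_max, postgres_handles):
    """Return descriptions of limits that could throttle the worker threads.

    All names here are hypothetical labels for the settings discussed in
    points (1) through (4); they are not real configuration keys.
    """
    problems = []
    limits = (
        ("repository connections", repository_max),
        ("output connections", output_max),
        ("database pool", db_pool_max),
    )
    # Each limit should be at least the number of worker threads.
    for name, limit in limits:
        if limit < worker_threads:
            problems.append(f"{name}: {limit} < {worker_threads} worker threads")
    # PostgreSQL must be able to serve the whole database pool.
    if postgres_handles < db_pool_max:
        problems.append(
            f"PostgreSQL handles: {postgres_handles} < database pool {db_pool_max}"
        )
    return problems


# The values suggested in points (1) through (4).
suggested = find_bottlenecks(
    worker_threads=100,
    repository_max=105,
    output_max=105,
    db_pool_max=105,
    postgres_handles=200,
)
print(suggested if suggested else "no obvious connection bottleneck")
```

With the suggested values the list comes back empty.  Lowering any one limit below 100
makes that limit show up in the result, which is the situation to avoid.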

If none of this helps, then we can figure out what the bottleneck is by getting a Java thread
dump while the crawler is active.  How you do this depends on what operating system
you are using.  But it should be possible from that thread dump to get an idea where
all the threads are waiting.  Post it to connectors-user@incubator.apache.org.
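
For reference, on a Unix-like system a dump can usually be taken with the JDK's jstack
tool (jstack <pid>) or by sending the JVM a QUIT signal (kill -QUIT <pid>), which prints
the dump to the JVM's standard output.  Once you have the text, a small script can show
where the threads pile up.  The Python sketch below is one possible way to do that, not
part of ManifoldCF: it assumes the usual HotSpot dump layout (a quoted thread name, a
java.lang.Thread.State line, then "at ..." frames), and the sample dump and the
example.Pool class in it are invented for illustration.

```python
from collections import Counter


def summarize_thread_dump(text, depth=2):
    """Count threads by (state, top `depth` stack frames).

    Threads that have no stack frames in the dump are not counted.
    """
    counts = Counter()
    state, frames = None, []

    def flush():
        # Record the thread collected so far, if it had any frames.
        if frames:
            counts[(state, " <- ".join(frames[:depth]))] += 1

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A quoted name starts a new thread entry.
            flush()
            state, frames = "UNKNOWN", []
        elif stripped.startswith("java.lang.Thread.State:"):
            parts = stripped.split(":", 1)[1].split()
            state = parts[0] if parts else "UNKNOWN"
        elif stripped.startswith("at "):
            frames.append(stripped[3:])
    flush()
    return counts


# Invented sample in the assumed layout; a real dump is much longer.
SAMPLE = """\
"Worker thread '0'" daemon prio=10 tid=0x01 nid=0x11 in Object.wait() [0x0a]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at example.Pool.grab(Pool.java:10)

"Worker thread '1'" daemon prio=10 tid=0x02 nid=0x12 in Object.wait() [0x0b]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at example.Pool.grab(Pool.java:10)

"Worker thread '2'" daemon prio=10 tid=0x03 nid=0x13 runnable [0x0c]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
"""

for (state, where), n in summarize_thread_dump(SAMPLE).most_common():
    print(n, state, where)
```

To use it on a real dump, read the saved file into a string and pass that to
summarize_thread_dump in place of SAMPLE.  A large count on a single entry suggests that
many threads are waiting in the same place.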

Thanks,
Karl

In reply to a comment by Farzad:
Well, my system is still running, but it is very slow.  It's been almost 24 hours and it has
only crawled 243297 items.  This is on the 8 processor system with 10K disks (120 MB reads, and
200 MB writes).  I'll let it finish.  Should I try increasing connections and workers on the
next go around?
