manifoldcf-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Connectors Framework > FAQ
Date Sat, 09 Oct 2010 07:01:00 GMT
Space: Apache Connectors Framework (https://cwiki.apache.org/confluence/display/CONNECTORS)
Page: FAQ (https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ)
Comment: https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ?focusedCommentId=23340302#comment-23340302

Comment added by Karl Wright:
---------------------------------------------------------------------

With PostgreSQL, a somewhat different test set than I used in May, and a no-doubt much
more fragmented disk, I am now getting some 17 documents/second here, doing a file-system
crawl to a null output.  That is about half of what I saw in May.

This run used the following special PostgreSQL settings:
(1) 100 max connection handles
(2) 256MB shared buffers (which may well have been overkill, but that's what my PostgreSQL
setup had)
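
For reference, those two settings would look roughly like this in postgresql.conf.  This is
only a sketch: max_connections and shared_buffers are the standard PostgreSQL parameter names,
and it is an assumption that they are what "max connection handles" and "shared buffers" refer
to above; the actual file from this run is not shown.

```
# postgresql.conf (fragment; both settings need a server restart to take effect)

# Assumed mapping for "100 max connection handles"
max_connections = 100

# "256MB shared buffers"; possibly more than this workload needed
shared_buffers = 256MB
```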

Connection/job settings:
(1) 100 max connections for both the repository and output connections.
(2) Hop filters set to "never delete unreachable documents".

The system was very nearly I/O bound throughout the run, which leads me to believe that,
since the system was brand-new in May, disk fragmentation is a major factor in the slowdown.
I will try to run a benchmark where the database is on a different disk than the files being
crawled, maybe today.


In reply to a comment by Karl Wright:
I'm not yet sure what the problem might be under PostgreSQL - I haven't tried that yet.
However, I did try to do a performance run under Derby.  Derby runs well for a time but then
stalls - it seems to wind up with that same internal deadlock I coded around before.  It doesn't
error out at that point anymore, but since it stalls for an entire minute before it detects
the deadlock, you lose a lot of time and performance as a result.

I'm not quite sure what to do about this issue with Derby.  Maybe I can lower the deadlock
timeout threshold; I'll have to think about that for a while.  I'll look at PostgreSQL next.
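
If the threshold does get lowered, one place to do it would be Derby's lock timeout properties.
The fragment below is a sketch only, assuming the stock Derby properties
derby.locks.deadlockTimeout (default 20 seconds) and derby.locks.waitTimeout (default 60
seconds).  The values are illustrative, not tested, and which of the two properties accounts
for the one-minute stall described above is not established here.

```
# derby.properties (fragment; read from the derby.system.home directory at engine boot)
# Both values are in seconds and are illustrative only.

# How long a transaction waits on a lock before Derby checks for a deadlock
derby.locks.deadlockTimeout=10

# How long a transaction waits on a lock before giving up; keep this at or
# above deadlockTimeout, otherwise the deadlock check never runs
derby.locks.waitTimeout=30
```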

