jackrabbit-dev mailing list archives

From Brian Moseley <...@osafoundation.org>
Subject tuning SearchIndex
Date Mon, 21 Nov 2005 18:03:02 GMT
while testing my caldav server, i've had a lot of seemingly arbitrary 
exceptions that i've tracked down to the jvm running out of file 
descriptors.

after using ulimit to give the jvm 10k fds, i've found that the server 
seems to hit equilibrium at almost 1200 open fds, an astonishing amount. 
the exact number from the last run is 1178.

even more astonishing, 1051 of those open fds are index files:

java    12405 root   40r   REG        9,1    22608 22875403 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_0/_2y.cfs
java    12405 root   41r   REG        9,1     2856 22875406 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_1/_8.cfs
java    12405 root   42r   REG        9,1     2291 22875409 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_2/_8.cfs
java    12405 root   43r   REG        9,1      888 22940607 /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_3/_1.cfs

etc etc etc ad nauseam.
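
for anyone who wants to reproduce the count, here is a rough sketch of how the index files can be tallied from lsof output. the class name IndexFdCounter and the regex are my own illustration, and the regex assumes the index/<dir>/<segment>.cfs layout visible in the listing above; other index file types would need a wider pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// hypothetical helper, not part of jackrabbit: counts index .cfs entries in lsof output
public class IndexFdCounter {
    // assumes paths shaped like .../index/_0/_2y.cfs, as in the listing above
    private static final Pattern INDEX_FILE =
            Pattern.compile("/index/[^/\\s]+/[^/\\s]+\\.cfs");

    /** counts index .cfs paths in raw lsof text; works whether or not lines are wrapped. */
    public static int countIndexFiles(String lsofOutput) {
        Matcher m = INDEX_FILE.matcher(lsofOutput);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // reads lsof output from stdin (readAllBytes needs java 9 or later)
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(countIndexFiles(text));
    }
}
```

on a recent jdk this could be fed with something like "lsof -p 12405 | java IndexFdCounter.java", since each open fd shows up as one lsof entry with one path.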

the test i'm conducting simulates the initial publication of a 
moderately-sized (500+ event) calendar. it does well over a thousand PUT 
requests, each of which adds an nt:file (plus caldav:resource mixin 
type) to the repository to store the uploaded file's contents. before 
each node is added, the server performs a query against the parent node, 
looking something like this (where XXXXXX is a bit of metadata about the 
uploaded file):

   /jcr:root/bcm/calendar//element(*, caldav:resource)[@caldav:uid = 'XXXXXX']
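
to make the shape of that statement concrete, here is a minimal sketch of how such a query string could be assembled. UidQuery, build and escapeLiteral are hypothetical names for illustration, not the server's actual code; the only thing it adds beyond the query above is doubling single quotes in the uid, which is how xpath string literals escape them.

```java
// hypothetical helper for illustration; not taken from the caldav server itself
public class UidQuery {
    /** doubles single quotes so the value is safe inside a '...' xpath literal. */
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    /** builds the lookup statement for one caldav:uid below the given parent path. */
    public static String build(String parentPath, String uid) {
        return "/jcr:root" + parentPath
                + "//element(*, caldav:resource)[@caldav:uid = '"
                + escapeLiteral(uid) + "']";
    }

    public static void main(String[] args) {
        // prints the same statement shown above
        System.out.println(build("/bcm/calendar", "XXXXXX"));
    }
}
```

the resulting string would then be handed to the jcr QueryManager via createQuery(statement, Query.XPATH) and executed once per PUT, which is the read traffic that runs alongside all the index writes.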

publication of calendars with this many events will likely happen 
infrequently, as individual users are added to the server, although when 
an instance of the server is first brought online, there will be a heavy 
wave of users publishing their calendars to the server for the first time.

i don't know anything about lucene, but after looking at MultiIndex, i 
wonder if i'm having an issue with how often the volatile index 
is persisted and/or the persistent indexes are merged. i'm using the 
default SearchIndex configuration, that is to say:

         <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
             <param name="useCompoundFile" value="true"/>
             <param name="minMergeDocs" value="1000"/>
             <param name="volatileIdleTime" value="3"/>
             <param name="maxMergeDocs" value="1000"/>
             <param name="mergeFactor" value="10"/>
             <param name="bufferSize" value="10"/>
             <param name="path" value="${wsp.home}/index"/>
         </SearchIndex>

does anybody have advice on how to tune the SearchIndex? or am i barking 
up the wrong tree altogether? are there other subsystems that will be 
affected by this pattern of rapid writes in large quantities?

