jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: tuning SearchIndex
Date Tue, 22 Nov 2005 08:43:38 GMT
Hi Brian,

Brian Moseley wrote:
> even more astonishing, 1051 of those open fds are index files:
> java    12405 root   40r   REG        9,1    22608 22875403 
> /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_0/_2y.cfs 
> java    12405 root   41r   REG        9,1     2856 22875406 
> /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_1/_8.cfs 
> java    12405 root   42r   REG        9,1     2291 22875409 
> /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_2/_8.cfs 
> java    12405 root   43r   REG        9,1      888 22940607 
> /home/cosmo-demo-roots/prod7/data/repository/workspaces/homedir/index/_3/_1.cfs 

how many folders do you see under homedir/index ?

with a well configured index the number of sub index directories 
should rarely exceed 20.

> i don't know anything about lucene, but after looking at MultiIndex, i 
> wonder if i'm having an issue with the frequency that the volatile index 
> is persisted and/or the persistent indexes are merged. i'm using the 
> default SearchIndex configuration, that is to say:
>         <SearchIndex 
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>             <param name="useCompoundFile" value="true"/>
>             <param name="minMergeDocs" value="1000"/>
>             <param name="volatileIdleTime" value="3"/>
>             <param name="maxMergeDocs" value="1000"/>
>             <param name="mergeFactor" value="10"/>
>             <param name="bufferSize" value="10"/>
>             <param name="path" value="${wsp.home}/index"/>
>         </SearchIndex>

two parameters do not match the default values:

- minMergeDocs (default 100)
- maxMergeDocs (default 100000)

The two parameters are relevant for the incremental merge behaviour of 
the index.

I suggest that you try the default values, the index will probably 
create fewer index files.

some more background info on the search index and its merge behaviour 
which affects open files:

an index consists of several sub indexes that are combined in a multi 
index. there is always one sub index that is held in memory (volatile 
index) and a number of persistent indexes on disk. new persistent 
indexes are created when (1) the volatile index reaches a certain size, 
which is controlled by minMergeDocs, or (2) the whole index has been 
idle for a certain time, configured by volatileIdleTime.
Increasing the value for minMergeDocs will use more memory because more 
nodes are kept in the volatile index. But a higher value will also 
increase performance for bulk loads. The drawback is that queries are a 
bit slower.
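
The two triggers described above can be sketched as a small predicate. 
This is only an illustration of the rule as stated in this message, not 
Jackrabbit code: the function name should_persist and its arguments are 
hypothetical, volatileIdleTime is assumed to be measured in seconds, and 
the exact comparisons (>= rather than >) are a guess.

```python
def should_persist(volatile_docs, idle_seconds,
                   min_merge_docs=100, volatile_idle_time=3):
    """Return True when the in-memory (volatile) index should be
    written to disk as a new persistent sub index.

    Hypothetical helper: it mirrors the two conditions described in
    the text, not the actual Jackrabbit implementation.
    """
    # (1) size trigger: the volatile index has reached minMergeDocs
    if volatile_docs >= min_merge_docs:
        return True
    # (2) idle trigger: nothing happened for volatileIdleTime
    # (assumed to be seconds) and there is something to write
    return volatile_docs > 0 and idle_seconds >= volatile_idle_time


if __name__ == "__main__":
    print(should_persist(100, 0))  # size trigger
    print(should_persist(5, 1))    # neither trigger
    print(should_persist(5, 3))    # idle trigger
```

Under this reading, a workload that writes a few nodes and then pauses 
for longer than volatileIdleTime creates a new small persistent index 
each time, so the number of sub indexes then depends on how well the 
background merge keeps up.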

persistent indexes are merged by a background thread. this process is 
controlled by three parameters: minMergeDocs, maxMergeDocs and mergeFactor

as mentioned before merging is done incrementally. several smaller 
indexes are merged into a larger index. imagine the following boxes:

   -----    -----    -----
   | A |    | B |    | C |  ...
   -----    -----    -----

Box A contains sub indexes with size <= minMergeDocs
Box B contains sub indexes with size <= minMergeDocs * mergeFactor
Box C contains sub indexes with size <= minMergeDocs * mergeFactor^2

and so on.

as soon as a box contains a number of sub indexes equal to mergeFactor 
they are merged and put into the next box. the sub indexes from the 
source box are then deleted. the upper limit is controlled by 
maxMergeDocs: the merging process will never merge more than 
maxMergeDocs documents into one index. Thus the number of boxes is 
limited.
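
To get a feeling for how the box scheme limits the number of sub 
indexes, here is a toy model in Python. It is a sketch of the 
description in this message, not the real merger: the names box_bounds 
and simulate are hypothetical, the box limits are assumed to grow as 
minMergeDocs times powers of mergeFactor up to maxMergeDocs, the last 
box is assumed never to be merged, and the flush sizes are made up.

```python
def box_bounds(min_merge_docs, max_merge_docs, merge_factor):
    """Upper size limit (in documents) of each box, smallest first."""
    if min_merge_docs < 1 or merge_factor < 2:
        raise ValueError("need min_merge_docs >= 1 and merge_factor >= 2")
    bounds = []
    upper = min_merge_docs
    while upper < max_merge_docs:
        bounds.append(upper)
        upper *= merge_factor
    bounds.append(max_merge_docs)  # last box is capped by maxMergeDocs
    return bounds


def simulate(flush_sizes, min_merge_docs=100, max_merge_docs=100000,
             merge_factor=10):
    """Return the sizes of the persistent sub indexes that remain
    after the volatile index was flushed len(flush_sizes) times."""
    bounds = box_bounds(min_merge_docs, max_merge_docs, merge_factor)
    boxes = [[] for _ in bounds]
    last = len(bounds) - 1

    def box_index(size):
        # first box whose limit can hold an index of this size
        for i, limit in enumerate(bounds):
            if size <= limit:
                return i
        return last

    for size in flush_sizes:
        i = box_index(size)
        boxes[i].append(size)
        # merge while a box is full, is not the last box (assumption),
        # and the merged index stays within maxMergeDocs
        while (i < last and len(boxes[i]) >= merge_factor
               and sum(boxes[i]) <= max_merge_docs):
            merged = sum(boxes[i])
            boxes[i].clear()          # source sub indexes are deleted
            i = box_index(merged)
            boxes[i].append(merged)

    return [size for box in boxes for size in box]


if __name__ == "__main__":
    flushes = [5] * 1000  # hypothetical: 1000 idle flushes of 5 nodes
    print(len(simulate(flushes, 1000, 1000, 10)))   # values from the quoted config
    print(len(simulate(flushes, 100, 100000, 10)))  # default values
```

In this model the quoted configuration (minMergeDocs = maxMergeDocs = 
1000) yields a single box that is never merged, so every flush leaves 
its own sub index behind (1000 of them for the made-up workload), while 
the default values end up with 10 sub indexes for the same workload.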

