jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Problems with re-index a huge repository
Date Thu, 10 Aug 2006 07:11:55 GMT
The mergeFactor is way too high. With this setup, index merging will
only take place after 1000 index segments have been created. That's
also the reason why there are so many directories in the index folder.
The default value of 10 is usually a good choice and should only be
changed in rare cases.

Can you please try a re-index with a mergeFactor of 10 and, if you
still run into an out-of-memory error, file a JIRA issue?
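For reference, the suggested change might look roughly like this in workspace.xml. This is a sketch that assumes every other parameter stays exactly as in the configuration quoted below; only mergeFactor is lowered to the default of 10.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <!-- all parameters except mergeFactor are copied unchanged from the posted configuration -->
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="1000"/>
    <!-- lowered from 1000 to the default of 10, so segments are merged once 10 of them have accumulated -->
    <param name="mergeFactor" value="10"/>
    <param name="cacheSize" value="1000"/>
    <param name="respectDocumentOrder" value="false"/>
    <param name="autoRepair" value="true"/>
    <param name="forceConsistencyCheck" value="false"/>
</SearchIndex>
```

After changing the value, the re-index can be triggered the same way as in the original test: delete the index folder and restart the repository.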



KÖLL Claus wrote:
> I made some performance tests with a repository that has about 2 million different files
> (doc, xls, txt and ppt).
> I am very satisfied with the performance ...
> But now I made a test to re-index the whole repository, to handle the scenario where there
> are problems with the index at run time.
> I deleted the index folder and restarted the repository.
> My test PC configuration: Windows 2003 / 4 GB RAM / 150 GB hard disk.
> I always run into an OutOfMemoryError during index creation at repository startup.
> I have set the /3GB flag in boot.ini to get a larger initial heap size.
> The current Java start parameters are
> -Xms1550m -Xmx3000m
> The workspace.xml file has these parameters:
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="path" value="${wsp.home}/index"/>
>     <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
>     <param name="useCompoundFile" value="true" />
>     <param name="minMergeDocs" value="1000" />
>     <param name="mergeFactor" value="1000" />
>     <param name="cacheSize" value="1000"/>
>     <param name="respectDocumentOrder" value="false" />
>     <param name="autoRepair" value="true"/>
>     <param name="forceConsistencyCheck" value="false"/>
> </SearchIndex>
> It seems strange to me that during the indexing process Lucene creates about 600-700 directories
> under the index folder in the workspace directory, the redo.log grows to about 25 MB, and then
> I get an OutOfMemoryError.
> During the initial filling of the repository, the merging of the index folders/files worked fine,
> but now it seems that the merger does not work.
> If I restart the repository after the error occurs, the index folders/files are merged into
> about 20-30 folders, but the repository is not fully indexed.
> Thanks for your help,
> Claus
