jackrabbit-users mailing list archives

From KÖLL Claus <C.KO...@TIROL.GV.AT>
Subject Problems with re-index a huge repository
Date Wed, 09 Aug 2006 13:27:56 GMT
I ran some performance tests with a repository that holds about 2 million different files (doc, xls, txt and ppt).
I am very satisfied with the performance ...
Now I have run a test that re-indexes the whole repository, to cover the scenario where something goes wrong
with the index at run time.
I deleted the index folder and restarted the repository.
 
My test PC configuration: Windows 2003, 4 GB RAM, 150 GB hard disk.
 
I always run into an OutOfMemoryError while the index is being created at repository startup.
I have set the /3GB switch in boot.ini to get a larger initial heap size.
 
The current Java start parameters are:
-Xms1550m -Xmx3000m
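
A minimal sketch for checking which heap limits the JVM actually received from these flags. The class name HeapCheck and the helper toMegabytes are hypothetical and not part of Jackrabbit; only the standard java.lang.Runtime API is used.

```java
public class HeapCheck {

    // Convert a byte count to whole megabytes (hypothetical helper).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper heap limit the JVM will try to use (-Xmx).
        System.out.println("max heap (MB):   " + toMegabytes(rt.maxMemory()));
        // totalMemory() is the heap currently reserved (starts near -Xms).
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free heap (MB):  " + toMegabytes(rt.freeMemory()));
    }
}
```

It would be started with the same flags as the repository, for example `java -Xms1550m -Xmx3000m HeapCheck`. On a 32-bit JVM the requested maximum may not be reservable at all, in which case the JVM may refuse to start instead of printing a value.
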
The workspace.xml file has these SearchIndex parameters:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses"         value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    <param name="useCompoundFile" value="true" />
    <param name="minMergeDocs" value="1000" />
    <param name="mergeFactor" value="1000" />
    <param name="cacheSize" value="1000"/>
    <param name="respectDocumentOrder" value="false" />
    <param name="autoRepair" value="true"/>
    <param name="forceConsistencyCheck" value="false"/>
</SearchIndex>

What seems strange to me is that during indexing Lucene creates about 600-700 directories under the
index folder in the workspace directory, the redo.log grows to about 25 MB, and then I get an OutOfMemoryError.
During the initial filling of the repository, merging of the index folders/files worked fine,
but now it seems that the merger does not work.
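
To put numbers on this behaviour, the directories under the index folder can be counted while indexing runs. This is a minimal sketch, assuming the index folder is passed as the first argument; the class name IndexDirCount and the default path "index" are hypothetical. It only uses java.nio.file and does not call any Jackrabbit or Lucene API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexDirCount {

    // Count the immediate subdirectories of the given folder.
    // Plain files such as redo.log are ignored.
    static long countSubdirectories(Path indexDir) throws IOException {
        try (Stream<Path> entries = Files.list(indexDir)) {
            return entries.filter(Files::isDirectory).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the workspace index folder.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        System.out.println(indexDir + ": " + countSubdirectories(indexDir) + " subdirectories");
    }
}
```

Running it repeatedly during startup would show whether the number of directories keeps growing or drops after a merge.
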

If I restart the repository after the exception occurs, the index folders/files are merged
into about 20-30 folders, but
the repository is still not completely indexed.

Thanks for any help.

Claus

 
