jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-550) ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository
Date Thu, 31 Aug 2006 19:57:25 GMT
In-Reply-To: <31703653.1156836982370.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431955 ] 
            
Marcel Reutegger commented on JCR-550:
--------------------------------------

To reproduce this issue I tried to re-index a repository with 100,000 nodes. I was able to
re-index the repository with as little as 32 MB of heap. My profiler did not show any exceptional
memory usage in the search index. The memory usage was actually quite low.

Can you please try to re-index your repository without the text filters? Maybe there is a
memory leak in one of the filters when an exception is thrown on an invalid or corrupt document.
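
As a rough illustration of such a test run, here is a minimal sketch of the SearchIndex
configuration quoted in the report below, with the textFilterClasses parameter left out and
all other values unchanged. Which filters Jackrabbit 1.0.1 falls back to when the parameter
is absent is an assumption you should verify against your version.

```xml
<!-- Sketch only: the reporter's SearchIndex settings with the
     textFilterClasses parameter omitted for a test re-index.
     Assumption: without this parameter no binary-document filters
     (Word, Excel, PDF, ...) are applied during indexing. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="100"/>
    <param name="volatileIdleTime" value="3"/>
    <param name="maxMergeDocs" value="100000"/>
    <param name="mergeFactor" value="10"/>
    <param name="bufferSize" value="10"/>
    <param name="cacheSize" value="1000"/>
    <param name="forceConsistencyCheck" value="false"/>
    <param name="autoRepair" value="true"/>
    <param name="respectDocumentOrder" value="false"/>
    <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</SearchIndex>
```

If the re-index completes with this configuration, the filters can be added back one at a
time to narrow down which one is involved.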

A heap dump for analysis would also be helpful. Can you please run the re-indexing
process with the following JVM option: -Xrunhprof:heap=sites,doe=n
You can then trigger a dump by pressing Ctrl-Break (on Windows) or by sending kill -QUIT (on Unix)
to the JVM process.

Thanks a lot.

> ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository
> ------------------------------------------------------------------------------
>
>                 Key: JCR-550
>                 URL: http://issues.apache.org/jira/browse/JCR-550
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.0.1
>         Environment: tomcat 5.0 [256 up to 512 mb of ram] 
> jackrabbit 1.0.1 
> jdk 1.4.2_12 
> Intel Xeon 3.2GHz with 2Gb of memory
> ----
> poi-3.0-alpha2-20060616.jar
> poi-contrib-3.0-alpha2-20060616.jar
> poi-scratchpad-3.0-alpha2-20060616.jar
> jackrabbit-core-1.0.1.jar
> jackrabbit-index-filters-1.0.1.jar
> jackrabbit-jcr-commons-1.0.1.jar
> jcr-1.0.jar
> tm-extractors-0.4.jar
> lucene-1.4.3.jar
>            Reporter: Christian Zanata
>         Assigned To: Marcel Reutegger
>         Attachments: log_files.zip
>
>
> [ERROR] 20060825 17:06:40
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
> when we try to re-index a repository. The repository is quite big (more than 4 GB of
> disk usage) and sometimes stores documents of 40 MB in size.
> I have attached all the latest logs we recorded, with the full stack traces.
> Related to this, we also have errors with Lucene:
> [DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
> - Dump: 
> java.io.IOException: Invalid header signature; read 8656037701166316554,
> expected -2226271756974174256
>         at org.apache.jackrabbit.core.query.MsWordTextFilter
> and then these:
> [DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
> removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
> [ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
> Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
> not shut down properly.
> [ERROR] 20060803 09:33:14
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception.
> java.lang.NullPointerException: null values not allowed
> This is our repository.xml configuration for indexing:
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${wsp.home}/index"/>
>         <param name="textFilterClasses"
> value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
> org.apache.jackrabbit.core.query.MsExcelTextFilter,
> org.apache.jackrabbit.core.query.MsPowerPointTextFilter, 
> org.apache.jackrabbit.core.query.MsWordTextFilter,
> org.apache.jackrabbit.core.query.PdfTextFilter,
> org.apache.jackrabbit.core.query.HTMLTextFilter,
> org.apache.jackrabbit.core.query.XMLTextFilter,
> org.apache.jackrabbit.core.query.RTFTextFilter,
> org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
>         <param name="useCompoundFile" value="true"/>
>         <param name="minMergeDocs" value="100"/>
>         <param name="volatileIdleTime" value="3"/>
>         <param name="maxMergeDocs" value="100000"/>
>         <param name="mergeFactor" value="10"/>
>         <param name="bufferSize" value="10"/>
>         <param name="cacheSize" value="1000"/>
>         <param name="forceConsistencyCheck" value="false"/>
>         <param name="autoRepair" value="true"/>
>         <param name="respectDocumentOrder" value="false"/>
>         <param name="analyzer"
> value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> </SearchIndex>


        
