jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-550) ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository
Date Fri, 03 Nov 2006 08:32:19 GMT
In-Reply-To: <31703653.1156836982370.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446854 ] 
            
Jukka Zitting commented on JCR-550:
-----------------------------------

I would assume that the OutOfMemoryError is triggered by the parsing of some large Word
document, especially since you reported that the problem does not occur if you disable the
Word document filter.

Thus, if we catch the OutOfMemoryError caused by a single document, it should not
interrupt the whole indexing process. Any memory garbage should then get collected automatically
unless the document parser stores information statically.
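
As a rough illustration of the idea above, here is a minimal sketch of a per-document catch. It is not the Jackrabbit implementation: SafeIndexer, TextExtractor and indexAll are hypothetical names invented for this example, and the out-of-memory condition is simulated by throwing the error directly rather than by exhausting the heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: isolates an OutOfMemoryError raised while extracting text
 * from one document so that the remaining documents are still processed.
 * All names here are hypothetical and not part of the Jackrabbit API.
 */
public class SafeIndexer {

    /** Hypothetical stand-in for a text filter such as the Word filter. */
    public interface TextExtractor {
        String extract(byte[] content);
    }

    /**
     * Extracts text from every document. The positions of documents whose
     * extraction ran out of memory are appended to skipped.
     */
    public static List<String> indexAll(List<byte[]> documents,
                                        TextExtractor extractor,
                                        List<Integer> skipped) {
        List<String> extracted = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            try {
                extracted.add(extractor.extract(documents.get(i)));
            } catch (OutOfMemoryError e) {
                // Objects allocated by the failed extraction become
                // unreachable here and can be garbage collected, unless
                // the parser keeps them in static state.
                skipped.add(i);
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        List<byte[]> docs = Arrays.asList(new byte[4], new byte[64], new byte[8]);
        List<Integer> skipped = new ArrayList<>();
        // Simulated filter: pretends that any document over 16 bytes is too large.
        TextExtractor fake = content -> {
            if (content.length > 16) {
                throw new OutOfMemoryError("simulated");
            }
            return "text of " + content.length + " bytes";
        };
        List<String> result = indexAll(docs, fake, skipped);
        System.out.println("indexed=" + result.size() + " skipped=" + skipped);
    }
}
```

Note that catching OutOfMemoryError is best effort only: the error can just as well surface in an unrelated thread, so a sketch like this narrows the failure to one document without guaranteeing recovery.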

> ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository
> ------------------------------------------------------------------------------
>
>                 Key: JCR-550
>                 URL: http://issues.apache.org/jira/browse/JCR-550
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.0.1
>         Environment: tomcat 5.0 [256 up to 512 mb of ram] 
> jackrabbit 1.0.1 
> jdk 1.4.2_12 
> Intel Xeon 3.2GHz with 2Gb of memory
> ----
> poi-3.0-alpha2-20060616.jar
> poi-contrib-3.0-alpha2-20060616.jar
> poi-scratchpad-3.0-alpha2-20060616.jar
> jackrabbit-core-1.0.1.jar
> jackrabbit-index-filters-1.0.1.jar
> jackrabbit-jcr-commons-1.0.1.jar
> jcr-1.0.jar
> tm-extractors-0.4.jar
> lucene-1.4.3.jar
>            Reporter: Christian Zanata
>         Assigned To: Marcel Reutegger
>         Attachments: log_files.zip
>
>
> [ERROR] 20060825 17:06:40
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
> when we try to re-index a repository. The repository is quite big (more than 4 Gb of
> disk usage) and sometimes it stores documents of 40Mb in size.
> I have attached the latest logs we recorded, with the full stack traces.
> Related to this we also have errors with Lucene:
> [DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
> - Dump: 
> java.io.IOException: Invalid header signature; read 8656037701166316554,
> expected -2226271756974174256
>         at org.apache.jackrabbit.core.query.MsWordTextFilter
> and then these:
> [DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
> removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
> [ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
> Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
> not shut down properly.
> [ERROR] 20060803 09:33:14
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception.
> java.lang.NullPointerException: null values not allowed
> this is our repository.xml configuration for indexing
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${wsp.home}/index"/>
>         <param name="textFilterClasses"
> value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
> org.apache.jackrabbit.core.query.MsExcelTextFilter,
> org.apache.jackrabbit.core.query.MsPowerPointTextFilter, 
> org.apache.jackrabbit.core.query.MsWordTextFilter,
> org.apache.jackrabbit.core.query.PdfTextFilter,
> org.apache.jackrabbit.core.query.HTMLTextFilter,
> org.apache.jackrabbit.core.query.XMLTextFilter,
> org.apache.jackrabbit.core.query.RTFTextFilter,
>                         org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
>         <param name="useCompoundFile" value="true"/>
>         <param name="minMergeDocs" value="100"/>
>         <param name="volatileIdleTime" value="3"/>
>         <param name="maxMergeDocs" value="100000"/>
>         <param name="mergeFactor" value="10"/>
>         <param name="bufferSize" value="10"/>
>         <param name="cacheSize" value="1000"/>
>         <param name="forceConsistencyCheck" value="false"/>
>         <param name="autoRepair" value="true"/>
>         <param name="respectDocumentOrder" value="false"/>
>         <param name="analyzer"
> value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> </SearchIndex>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
