jackrabbit-users mailing list archives

From Cody Burleson <cody.burle...@base22.com>
Subject Re: Huge memory usage while re-indexing
Date Fri, 06 Sep 2013 02:24:41 GMT
Is it possible for you to index only certain properties of a node, for
example only those that are relevant to your search?

http://wiki.apache.org/jackrabbit/IndexingConfiguration
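
As a rough sketch of what that page describes, an indexing configuration
that restricts indexing to a couple of properties might look like the
following. The node type and the property names (title, description) are
hypothetical placeholders; substitute the ones your queries actually use.

```xml
<?xml version="1.0"?>
<!-- Sketch only: node type and property names below are hypothetical. -->
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<!-- Every namespace prefix used in the file must be declared here. -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nt:unstructured nodes, index only the listed properties. -->
  <index-rule nodeType="nt:unstructured">
    <property>title</property>
    <property>description</property>
  </index-rule>
</configuration>
```

The file is then referenced from the SearchIndex element with a param such
as <param name="indexingConfiguration"
value="${wsp.home}/indexing_configuration.xml"/>; the file name and location
here are assumptions. Whether this lowers heap usage in your case is
something you would have to measure.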





On Thu, Sep 5, 2013 at 10:39 AM, pgupta <pankaj.gupta@ansys.com> wrote:

> Hi,
>
> We have a moderately sized repository, roughly as follows:
> * Around 1M total objects
> * Around 100K documents (PDFs, office docs, text, xml etc)
> * Around 3TB of data in datastore (majority of which are non-indexable
> binary files)
>
> Recently we had to re-index the repository because the search index got
> out of sync with the rest of the data. While doing so we ran into
> out-of-memory issues several times. We had to increase the heap size to
> 64GB before the re-indexing finally finished. The total RAM taken up by
> the Java process during re-indexing climbed steadily to 60GB and stayed
> there until the indexing finished.
>
> We are using pretty standard search configuration as shown below:
>
>     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>
>
>
>
>     </SearchIndex>
>
> We tried adjusting a few configuration settings, such as
> extractorPoolSize and maxMergeDocs, without any appreciable impact on RAM
> usage.
>
> Some questions that we have are:
> 1) Is this high memory usage expected during indexing?
> 2) Can we make any configuration change to manage it?
> 3) Are there any improvements expected in Jackrabbit 3 (Project Oak)?
>
> Thanks,
> Pankaj



-- 
Cody Burleson
Enterprise Web Architect, Base22
Mobile: +1 (214) 537-8782
Skype: codyburleson
Email: cody@base22.com
Blog: codyburleson.com

