Could you index only certain properties of each node, for example only those that are relevant to your search?

http://wiki.apache.org/jackrabbit/IndexingConfiguration
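
As a rough sketch of what that page describes, an indexing configuration that limits indexing to a couple of properties might look like the following. The node type and the property names (title, description) are placeholders made up for illustration; substitute the ones from your own content model.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<!-- Namespace prefixes used in nodeType attributes must be declared here. -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nodes of this type, index only the listed properties. -->
  <!-- "title" and "description" are hypothetical property names. -->
  <index-rule nodeType="nt:unstructured">
    <property>title</property>
    <property>description</property>
  </index-rule>
</configuration>
```

The file is then referenced from the SearchIndex element with a param named indexingConfiguration, for example with the value ${wsp.home}/indexing_configuration.xml. I have not tested this against your setup, so treat it as a starting point only.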





On Thu, Sep 5, 2013 at 10:39 AM, pgupta <pankaj.gupta@ansys.com> wrote:
Hi,

We have a moderately sized repository of roughly the following size:
* Around 1M total objects
* Around 100K documents (PDFs, Office docs, text, XML, etc.)
* Around 3TB of data in datastore (majority of which are non-indexable
binary files)

Recently we had to re-index the repository because the search index got out
of sync with the rest of the data. During re-indexing we hit out-of-memory
errors several times, and we had to increase the heap size to 64GB before
re-indexing finally finished. The total RAM used by the Java process
climbed steadily to 60GB and stayed there until indexing finished.

We are using a fairly standard search configuration, as shown below:

    <SearchIndex
class="org.apache.jackrabbit.core.query.lucene.SearchIndex">




    </SearchIndex>

We tried adjusting a few configuration settings, such as
extractorPoolSize, maxMergeDocs, etc., without any appreciable impact on RAM
usage.

Some questions that we have are:
1) Is this high memory usage expected during indexing?
2) Can we make any configuration change to manage it?
3) Are there any improvements expected in Jackrabbit 3 (Project Oak)?

Thanks,
Pankaj





--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Huge-memory-usage-while-re-indexing-tp4659465.html



--
Cody Burleson
Enterprise Web Architect, Base22
Mobile: +1 (214) 537-8782
Skype: codyburleson
Email: cody@base22.com
Blog: codyburleson.com


