jackrabbit-users mailing list archives

From "Seidel. Robert" <Robert.Sei...@aeb.com>
Subject AW: Huge memory usage while re-indexing
Date Fri, 06 Sep 2013 07:05:00 GMT

Keyword extraction from very large files consumes a lot of memory, because all keywords have
to be kept in memory (I'm not sure whether this is a Lucene issue or a result of how it's being
used). You have three options here:
- use all keywords, but live with the memory issue
- restrict the number of keywords, but live with files that are only partially indexed
- disable keyword extraction by using an index configuration for nt:resource in which only a
dummy, non-existent property is indexed
IMHO the second is the worst solution because it is not reliable.
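
A minimal sketch of the third option, assuming the standard Jackrabbit indexing configuration
format. The property name dummyNonExistingProperty is hypothetical; any name that never occurs
on your nt:resource nodes should do, so that jcr:data is never handed to the text extractors.
Please verify the DTD version against your Jackrabbit release before using it.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nt:resource nodes, index only the property listed here.
       The name below is a placeholder (assumption) that should not
       exist on any node, so no file content gets extracted. -->
  <index-rule nodeType="nt:resource">
    <property>dummyNonExistingProperty</property>
  </index-rule>
</configuration>
```

The file is then referenced from the SearchIndex element in workspace.xml, typically with
something like <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
(the file name and location are an assumption). Existing index data is not changed by this, so a
re-index is needed for the rule to take effect on content that is already stored.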

The other time I saw increased memory consumption was when Lucene index files were merged. I
didn't have the time to investigate this further, though; increasing the memory a bit helped,
so I don't know the cause.

Kind regards, Robert

-----Original Message-----
From: pgupta [mailto:pankaj.gupta@ansys.com]
Sent: Friday, 6 September 2013 05:36
To: users@jackrabbit.apache.org
Subject: Re: Huge memory usage while re-indexing

Unfortunately not, as our users can potentially construct a search query using any property.

Do you think it's the number of indexable properties causing the memory issues? I was thinking
it perhaps had more to do with the keyword extraction from file contents. We came across a
somewhat similar memory issue when we increased the number of words used for indexing from
10,000 to a million.
This again caused a huge memory spike (~2 GB) while importing a large text file (~100 MB).
Because of this we had to revert the setting to its default value.

So my initial thinking is that either Lucene indexing (or how it's being used by Jackrabbit)
is not scalable, or our configuration is not optimal to handle these cases.

View this message in context: http://jackrabbit.510166.n4.nabble.com/Huge-memory-usage-while-re-indexing-tp4659465p4659472.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
