lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: extensive minor garbage collection when using RAMDirectory on Lucune 3.6.2
Date Tue, 15 Jan 2013 14:04:04 GMT
RAMDirectory generally has high GC cost for a large index because it always
allocates byte[1024] as its "pages".  See eg
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html

But, you are hitting lots of new gen garbage, which is different.

ExactPhraseScorer is new from 3.0.3 -> 3.6.2, and it does alloc two
int[4096] temp arrays, per segment, to do its work, and it gave a speedup
over the old ExactPhraseScorer.

BooleanScorer$Bucket existed in 3.0.3 as well, so I'm not sure why you
didn't see it then.  Though, it could be you previously got BooleanScorer2,
which is slower but creates fewer temp objects.  It creates an array of
2048 Bucket instances, each instance is maybe ~ 32 bytes, plus the pointer
so maybe ~40 bytes * 2048 per segment.

ScoreDoc is created to return the hits back after searching.  How many hits
do you pull for every search?

It's hard to see how this can be 5 MB per search.  But: how many segments
are in your index...?

Net/net I would not worry about this unless you see slower real-world
performance.  The GC seems to be doing its job (reaping lots of "died
young" objects) effectively ...

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jan 9, 2013 at 12:34 PM, Alon Muchnick <alon@datonics.com> wrote:

> hi ,
> after upgrading to Lucune 3.6.2 i noticed there is an extensive minor
> garbage collection operations. once or twice a second , and the amount of
> memory being freed is about 600 MB each time for a load of 60 searches per
> second :
>
> 2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen:
> 630064K->544K(630336K)] 1405948K->776747K(1660992K), 0.0084250 secs]
> [Times: user=0.03 sys=0.00, real=0.01 secs]
> 2013-01-09T18:57:24.785+0200: 174200.555: [GC [PSYoungGen:
> 629920K->704K(630336K)] 1406123K->777083K(1660992K), 0.0066510 secs]
> [Times: user=0.04 sys=0.00, real=0.01 secs]
>
> after some investigation , the caused seemed to be Lucune related , we are
> using RAM directory (with about 70000 documents ) on top of tomcat 7.
>
> so i took Lucune source code and integrated it to our application , then
> run Jprofiler during some  load for a minute and checked the "application
> call tree " for "Garbage collected objects" only .
> i found that the 2 Lucune method that produce the most garbage collected
> object are (*screen shot attached* ) :
>
> (we have several indexes the below is from one of them )
>
>
>    - 23.2% - 215 MB - 48,252 alloc.
>    org.apache.lucene.search.ExactPhraseScorer.
>
>
>    - 6.3% - 60,481 kB - 728,414 alloc.
>    org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer
>
>
>
> looking at the currently live object i see the top 2 classes instance are
> ( a screen shot is also attached )  :
>
>  org.apache.lucene.search.BooleanScorer$Bucket (instance count ) 1311699  (size)
> 41974368
>
>  org.apache.lucene.search.ScoreDoc (instance count )45846  (size)1100304
>
> we have a system with an earlier version  (tomcat 5.5 and Lucune 3.0.3 our
> all other code / jars are the same) which handles the same load and has the
> same indexes and the minor gc there occurs once every 1-2 second and the
> amount of memory being freed is about 60 MB.
>
> does this behavior is normal ?
> does a search should really generate so much short lived objects ( about
> 5MB for a search ) ?
> should we try and use some other RAM bases Index ? ( i know
> InstantiatedIndex is deprecated )
>
> thanks in advance .
>
> Alon
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message