lucene-solr-user mailing list archives

From Jonathan Ariel <ionat...@gmail.com>
Subject Re: Solr and Garbage Collection
Date Fri, 25 Sep 2009 17:27:49 GMT
Ok. I will try with the "concurrent low pause" collector and let you know
the results.
On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood <wunder@wunderwood.org> wrote:

> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
>
> I just found this excellent article about the various IBM GC options for a
> Lucene application with a 100GB heap:
>
>
> http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html
>
> wunder
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Friday, September 25, 2009 10:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr and Garbage Collection
>
> Walter Underwood wrote:
> > 30ms is not better or worse than 1s until you look at the service
> > requirements. For many applications, it is worth dedicating 10% of your
> > processing time to GC if that makes the worst-case pause short.
> >
> > On the other hand, my experience with the IBM JVM was that the maximum query
> > rate was 2-3X better with the concurrent generational GC compared to any of
> > their other GC algorithms, so we got the best throughput along with the
> > shortest pauses.
> >
> With which collector? Since the very early JVMs, all GC is generational.
> Most of the collectors (other than the Serial Collector) also work
> concurrently. By default, they are concurrent on different generations,
> but you can add concurrency to the "other" generation with each now too.
> > Solr garbage generation (for queries) seems to have two major components:
> > per-request garbage and cache evictions. With a generational collector,
> > these two are handled by separate parts of the collector.
> Different parts of the collector? It's a different collector depending on
> the generation. The young generation is collected with a copy collector.
> This is because almost all the objects in the young generation are likely
> dead, and a copy collector only needs to visit live objects, so it's very
> efficient. The tenured generation uses something more along the lines of
> mark and sweep or mark and compact.
> >  Per-request
> > garbage should completely fit in the short-term heap (nursery), so that it
> > can be collected rapidly and returned to use for further requests. If the
> > nursery is too small, the per-request allocations will be made in tenured
> > space and sit there until the next major GC. Cache evictions are almost
> > always in long-term storage (tenured space) because an LRU algorithm
> > guarantees that the garbage will be old.
> >
> > Check the growth rate of tenured space (under constant load, of course)
> > while increasing the size of the nursery. That rate should drop when the
> > nursery gets big enough, then not drop much further as it is increased more.
> >
> > After that, reduce the size of tenured space until major GCs start happening
> > "too often" (a judgment call). A bigger tenured space means longer major GCs
> > and thus longer pauses, so you don't want it oversized by too much.
> >
> With the concurrent low pause collector, the goal is to avoid "major"
> collections by collecting *before* the tenured space is filled. If you are
> getting "major" collections, you need to tune your settings: the whole
> point of that collector is to avoid "major" collections and do almost all
> of the work while your application is not paused. There are still 2 brief
> pauses during the collection, but they should not be significant at all.
> > Also check the hit rates of your caches. If the hit rate is low, say 20% or
> > less, make that cache much bigger or set it to zero. Either one will reduce
> > the number of cache evictions. If you have an HTTP cache in front of Solr,
> > zero may be the right choice, since the HTTP cache is cherry-picking the
> > easily cacheable requests.
> >
> > Note that a commit nearly doubles the memory required, because you have two
> > live Searcher objects with all their caches. Make sure you have headroom for
> > a commit.
> >
> > If you want to test the tenured space usage, you must test with real world
> > queries. Those are the only way to get accurate cache eviction rates.
> >
> > wunder
> >
> > -----Original Message-----
> > From: Jonathan Ariel [mailto:ionathan@gmail.com]
> > Sent: Friday, September 25, 2009 9:34 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr and Garbage Collection
> >
> > BTW, why would making them equal lower the frequency of GC?
> >
> > On 9/25/09, Fuad Efendi <fuad@efendi.ca> wrote:
> >
> >>> Bigger heaps lead to bigger GC pauses in general.
> >>>
> >> Opposite viewpoint:
> >> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
> >>
> >> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
> >>
> >> Use -server option.
> >>
> >> -server option of JVM is 'native CPU code'; I remember the WebLogic 7 console
> >> with Sun JVM 1.3 not showing any GC (just a horizontal line).
> >>
> >> -Fuad
> >> http://www.linkedin.com/in/liferay
> >>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
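
The tenured-space growth check described in the quoted message could be sketched roughly as below, using the standard java.lang.management API. This is only an illustration under assumptions: the class name TenuredGrowth and both helper methods are made up for this example, and matching the pool by "old" or "tenured" in its name is a guess, since pool names differ between JVMs and collectors. The sketch also samples the JVM it runs in, so to watch Solr it would have to run inside the Solr JVM or be adapted to read the same beans over remote JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.Locale;

public class TenuredGrowth {

    /**
     * Returns the first heap pool whose name suggests the tenured (old)
     * generation, or null if none matches. The name match is an assumption:
     * pool names are JVM- and collector-specific.
     */
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase(Locale.ROOT);
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("old") || name.contains("tenured"))) {
                return pool;
            }
        }
        return null;
    }

    /** Growth in bytes per second between two samples of used tenured space. */
    static double growthBytesPerSecond(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean tenured = findTenuredPool();
        if (tenured == null) {
            System.out.println("No tenured-looking pool found; check the pool names on this JVM.");
            return;
        }
        long intervalMillis = 10_000; // sampling interval, chosen arbitrarily
        long before = tenured.getUsage().getUsed();
        Thread.sleep(intervalMillis);
        long after = tenured.getUsage().getUsed();
        // A negative rate means a tenured collection ran between the two samples.
        System.out.printf("Pool '%s' grew by %.0f bytes/s%n",
                tenured.getName(), growthBytesPerSecond(before, after, intervalMillis));
    }
}
```

Run under constant load and repeated with progressively larger nursery sizes, the reported rate should drop and then level off, which is the point at which the nursery is probably big enough.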
