activemq-users mailing list archives

From Tim Bain <tb...@alumni.duke.edu>
Subject Re: Storing message off heap, message compression, and storing the whole message as headers.
Date Mon, 20 Apr 2015 15:37:11 GMT
With G1GC, you need exactly 1 free bucket (a "region", in G1's terminology)
per GC thread to be able to perform garbage collection.  By the time you
need the next bucket on any given thread, you've freed up at least one new
one (often more), so you've always got enough space for that next bucket.
And given that G1GC shoots (by default) for roughly 2000 buckets, you only
need 0.05% of your heap free per bucket in order to perform GC; even if
you're using 8 GC threads, that's still less than one half of one percent
of your total heap, not 50%.
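
To make that arithmetic explicit, here's a quick sketch in Python (the
2000-region target and the 8-thread count are the figures from the
paragraph above, not values measured from any particular JVM):

```python
# Back-of-envelope check of the free-region math above.
TARGET_REGIONS = 2000   # G1's default region-count target, as cited above
GC_THREADS = 8          # example number of parallel GC threads

per_region_pct = 100.0 / TARGET_REGIONS   # heap share of one region (%)
needed_pct = GC_THREADS * per_region_pct  # one free region per GC thread

print(per_region_pct)   # 0.05  -- % of heap per region
print(needed_pct)       # 0.4   -- % of heap that must be free, well under 0.5%
```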

CMS doesn't perform compaction (till heap fragmentation forces a
ParallelGC-based full GC, anyway), and I believe that ParallelGC compacts
by copying live objects into the space previously used by dead objects (at
earlier array indices, to use that mental model), not a from-space/to-space
algorithm.
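
As a toy illustration of that sliding style of compaction (an abstract
model of the idea, not HotSpot's actual memory layout or algorithm):

```python
def sliding_compact(heap, live):
    """Slide live objects toward earlier indices, overwriting the slots
    that dead objects used to occupy, so no separate to-space is needed.

    heap: list of cells; live: set of indices of still-reachable objects.
    Returns the index of the first free cell after compaction.
    """
    free = 0
    for i, cell in enumerate(heap):
        if i in live:
            heap[free] = cell   # copy the live object down into free space
            free += 1
    for i in range(free, len(heap)):
        heap[i] = None          # everything past the live prefix is now free
    return free
```

For example, compacting `["a", "b", "c", "d"]` with only indices 0 and 2
live leaves `["a", "c", None, None]` and returns 2.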

The bigger reason to have some overhead with G1GC is to allow the garbage
collector to successfully harvest objects from Old Gen fast enough to
ensure that you don't run out of memory (which would force a stop-the-world
full garbage collection); you want to make sure that there's enough
overhead when you start the first Old Gen GC to hold any objects that might
be created in New Gen before the GC finishes.  But I wouldn't describe that
as related to efficient compaction, and I'd be astounded (and alarmed) if
an application could generate 50% of the heap in new objects during a G1GC
pause (by default, that's 200ms).
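
For reference, the knobs involved look like this on a HotSpot JVM (the
values shown are the defaults mentioned above, not tuning advice, and the
jar name is hypothetical):

```shell
# MaxGCPauseMillis=200  -- G1's default pause-time goal, cited above
# G1ReservePercent=10   -- heap G1 holds back as headroom for evacuation,
#                          i.e. the "overhead" discussed above
java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1ReservePercent=10 \
     -jar activemq-broker.jar
```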

Tim

On Mon, Apr 20, 2015 at 7:34 AM, Justin Reock <Justin.Reock@roguewave.com>
wrote:

> "You made a statement that sounds like "the JVM
> can only use half its memory, because the other half has to be kept free
> for GCing", which doesn't match my experience at all.  I've observed G1GC
> to successfully GC when the heap was nearly 100% full, I'm certain it's not
> a problem for CMS because CMS is a non-compacting Old Gen GC strategy -
> that's why it's subject to fragmentation - and I believe that ParallelGC
> does in-place compaction so it wouldn't require additional memory though I
> haven't directly observed it during a GC."
>
> In general, because ActiveMQ makes use of so many tiny objects in memory,
> I recommend people set aside around twice the necessary heap to allow
> compaction to occur efficiently, even with G1GC.  The compaction algorithm
> works by copying fragmented data into a free contiguous area of heap, so,
> the more free heap you have, the higher the guarantee that lots of objects
> can be compacted without requiring a GC.
>
> -Justin
>
> On 4/20/15, 9:24 AM, "Tim Bain" <tbain@alumni.duke.edu> wrote:
>
> >I'm confused about what would drive the need for this.
> >
> >Is it the ability to hold more messages than your JVM size allows?  If so,
> >we already have both KahaDB and LevelDB; what does Chronicle offer that
> >those other two don't?
> >
> >Is it because you see some kind of inefficiency in how ActiveMQ uses
> >memory
> >or how the JVM's GC strategies work?  If so, can you elaborate on what
> >you're concerned about?  (You made a statement that sounds like "the JVM
> >can only use half its memory, because the other half has to be kept free
> >for GCing", which doesn't match my experience at all.  I've observed G1GC
> >to successfully GC when the heap was nearly 100% full, I'm certain it's
> >not
> >a problem for CMS because CMS is a non-compacting Old Gen GC strategy -
> >that's why it's subject to fragmentation - and I believe that ParallelGC
> >does in-place compaction so it wouldn't require additional memory though I
> >haven't directly observed it during a GC.  Please either correct my
> >interpretation of your statement or provide the data you're basing it
> >on.)
> >
> >One difference in GC behavior with what you're proposing is that under
> >your
> >algorithm you'd GC each message at least twice (once when it's received
> >and
> >put into Chronicle, and once when it's pulled from Chronicle and sent
> >onward, plus any additional reads needed to operate on the message such as
> >if a new subscriber with a non-matching selector connected to the broker)
> >instead of just once under the current algorithm.  On the other hand, your
> >GCs should all be from Young Gen (and cheap) whereas the current algorithm
> >would likely push many of its messages to Old Gen.  Old Gen GCs are more
> >expensive under ParallelGC, though they're no worse under G1GC and CMS.
> >So
> >it's a trade-off under ParallelGC (maybe better, maybe worse) and a loss
> >under the other two.
> >
> >One other thing: this would give compression at rest, but not in motion,
> >and it comes at the expense of two serialization/deserialization and
> >compression/decompression operations per broker traversed.  Maybe being
> >able to store more messages in a given amount of memory is worth it to you
> >(your volumes seem a lot higher than ours, and than most installations'),
> >but latency and throughput matter more to us than memory usage so we'd
> >live
> >with using more memory to avoid the extra operations.
> >
> >The question about why to use message bodies at all is an interesting one,
> >though the ability to compress the body once and have it stay compressed
> >through multiple network writes is a compelling reason in the near term.
> >
> >Tim
> >On Apr 19, 2015 6:06 PM, "Kevin Burton" <burton@spinn3r.com> wrote:
> >
> >> I've been thinking about how messages are stored in the broker and ways
> >>to
> >> improve the storage in memory.
> >>
> >> First, right now, messages are stored in the same heap, and if you're
> >>using
> >> the memory store, like, that¹s going to add up.  This will increase GC
> >> latency , and you actually need 2x more memory because you have to have
> >> temp memory set aside for GCs.
> >>
> >> I was thinking about using Chronicle to store the messages off heap
> >>using
> >> direct buffers.  The downside to this is that the messages need to be
> >> serialized/deserialized with each access. But realistically that's
> >>probably
> >> acceptable because you can do something like 1M message deserializations
> >> per second.  Which is normally more than the throughput of the broker.
> >>
> >> Additionally, chronicle supports zlib or snappy compression on the
> >>message
> >> bodies.  So, while the broker supports message compression now, it
> >>doesn¹t
> >> support this feature on headers.
> >>
> >> This would give us header compression!
> >>
> >> The broker would transparently decompress the headers when reading the
> >> message.
> >>
> >> This then begs the question, why use message bodies at all?  Why not
> >>just
> >> store an entire message as a set of headers?
> >>
> >> If you need hierarchy you can do foo.bar.cat.dog style header names.
> >>
> >>
> >> --
> >>
> >> Founder/CEO Spinn3r.com
> >> Location: *San Francisco, CA*
> >> blog: http://burtonator.wordpress.com
> >> … or check out my Google+ profile
> >> <https://plus.google.com/102718274791889610666/posts>
> >> <http://spinn3r.com>
> >>
>
>
