geode-dev mailing list archives

From Alexander Murmann <amurm...@apache.org>
Subject Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0
Date Thu, 24 Jan 2019 00:13:09 GMT
Ryan, thank you so much for the great explanation of your proposal!

This seems very sound and you and David got me convinced that it's the
right thing to change the default. To me the question now is one of timing.
Is this something we can change in a minor release or do we have to wait
for Geode 2.0? I think we have quite a few defaults we'd like to update.
However, a user might have a system in prod that relies on defaults being a
certain way and upgrading to the next minor shouldn't require any work on
their end to prevent any negative impact on their system.

Thoughts?

On Tue, Jan 22, 2019 at 12:33 PM David Wisler <dwisler@pivotal.io> wrote:

> I would add that, by changing the default to 0, we can then skip all of the
> "special" logic that almost no customers use.  With a default of 1, we go
> into this logic every time unnecessarily, even when customers have not
> explicitly told us to "tolerate" an eviction or critical state change.  I
> am in favor of this default change to 0, and would also add that there are no
> customers who would even realize such a change in behavior has occurred.
> I would also suggest that tolerating 1 critical reading, delaying the
> subsequent behaviors in GemFire when above critical, could make us more
> vulnerable to OOMEs than would be the case by immediately transitioning
> state.
>
> My 2 cents.  Thanks for the email Ryan.
>
>
>
> On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:
>
> > Hi all,
> >
> > I am currently fixing a bug
> > <https://issues.apache.org/jira/browse/GEODE-6304> with the
> > HeapMemoryMonitor event tolerance feature, and came across a decision that
> > I thought would be more appropriate for the Geode dev list.
> >
> > For those familiar with the feature, we are proposing that the default
> > gemfire.memoryEventTolerance config parameter value is changed from 1 to 0
> > so state transitions from normal to eviction or critical occur immediately
> > after reading a single heap-used-bytes event above threshold.  If you are
> > unfamiliar with the feature, read on.
> >
> > The memory event tolerance feature addresses issues with some JVM distros
> > that result in sporadic, erroneously high heap-used-bytes readings.  The
> > feature was introduced to address this issue in the JRockit JVM, but it has
> > been found that other JVM distros are susceptible to this problem as well.
> >
> > The feature prevents an "unexpected" state transition from a normal state
> > to an eviction or critical state by tolerating up to N (configurable)
> > consecutive heap-used-bytes events above threshold before changing states.
> > The current default configuration is N = 5 for JRockit and N = 1 for all
> > other JVMs.  In a non-JRockit JVM, this configuration permits a single event
> > above threshold WITHOUT causing a state transition.  In other words, by
> > default, we allow for a single bad outlier heap-used-bytes reading without
> > going into an eviction or critical state.
> >
> > As part of this bug fix (which involves a failure to reset the tolerance
> > counter under some conditions), we opted to remove the special handling for
> > JRockit because JRockit is no longer supported.  After removing the JRockit
> > handling, we started re-evaluating if a default value of 1 is appropriate
> > for all other JVMs.  We are considering changing the default to 0, so state
> > transitions would occur immediately if an event above the threshold is
> > received.  If a user is facing one of these problematic JVMs, they can then
> > change the gemfire.memoryEventTolerance config parameter to increase the
> > tolerance.  Our concern is that the default today is potentially masking
> > bad heap readings without the user ever knowing.
> >
> > To summarize, if we change the default from 1 to 0 it would potentially be
> > a change in behavior in that we would no longer be masking a single bad
> > heap-used-bytes reading i.e. no longer permitting a single outlier without
> > changing states.  The user can then decide whether to configure a non-zero
> > tolerance to address the situation.  Any thoughts on this change in
> > behavior?
> >
> > Thanks,
> > Ryan
> >
>
> --
>
> David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
> Support.Pivotal.io  |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
>
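
For readers following the thread without the code open, the tolerance behavior Ryan describes can be sketched roughly as below. This is only an illustration and not Geode's HeapMemoryMonitor: the class name, the single threshold, the two-state model, and the immediate return to normal on a below-threshold reading are all simplifying assumptions. Reading the value with Integer.getInteger assumes the parameter is supplied as a JVM system property, which the thread does not spell out.

```java
// Illustrative sketch only; NOT the Geode implementation.
// Assumptions: a single threshold, two states, and a counter that resets
// as soon as one reading falls back to or below the threshold.
final class MemoryEventToleranceSketch {
    enum State { NORMAL, CRITICAL }

    private final long thresholdBytes;
    private final int tolerance;      // number of above-threshold readings to ignore
    private int consecutiveAbove = 0; // consecutive readings above the threshold
    private State state = State.NORMAL;

    MemoryEventToleranceSketch(long thresholdBytes, int tolerance) {
        this.thresholdBytes = thresholdBytes;
        this.tolerance = tolerance;
    }

    // Feed one heap-used-bytes reading and return the resulting state.
    State onHeapUsedBytes(long usedBytes) {
        if (usedBytes <= thresholdBytes) {
            // Not above threshold: the run of outliers is broken.
            consecutiveAbove = 0;
            state = State.NORMAL;
        } else if (state == State.NORMAL) {
            consecutiveAbove++;
            // tolerance = 0 transitions on the first reading above threshold;
            // tolerance = 1 ignores one such reading and transitions on the second.
            if (consecutiveAbove > tolerance) {
                state = State.CRITICAL;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // Assumption: the parameter arrives as a system property; 0 is the proposed default.
        int tolerance = Integer.getInteger("gemfire.memoryEventTolerance", 0);
        MemoryEventToleranceSketch monitor = new MemoryEventToleranceSketch(100L, tolerance);
        long[] readings = {40L, 150L, 60L, 150L, 150L};
        for (long reading : readings) {
            System.out.println(reading + " -> " + monitor.onHeapUsedBytes(reading));
        }
    }
}
```

Under these assumptions, a lone outlier reading is absorbed when the tolerance is 1 and causes an immediate transition when it is 0, which is the behavior change being proposed.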
