geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan McMahon <rmcma...@pivotal.io>
Subject Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0
Date Thu, 24 Jan 2019 17:57:59 GMT
Hi Alexander,

Thanks for bring up those concerns.  I think in this case the default of 1
is potentially masking behavior that the user would definitely want to see
and react to (such as heap memory read outliers/spikes), so it is a bit
different than other cases where changing the default would be a
nice-to-have in 2.0.  The user can still adjust their parameter to deal
with any bad reads they are observing, but it would be best if they did
that explicitly instead of us hiding the behavior by default silently.  So
I would argue it is a fair change in this case in a minor release, as it
could be viewed as a "bug fix" i.e. masking the bad reads by default.

Ryan

On Thu, Jan 24, 2019 at 9:49 AM David Wisler <dwisler@pivotal.io> wrote:

> I talked to him.  He is fine with our new change.
>
> :-)    He just wanted to have it on the table.
>
> So you missed out on this 1 minute discussion.  haha
>
> On Thu, Jan 24, 2019 at 9:38 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:
>
>> Hey David,
>>
>> Alexander responded to the dev list, but not sure you got a copy.
>> Basically he is asking if we can wait to change the default tolerance until
>> Geode 2.0, since it is a behavior change and apparently there are several
>> config parameters for which we'd like to change the default, but we pushed
>> it to 2.0.  I don't feel super strongly either way.  Do you want to reply
>> to this thread or talk to Alexander directly?
>>
>> Ryan
>>
>> ---------- Forwarded message ---------
>> From: Alexander Murmann <amurmann@apache.org>
>> Date: Wed, Jan 23, 2019 at 4:13 PM
>> Subject: Re: [Proposal] Change default gemfire.memoryEventTolerance from
>> 1 to 0
>> To: <dev@geode.apache.org>
>> Cc: Ryan McMahon <rmcmahon@pivotal.io>
>>
>>
>> Ryan, thank you so much for the great explanation of your proposal!
>>
>> This seems very sound and you and David got me convinced that it's the
>> right thing to change the default. To me the question now is one of
>> timing.
>> Is this something we can change in a minor release or do we have to wait
>> for Geode 2.0? I think we have quite a few defaults we'd like to update.
>> However, a user might have a system in prod that relies on defaults being
>> a
>> certain way and upgrading to the next minor shouldn't require any work on
>> their end to prevent any negative impact on their system.
>>
>> Thoughts?
>>
>> On Tue, Jan 22, 2019 at 12:33 PM David Wisler <dwisler@pivotal.io> wrote:
>>
>> > I would add that, by changing the default to 0, we can then skip all of
>> the
>> > "special" logic that almost no customers use.    With a default of 1,
>> we go
>> > into this logic every time unnecessarily, even when customers have not
>> > explicitly told us to "tolerate" an eviction or critical state change.
>>   I
>> > am in favor of this default change to 0, and also add that there are no
>> > customers who would even realize such a change in behavior has occurred.
>> > I would also suggest that tolerating 1 critical reading, delaying the
>> > subsequent behaviors in GemFire when above critical, could make us more
>> > vulnerable to OOME's than would be the case by immediately transitioning
>> > state.
>> >
>> > My 2 cents.  Thanks for the email Ryan.
>> >
>> >
>> >
>> > On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon <rmcmahon@pivotal.io>
>> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I am currently fixing a bug
>> > > <https://issues.apache.org/jira/browse/GEODE-6304> with the
>> > > HeapMemoryMonitor event tolerance feature, and came across a decision
>> > that
>> > > I thought would be more appropriate for the Geode dev list.
>> > >
>> > > For those familiar with the feature, we are proposing that the default
>> > > gemfire.memoryEventTolerance config parameter value is changed from 1
>> to
>> > 0
>> > > so state transitions from normal to eviction or critical occur
>> > immediately
>> > > after reading a single heap-used-bytes event above threshold.  If you
>> are
>> > > unfamiliar with the feature, read on.
>> > >
>> > > The memory event tolerance feature addresses issues with some JVM
>> distros
>> > > that result in sporadic, erroneously high heap-bytes-used readings.
>> The
>> > > feature was introduced to address this issue in the JRockit JVM, but
>> it
>> > has
>> > > been found that other JVM distros are susceptible to this problem as
>> > well.
>> > >
>> > > The feature prevents an "unexpected" state transition from a normal
>> state
>> > > to an eviction or critical state by requiring N (configurable)
>> > consecutive
>> > > heap-used-byte events above threshold before changing states.  The
>> > current
>> > > default configuration is N = 5 for JRockit and N = 1 for all other
>> JVMs.
>> > > In a non-JRockit JVM, this configuration permits a single event above
>> > > threshold WITHOUT causing a state transition.  In other words, by
>> > default,
>> > > we allow for a single bad outlier heap-used-bytes reading without
>> going
>> > > into an eviction or critical state.
>> > >
>> > > As part of this bug fix (which involves a failure to reset the
>> tolerance
>> > > counter under some conditions), we opted to remove the special
>> handling
>> > for
>> > > JRockit because JRockit is no longer supported.  After removing the
>> > JRockit
>> > > handling, we started re-evaluating if a default value of 1 is
>> appropriate
>> > > for all other JVMs.  We are considering changing the default to 0, so
>> > state
>> > > transitions would occur immediately if an event above the threshold is
>> > > received.  If a user is facing one of these problematic JVMs, they can
>> > then
>> > > change the gemfire.memoryEventTolerance config parameter to increase
>> the
>> > > tolerance.  Our concern is that the default today is potentially
>> masking
>> > > bad heap readings without the user ever knowing.
>> > >
>> > > To summarize, if we change the default from 1 to 0 it would
>> potentially
>> > be
>> > > a change in behavior in that we would no longer be masking a single
>> bad
>> > > heap-used-bytes reading i.e. no longer permitting a single outlier
>> > without
>> > > changing states.  The user can then decide whether to configure a
>> > non-zero
>> > > tolerance to address the situation.  Any thoughts on this change in
>> > > behavior?
>> > >
>> > > Thanks,
>> > > Ryan
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>> > --
>> >
>> > David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
>> > Support.Pivotal.io
>> > <
>> >
>> http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q
>> > >
>> >   |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
>> > [image: support]
>> > <
>> >
>> https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg
>> > >
>> >  [image: twitter]
>> > <
>> >
>> https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg
>> > >
>> >  [image: linkedin]
>> > <
>> >
>> https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw
>> > >
>> >  [image: facebook]
>> > <
>> >
>> https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A
>> > >
>> >  [image: google plus] <https://plus.google.com/+Pivotal> [image:
>> youtube]
>> > <
>> https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>
>> >
>>
>
>
> --
>
> David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
> Support.Pivotal.io
> <http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q>
>   |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
> [image: support]
> <https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg>
>  [image: twitter]
> <https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg>
>  [image: linkedin]
> <https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw>
>  [image: facebook]
> <https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A>
>  [image: google plus] <https://plus.google.com/+Pivotal> [image: youtube]
> <https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message