geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Murmann <amurm...@apache.org>
Subject Re: [Proposal] Change default gemfire.memoryEventTolerance from 1 to 0
Date Thu, 24 Jan 2019 20:54:00 GMT
Great points, Ryan. I agree, if this is a bug fix rather than a
feature/improvement we should fix it.

On Thu, Jan 24, 2019 at 9:58 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:

> Hi Alexander,
>
> Thanks for bring up those concerns.  I think in this case the default of 1
> is potentially masking behavior that the user would definitely want to see
> and react to (such as heap memory read outliers/spikes), so it is a bit
> different than other cases where changing the default would be a
> nice-to-have in 2.0.  The user can still adjust their parameter to deal
> with any bad reads they are observing, but it would be best if they did
> that explicitly instead of us hiding the behavior by default silently.  So
> I would argue it is a fair change in this case in a minor release, as it
> could be viewed as a "bug fix" i.e. masking the bad reads by default.
>
> Ryan
>
> On Thu, Jan 24, 2019 at 9:49 AM David Wisler <dwisler@pivotal.io> wrote:
>
> > I talked to him.  He is fine with our new change.
> >
> > :-)    He just wanted to have it on the table.
> >
> > So you missed out on this 1 minute discussion.  haha
> >
> > On Thu, Jan 24, 2019 at 9:38 AM Ryan McMahon <rmcmahon@pivotal.io>
> wrote:
> >
> >> Hey David,
> >>
> >> Alexander responded to the dev list, but not sure you got a copy.
> >> Basically he is asking if we can wait to change the default tolerance
> until
> >> Geode 2.0, since it is a behavior change and apparently there are
> several
> >> config parameters for which we'd like to change the default, but we
> pushed
> >> it to 2.0.  I don't feel super strongly either way.  Do you want to
> reply
> >> to this thread or talk to Alexander directly?
> >>
> >> Ryan
> >>
> >> ---------- Forwarded message ---------
> >> From: Alexander Murmann <amurmann@apache.org>
> >> Date: Wed, Jan 23, 2019 at 4:13 PM
> >> Subject: Re: [Proposal] Change default gemfire.memoryEventTolerance from
> >> 1 to 0
> >> To: <dev@geode.apache.org>
> >> Cc: Ryan McMahon <rmcmahon@pivotal.io>
> >>
> >>
> >> Ryan, thank you so much for the great explanation of your proposal!
> >>
> >> This seems very sound and you and David got me convinced that it's the
> >> right thing to change the default. To me the question now is one of
> >> timing.
> >> Is this something we can change in a minor release or do we have to wait
> >> for Geode 2.0? I think we have quite a few defaults we'd like to update.
> >> However, a user might have a system in prod that relies on defaults
> being
> >> a
> >> certain way and upgrading to the next minor shouldn't require any work
> on
> >> their end to prevent any negative impact on their system.
> >>
> >> Thoughts?
> >>
> >> On Tue, Jan 22, 2019 at 12:33 PM David Wisler <dwisler@pivotal.io>
> wrote:
> >>
> >> > I would add that, by changing the default to 0, we can then skip all
> of
> >> the
> >> > "special" logic that almost no customers use.    With a default of 1,
> >> we go
> >> > into this logic every time unnecessarily, even when customers have not
> >> > explicitly told us to "tolerate" an eviction or critical state change.
> >>   I
> >> > am in favor of this default change to 0, and also add that there are
> no
> >> > customers who would even realize such a change in behavior has
> occurred.
> >> > I would also suggest that tolerating 1 critical reading, delaying the
> >> > subsequent behaviors in GemFire when above critical, could make us
> more
> >> > vulnerable to OOME's than would be the case by immediately
> transitioning
> >> > state.
> >> >
> >> > My 2 cents.  Thanks for the email Ryan.
> >> >
> >> >
> >> >
> >> > On Tue, Jan 22, 2019 at 10:22 AM Ryan McMahon <rmcmahon@pivotal.io>
> >> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I am currently fixing a bug
> >> > > <https://issues.apache.org/jira/browse/GEODE-6304> with the
> >> > > HeapMemoryMonitor event tolerance feature, and came across a
> decision
> >> > that
> >> > > I thought would be more appropriate for the Geode dev list.
> >> > >
> >> > > For those familiar with the feature, we are proposing that the
> default
> >> > > gemfire.memoryEventTolerance config parameter value is changed from
> 1
> >> to
> >> > 0
> >> > > so state transitions from normal to eviction or critical occur
> >> > immediately
> >> > > after reading a single heap-used-bytes event above threshold.  If
> you
> >> are
> >> > > unfamiliar with the feature, read on.
> >> > >
> >> > > The memory event tolerance feature addresses issues with some JVM
> >> distros
> >> > > that result in sporadic, erroneously high heap-bytes-used readings.
> >> The
> >> > > feature was introduced to address this issue in the JRockit JVM, but
> >> it
> >> > has
> >> > > been found that other JVM distros are susceptible to this problem
as
> >> > well.
> >> > >
> >> > > The feature prevents an "unexpected" state transition from a normal
> >> state
> >> > > to an eviction or critical state by requiring N (configurable)
> >> > consecutive
> >> > > heap-used-byte events above threshold before changing states.  The
> >> > current
> >> > > default configuration is N = 5 for JRockit and N = 1 for all other
> >> JVMs.
> >> > > In a non-JRockit JVM, this configuration permits a single event
> above
> >> > > threshold WITHOUT causing a state transition.  In other words, by
> >> > default,
> >> > > we allow for a single bad outlier heap-used-bytes reading without
> >> going
> >> > > into an eviction or critical state.
> >> > >
> >> > > As part of this bug fix (which involves a failure to reset the
> >> tolerance
> >> > > counter under some conditions), we opted to remove the special
> >> handling
> >> > for
> >> > > JRockit because JRockit is no longer supported.  After removing the
> >> > JRockit
> >> > > handling, we started re-evaluating if a default value of 1 is
> >> appropriate
> >> > > for all other JVMs.  We are considering changing the default to 0,
> so
> >> > state
> >> > > transitions would occur immediately if an event above the threshold
> is
> >> > > received.  If a user is facing one of these problematic JVMs, they
> can
> >> > then
> >> > > change the gemfire.memoryEventTolerance config parameter to increase
> >> the
> >> > > tolerance.  Our concern is that the default today is potentially
> >> masking
> >> > > bad heap readings without the user ever knowing.
> >> > >
> >> > > To summarize, if we change the default from 1 to 0 it would
> >> potentially
> >> > be
> >> > > a change in behavior in that we would no longer be masking a single
> >> bad
> >> > > heap-used-bytes reading i.e. no longer permitting a single outlier
> >> > without
> >> > > changing states.  The user can then decide whether to configure a
> >> > non-zero
> >> > > tolerance to address the situation.  Any thoughts on this change in
> >> > > behavior?
> >> > >
> >> > > Thanks,
> >> > > Ryan
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> > --
> >> >
> >> > David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
> >> > Support.Pivotal.io
> >> > <
> >> >
> >>
> http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q
> >> > >
> >> >   |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
> >> > [image: support]
> >> > <
> >> >
> >>
> https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg
> >> > >
> >> >  [image: twitter]
> >> > <
> >> >
> >>
> https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg
> >> > >
> >> >  [image: linkedin]
> >> > <
> >> >
> >>
> https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw
> >> > >
> >> >  [image: facebook]
> >> > <
> >> >
> >>
> https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A
> >> > >
> >> >  [image: google plus] <https://plus.google.com/+Pivotal> [image:
> >> youtube]
> >> > <
> >>
> https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>
> >> >
> >>
> >
> >
> > --
> >
> > David Wisler  |  GemFire Support Product Manager  |  503-810-7840 cell
> > Support.Pivotal.io
> > <
> http://www.google.com/url?q=http%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNGDBr_XSKC18wot5h3OkKoZ84Vn7Q
> >
> >   |  Mon-Fri  8:00am to 5:00pm PST  |  1-877-477-2269
> > [image: support]
> > <
> https://www.google.com/url?q=https%3A%2F%2Fsupport.pivotal.io%2F&sa=D&sntz=1&usg=AFQjCNEvwKLjzu29inKwy4jJjKsboqGMCg
> >
> >  [image: twitter]
> > <
> https://www.google.com/url?q=https%3A%2F%2Ftwitter.com%2Fpivotal&sa=D&sntz=1&usg=AFQjCNG1FcqkH5ghKsSG6UkdeUzjSuDSHg
> >
> >  [image: linkedin]
> > <
> https://www.google.com/url?q=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2F3048967&sa=D&sntz=1&usg=AFQjCNHOQGYmDYIQz06S3-vAuqzf8bN8Yw
> >
> >  [image: facebook]
> > <
> https://www.google.com/url?q=https%3A%2F%2Fwww.facebook.com%2Fpivotalsoftware&sa=D&sntz=1&usg=AFQjCNFQnPFtec1Rp3lKf6MuY1jcbA8j2A
> >
> >  [image: google plus] <https://plus.google.com/+Pivotal> [image:
> youtube]
> > <
> https://www.youtube.com/playlist?list=PLAdzTan_eSPScpj2J50ErtzR9ANSzv3kl>
> >
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message