geode-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-1672) When amount of overflowed persisted data exceeds heap size startup may run out of memory
Date Mon, 05 Jun 2017 23:20:12 GMT

    [ https://issues.apache.org/jira/browse/GEODE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037825#comment-16037825 ]

ASF GitHub Bot commented on GEODE-1672:
---------------------------------------

Github user davebarnes97 commented on the issue:

    https://github.com/apache/geode/pull/559
  
    Thanks, Darrel. Changes incorporated.
    
    On Mon, Jun 5, 2017 at 3:19 PM, Darrel Schneider <notifications@github.com> wrote:
    
    > *@dschneider-pivotal* commented on this pull request.
    >
    > This looks good. I just had two comments
    > ------------------------------
    >
    > In geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
    > <https://github.com/apache/geode/pull/559#discussion_r120222821>:
    >
    > > +- Retrieving values asynchronously in a background thread allows a relatively quick startup on a "warm" cache
    > +that will eventually recover every value.
    > +
    > +**Retrieve or Ignore LRU values**
    > +
    > +When a system with persistent LRU regions shuts down, the system does not record which of the values
    > +were recently used. On subsequent startup, if values are recovered into an LRU region they may be
    > +the least recently used instead of the most recently used. Also, if LRU values are recovered on a
    > +heap or an off-heap LRU region, it is possible that the LRU memory limit will be exceeded, resulting
    > +in an `OutOfMemoryException` during recovery. For these reasons, LRU value recovery can be treated
    > +differently than non-LRU values.
    > +
    > +## Default Recovery Behavior for Persistent Regions
    > +
    > +The default behavior is for the system to recover all keys, then asynchronously recover all data
    > +values that were resident, leaving LRU values unrecovered. This default strategy is best for
    >
    > drop "that were resident"
    > ------------------------------
    >
    > In geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
    > <https://github.com/apache/geode/pull/559#discussion_r120224399>:
    >
    > > +  `gemfire.disk.recoverValues` is `false`, then `gemfire.disk.recoverLruValues` is ignored, since
    > +  no values are recovered.
    > +
    > +  *How used:* When `false`, shortens recovery time by ignoring LRU values. When `true`, restores
    > +  more data values to the cache. Recovery of the LRU values increases heap memory usage and
    > +  could cause an out-of-memory error, preventing the system from restarting.
    > +
    > +- `gemfire.disk.recoverValuesSync`
    > +
    > +  Default = `false`, recover values by an asynchronous background process. If `true`, values are
    > +  recovered synchronously, and recovery is not complete until all values have been retrieved.  If
    > +  `gemfire.disk.recoverValues` is `false`, then `gemfire.disk.recoverValuesSync` is ignored since
    > +  no values are recovered.
    > +
    > +  *How used:* When `false`, allows the system to become available sooner, but some time must elapse
    > +  before the entire cache is refreshed. Some key retrievals will require disk access, and some will not.
    >
    > change "the entire cache is refreshed" to "all values have been read from
    > disk into cache memory"
    >
    >
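A minimal sketch of the precedence rules in the property descriptions quoted above, assuming the defaults they state (`gemfire.disk.recoverValues` = `true`, `gemfire.disk.recoverLruValues` = `false`, `gemfire.disk.recoverValuesSync` = `false`). `RecoveryOptions` and `describe()` are hypothetical names for illustration only; this is not Geode's own implementation, just a way to see which combinations are meaningful.

```java
import java.util.Properties;

/**
 * Hypothetical helper, not part of the Geode API: resolves the three
 * disk-recovery system properties into the effective recovery mode.
 */
final class RecoveryOptions {
    final boolean recoverValues;
    final boolean recoverLruValues;
    final boolean recoverValuesSync;

    private RecoveryOptions(boolean recoverValues, boolean recoverLruValues, boolean recoverValuesSync) {
        this.recoverValues = recoverValues;
        this.recoverLruValues = recoverLruValues;
        this.recoverValuesSync = recoverValuesSync;
    }

    /** Applies the documented defaults: values on, LRU values off, asynchronous recovery. */
    static RecoveryOptions from(Properties props) {
        boolean values = Boolean.parseBoolean(props.getProperty("gemfire.disk.recoverValues", "true"));
        boolean lru = Boolean.parseBoolean(props.getProperty("gemfire.disk.recoverLruValues", "false"));
        boolean sync = Boolean.parseBoolean(props.getProperty("gemfire.disk.recoverValuesSync", "false"));
        if (!values) {
            // No values are recovered, so recoverLruValues and recoverValuesSync are ignored.
            return new RecoveryOptions(false, false, false);
        }
        return new RecoveryOptions(true, lru, sync);
    }

    /** Human-readable summary of what startup would do under these settings. */
    String describe() {
        if (!recoverValues) {
            return "keys only; values faulted in on first read";
        }
        return "keys, then " + (recoverValuesSync ? "synchronous" : "asynchronous")
                + " value recovery; LRU values " + (recoverLruValues ? "recovered" : "skipped");
    }
}
```

For example, `RecoveryOptions.from(System.getProperties()).describe()` with no properties set reports the default strategy: keys first, then asynchronous value recovery with LRU values skipped.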



> When amount of overflowed persisted data exceeds heap size startup may run out of memory
> ----------------------------------------------------------------------------------------
>
>                 Key: GEODE-1672
>                 URL: https://issues.apache.org/jira/browse/GEODE-1672
>             Project: Geode
>          Issue Type: Bug
>          Components: docs, persistence
>            Reporter: Darrel Schneider
>            Assignee: Anilkumar Gingade
>             Fix For: 1.2.0
>
>
> Basically, when the amount of overflowed data approaches the heap size, such that the total amount of data is very close to or actually surpasses your total tenured heap, it is possible that you will not be able to restart.
> The algorithm used during recovery of oplogs/buckets is such that we don't "evict" in the normal sense as data fills the heap during the early stages of recovery, before the regions are created. When the data is first created in the heap, it is not yet officially part of the region.
> At any rate, during this early phase of recovery, or during the subsequent phase where eviction works as usual, the total amount of data, or an early imbalance of buckets before there is an opportunity to rebalance, can push heap usage past the critical threshold, which will kill us before startup completes.
> To reproduce, create one persistent region that evicts and overflows to disk and holds a large amount of data; call it R1. Then create another persistent region that does not evict; call it R2.
> List R1 first in the cache.xml file. Start running the system and add data over time until the evicted region has overflowed an amount of data approaching the heap size, and R2 also holds a substantial amount of data.
> Once you have filled these regions, overflowing enough data to disk and persisting the other region, shut down and then attempt to restart. If you put enough data in, you will hit the critical threshold before startup completes.
> You can work around this issue by configuring Geode not to recover values, by setting this system property: -Dgemfire.disk.recoverValues=false
> Values will not be faulted into memory until a read operation is done on that value's key.
> If you have some regions that use overflow and some that do not, another workaround is to create the regions that do not use overflow first.
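The second workaround above amounts to reordering region creation so that non-overflow regions come first. The sketch below only computes such an order; `RegionSpec` and `StartupOrder` are hypothetical types for illustration, not Geode classes, and in a real deployment the order would come from the cache.xml listing or the sequence of region-creation calls. The first workaround needs no code: pass `-Dgemfire.disk.recoverValues=false` on the JVM command line, as described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical description of a region to create at startup (not a Geode type). */
final class RegionSpec {
    final String name;
    final boolean overflowToDisk;

    RegionSpec(String name, boolean overflowToDisk) {
        this.name = name;
        this.overflowToDisk = overflowToDisk;
    }

    @Override
    public String toString() {
        return name;
    }
}

final class StartupOrder {
    /**
     * Returns a copy of specs with non-overflow regions first. List.sort is
     * stable, so the original order within each group is preserved.
     */
    static List<RegionSpec> nonOverflowFirst(List<RegionSpec> specs) {
        List<RegionSpec> ordered = new ArrayList<>(specs);
        // Boolean ordering puts false (no overflow) before true (overflow).
        ordered.sort(Comparator.comparing((RegionSpec s) -> s.overflowToDisk));
        return ordered;
    }
}
```

With the reproduction case above (R1 overflows, R2 does not), `nonOverflowFirst` returns `[R2, R1]`, reversing the problematic cache.xml order. A stable sort is used so that any deliberate ordering within each group is kept.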



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
