geode-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-1672) When amount of overflowed persisted data exceeds heap size startup may run out of memory
Date Wed, 01 Feb 2017 19:24:51 GMT

    [ https://issues.apache.org/jira/browse/GEODE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848824#comment-15848824 ]

ASF subversion and git services commented on GEODE-1672:
--------------------------------------------------------

Commit e606f3e6ec0828f5fc30e20a9dbdf3aa8c3c8620 in geode's branch refs/heads/develop from
[~agingade]
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=e606f3e ]

GEODE-1672: Disabled recovering values for LRU region during startup.

When recovering persistent files, the system stores the values in
temporary maps (one per region) using a background thread. Because
these maps are not actual regions, they are not considered for LRU
eviction, which can cause the system to run out of memory.

The problem is fixed by skipping value recovery for LRU regions.

A new system property "disk.recoverLruValues" is added to support
reading values for LRU regions.
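For example, to re-enable eager value recovery for LRU regions after this change, the new property could be set on the server JVM command line. A sketch only: the "gemfire." prefix is an assumption here, by analogy with the existing gemfire.disk.recoverValues property, and the launcher class name is a placeholder; verify both against the release you are running.

```shell
# Hypothetical invocation: re-enable eager value recovery for LRU regions.
# "gemfire." prefix assumed by analogy with gemfire.disk.recoverValues;
# example.ServerLauncher is a placeholder main class, not a real Geode class.
java -Dgemfire.disk.recoverLruValues=true \
     -cp geode-dependencies.jar example.ServerLauncher
```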


> When amount of overflowed persisted data exceeds heap size startup may run out of memory
> ----------------------------------------------------------------------------------------
>
>                 Key: GEODE-1672
>                 URL: https://issues.apache.org/jira/browse/GEODE-1672
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Darrel Schneider
>
> Basically, when the amount of data overflowed approaches the heap size, such that the total amount of data is very close to or actually surpasses your total tenured heap, it is possible that you will not be able to restart.
> The algorithm during recovery of oplogs/buckets is such that we don't "evict" in the normal sense as data fills the heap during the early stages of recovery, prior to creating the regions. When the data is first created in the heap, it is not yet officially in the region.
> At any rate, during this early phase of recovery, or during the subsequent phase where eviction is working as usual, it is possible that the total data, or an early imbalance of buckets prior to the opportunity to rebalance, causes us to surpass the critical threshold, which will kill us before successful startup.
> To reproduce, you could have 1 region with tons of data that evicts and overflows with
persistence. Call it R1. Then another region with persistence that does not evict. Call it
R2.
> List R1 first in the cache.xml file. Start running the system and add data over time until you have overflowed tons of data, approaching the heap size, in the evicting region, and also have enough data in the R2 region.
> Once you have filled these regions with enough data, overflowed enough to disk, and persisted the other region, then shut down and attempt to restart. If you put enough data in, you will hit the critical threshold before being able to complete startup.
> You can work around this issue by configuring Geode not to recover values, by setting this system property: -Dgemfire.disk.recoverValues=false
> Values will not be faulted into memory until a read operation is done on that value's
key.
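> If you start servers through gfsh, the same JVM property can be passed with the --J option. A sketch, assuming a server named server1:

```shell
# Start a Geode server with value recovery disabled (server name is illustrative).
gfsh> start server --name=server1 --J=-Dgemfire.disk.recoverValues=false
```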
> If you have some regions that use overflow and some that do not, then another workaround is to create the regions that do not use overflow first. 
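> As a sketch of that ordering workaround, a cache.xml could declare the non-overflow region R2 before the overflowing region R1. The region names match the repro above; the exact attributes (disk stores, scope, etc.) are illustrative, not taken from the report:

```xml
<cache>
  <!-- Declared first: persistent region with no eviction/overflow. -->
  <region name="R2">
    <region-attributes data-policy="persistent-replicate"/>
  </region>
  <!-- Declared second: persistent region that evicts and overflows to disk. -->
  <region name="R1">
    <region-attributes data-policy="persistent-replicate">
      <eviction-attributes>
        <lru-heap-percentage action="overflow-to-disk"/>
      </eviction-attributes>
    </region-attributes>
  </region>
</cache>
```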



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
