geode-commits mailing list archives

From dbar...@apache.org
Subject geode git commit: GEODE-1672: When amount of overflowed persisted data exceeds heap size startup may run out of memory This closes #559
Date Mon, 05 Jun 2017 21:38:04 GMT
Repository: geode
Updated Branches:
  refs/heads/release/1.2.0 96f77fc8c -> 9aacc6570


GEODE-1672: When amount of overflowed persisted data exceeds heap size startup may run out of memory
This closes #559


Project: http://git-wip-us.apache.org/repos/asf/geode/repo
Commit: http://git-wip-us.apache.org/repos/asf/geode/commit/9aacc657
Tree: http://git-wip-us.apache.org/repos/asf/geode/tree/9aacc657
Diff: http://git-wip-us.apache.org/repos/asf/geode/diff/9aacc657

Branch: refs/heads/release/1.2.0
Commit: 9aacc6570b12e1972bd9e62f2c28487beb9812f6
Parents: 96f77fc
Author: Dave Barnes <dbarnes@pivotal.io>
Authored: Mon Jun 5 11:25:44 2017 -0700
Committer: Dave Barnes <dbarnes@pivotal.io>
Committed: Mon Jun 5 14:37:07 2017 -0700

----------------------------------------------------------------------
 .../setting_distributed_properties.html.md.erb  |  2 +-
 .../setting_cache_properties.html.md.erb        |  2 +-
 .../cluster_config/gfsh_persist.html.md.erb     |  8 +-
 .../eviction/how_eviction_works.html.md.erb     |  2 +-
 .../chapter_overview.html.md.erb                |  2 +-
 .../system_failure_and_recovery.html.md.erb     | 77 +++++++++++++++++++-
 6 files changed, 84 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/basic_config/gemfire_properties/setting_distributed_properties.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/basic_config/gemfire_properties/setting_distributed_properties.html.md.erb b/geode-docs/basic_config/gemfire_properties/setting_distributed_properties.html.md.erb
index a3601c7..faf2241 100644
--- a/geode-docs/basic_config/gemfire_properties/setting_distributed_properties.html.md.erb
+++ b/geode-docs/basic_config/gemfire_properties/setting_distributed_properties.html.md.erb
@@ -22,7 +22,7 @@ limitations under the License.
 Geode provides a default distributed system configuration for out-of-the-box systems. To use non-default configurations and to fine-tune your member communication, you can use a mix of various options to customize your distributed system configuration.
 
 <a id="setting_distributed_properties__section_67EBCC53EB174B108DA7271E2CD2B76C"></a>
-Geode properties are used to join a distributed system and configure system member behavior. Configure your Geode properties through the `gemfire.properties` file, the Java API, or command-line input. Generally, you store all your properties in the `gemfire.properties` file, but you may need to provide properties through other means, for example, to pass in security properties for username and password that you have received from keyboard input.
+Geode properties are used to join a distributed system and configure system member behavior. Configure your Geode properties through the `gemfire.properties` file, the Java API, or command-line input. Generally, you store all your properties in the `gemfire.properties` file, but you may need to provide properties through other means, for example, to pass in security properties for a username and password that you have received from keyboard input.
 
 **Note:**
 Check with your Geode system administrator before changing properties through the API, including the `gemfire.properties` and `gfsecurity.properties` settings. The system administrator may need to set properties at the command line or in configuration files. Any change made through the API overrides those other settings.

http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/basic_config/the_cache/setting_cache_properties.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/basic_config/the_cache/setting_cache_properties.html.md.erb b/geode-docs/basic_config/the_cache/setting_cache_properties.html.md.erb
index c56eaa7..cf5133a 100644
--- a/geode-docs/basic_config/the_cache/setting_cache_properties.html.md.erb
+++ b/geode-docs/basic_config/the_cache/setting_cache_properties.html.md.erb
@@ -29,7 +29,7 @@ Cache configuration properties define:
 
 Configure the cache and its data regions through one or more of these methods:
 
--   Through a persistent configuration that you define when issuing commands that use the gfsh command line utility. `gfsh` supports the administration, debugging, and deployment of Apache Geode processes and applications. You can use gfsh to configure regions, locators, servers, disk stores, event queues, and other objects.
+-   Through a persistent configuration that you define when issuing commands that use the gfsh command line utility. The gfsh utility supports the administration, debugging, and deployment of Apache Geode processes and applications. You can use gfsh to configure regions, locators, servers, disk stores, event queues, and other objects.
 
     As you issue commands, gfsh saves a set of configurations that apply to the entire cluster and also saves configurations that only apply to defined groups of members within the cluster. You can re-use these configurations to create a distributed system. See [Overview of the Cluster Configuration Service](../../configuring/cluster_config/gfsh_persist.html).
 

http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/configuring/cluster_config/gfsh_persist.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/configuring/cluster_config/gfsh_persist.html.md.erb b/geode-docs/configuring/cluster_config/gfsh_persist.html.md.erb
index 4e21735..0169baf 100644
--- a/geode-docs/configuring/cluster_config/gfsh_persist.html.md.erb
+++ b/geode-docs/configuring/cluster_config/gfsh_persist.html.md.erb
@@ -106,13 +106,13 @@ There are some configurations that you cannot create using `gfsh`, and that you
     -   `partition-listener`
     -   `transaction-listener`
     -   `transaction-writer`
--   Adding or removing a TransactionListener
+-   Adding or removing a `TransactionListener`
 -   Adding JNDI bindings
--   Deleting an AsyncEventQueue
+-   Deleting an `AsyncEventQueue`
 
-In addition, there are some limitations on configuring gateways using `gfsh`.You must use cache.xml or the Java APIs to configure the following:
+In addition, there are some limitations on configuring gateways using `gfsh`. You must use cache.xml or the Java APIs to configure the following:
 
--   Configuring a GatewayConflictResolver
+-   Configuring a `GatewayConflictResolver`
 -   You cannot specify parameters and values for Java classes for the following:
     -   `gateway-listener`
     -   `gateway-conflict-resolver`

http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/developing/eviction/how_eviction_works.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/developing/eviction/how_eviction_works.html.md.erb b/geode-docs/developing/eviction/how_eviction_works.html.md.erb
index ee702ea..a714253 100644
--- a/geode-docs/developing/eviction/how_eviction_works.html.md.erb
+++ b/geode-docs/developing/eviction/how_eviction_works.html.md.erb
@@ -28,7 +28,7 @@ When Geode determines that adding or updating an entry would take the region ove
 
 ## <a id="how_eviction_works__section_69E2AA453EDE4E088D1C3332C071AFE1" class="no-quick-link"></a>Eviction in Partitioned Regions
 
-In partitioned regions, Geode removes the oldest entry it can find *in the bucket where the new entry operation is being performed*. Geode maintains LRU entry information on a bucket-by-bucket bases, as the cost of maintaining information across the partitioned region would be too great a performance hit.
+In partitioned regions, Geode removes the oldest entry it can find *in the bucket where the new entry operation is being performed*. Geode maintains LRU entry information on a bucket-by-bucket basis, as the cost of maintaining information across the partitioned region would be too great a performance hit.
 
 -   For memory and entry count eviction, LRU eviction is done in the bucket where the new entry operation is being performed until the overall size of the combined buckets in the member has dropped enough to perform the operation without going over the limit.
 -   For heap eviction, each partitioned region bucket is treated as if it were a separate region, with each eviction action only considering the LRU for the bucket, and not the partitioned region as a whole.

http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/managing/troubleshooting/chapter_overview.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/chapter_overview.html.md.erb b/geode-docs/managing/troubleshooting/chapter_overview.html.md.erb
index e3c0705..8722a77 100644
--- a/geode-docs/managing/troubleshooting/chapter_overview.html.md.erb
+++ b/geode-docs/managing/troubleshooting/chapter_overview.html.md.erb
@@ -45,7 +45,7 @@ This section provides strategies for handling common errors and failure situatio
 
     When a machine crashes because of a shutdown, power loss, hardware failure, or operating system failure, all of its applications and cache servers and their local caches are lost.
 
--   **[Recovering from ConfictingPersistentDataExceptions](../../managing/troubleshooting/recovering_conflicting_data_exceptions.html)**
+-   **[Recovering from ConflictingPersistentDataExceptions](../../managing/troubleshooting/recovering_conflicting_data_exceptions.html)**
 
     A `ConflictingPersistentDataException` while starting up persistent members indicates that you have multiple copies of some persistent data, and Geode cannot determine which copy to use.
 

http://git-wip-us.apache.org/repos/asf/geode/blob/9aacc657/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
index cce80d0..740cc04 100644
--- a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
+++ b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
@@ -276,8 +276,83 @@ find the reason.
 
 Description:
 
-The process discovered that it was not in the distributed system and cannot determine why it was removed. The membership coordinator removed the member after it failed to respond to an internal are you alive message.
+The process discovered that it was not in the distributed system and cannot determine why it was
+removed. The membership coordinator removed the member after it failed to respond to an internal
+are-you-alive message.
 
 Response:
 
 The operator should examine the locator processes and logs.
+
+## <a id="restart-failure-persistent-lru" class="no-quick-link"></a> Restart Fails Due To Out-of-Memory Error
+
+This section describes a restart failure that can occur when the stopped system is one that was configured with persistent regions. Specifically:
+
+- Some of the regions of the recovering system, when running, were configured as PERSISTENT regions, which means that they save their data to disk.
+- At least one of the persistent regions was configured to evict least recently used (LRU) data by overflowing values to disk.
+
+### How Data is Recovered From Persistent Regions
+
+Data recovery, upon restart, always recovers keys. You can configure whether and how the system
+recovers the values associated with those keys to populate the system cache.
+
+**Value Recovery**
+
+- Recovering all values immediately during startup slows the startup time but results in consistent
+read performance after the startup on a "hot" cache.
+
+- Recovering no values means quicker startup but a "cold" cache, so the first retrieval of each value will read from disk.
+
+- Retrieving values asynchronously in a background thread allows a relatively quick startup on a "warm" cache
+that will eventually recover every value.
+
+**Retrieve or Ignore LRU values**
+
+When a system with persistent LRU regions shuts down, the system does not record which of the values
+were recently used. On subsequent startup, if values are recovered into an LRU region they may be
+the least recently used instead of the most recently used. Also, if LRU values are recovered on a
+heap or an off-heap LRU region, it is possible that the LRU memory limit will be exceeded, resulting
+in an `OutOfMemoryException` during recovery. For these reasons, LRU value recovery can be treated
+differently than non-LRU values.
+
+## Default Recovery Behavior for Persistent Regions
+
+The default behavior is for the system to recover all keys, then asynchronously recover all data
+values that were resident, leaving LRU values unrecovered. This default strategy is best for
+most applications, because it strikes a balance between recovery speed and cache completeness.
+
+### Configuring Recovery of Persistent Regions
+
+Three Java system parameters allow the developer to control the recovery behavior for persistent regions:
+
+- `gemfire.disk.recoverValues`
+
+  Default = `true`, recover values. If `false`, recover only keys, do not recover values.
+
+  *How used:* When `true`, recovery of the values "warms up" the cache so data retrievals will find
+  their values in the cache, without causing time consuming disk accesses. When `false`, shortens
+  recovery time so the system becomes available for use sooner, but the first retrieval on each key
+  will require a disk read.
+
+- `gemfire.disk.recoverLruValues`
+
+  Default = `false`, do not recover LRU values. If `true`, recover LRU values. If
+  `gemfire.disk.recoverValues` is `false`, then `gemfire.disk.recoverLruValues` is ignored, since
+  no values are recovered.
+
+  *How used:* When `false`, shortens recovery time by ignoring LRU values. When `true`, restores
+  more data values to the cache. Recovery of the LRU values increases heap memory usage and
+  could cause an out-of-memory error, preventing the system from restarting.
+
+- `gemfire.disk.recoverValuesSync`
+
+  Default = `false`, recover values by an asynchronous background process. If `true`, values are
+  recovered synchronously, and recovery is not complete until all values have been retrieved. If
+  `gemfire.disk.recoverValues` is `false`, then `gemfire.disk.recoverValuesSync` is ignored since
+  no values are recovered.
+
+  *How used:* When `false`, allows the system to become available sooner, but some time must elapse
+  before the entire cache is refreshed. Some key retrievals will require disk access, and some will not.
+  When `true`, prolongs restart time, but ensures that when available for use, the cache is fully
+  populated and data retrieval times will be optimal.
+
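
The interaction of the three recovery parameters documented in this commit can be sketched in a few lines of Java. This is only an illustration of the documented defaults and precedence (`gemfire.disk.recoverValues` off means the other two are ignored). It is not Geode's internal implementation, and the class name `RecoverySettings` and its `ValueRecovery` enum are hypothetical names invented for this example.

```java
import java.util.Properties;

/**
 * Illustrative sketch only: models the flag semantics described in the
 * documentation above. The class and enum names are hypothetical and are
 * not part of the Geode API.
 */
public class RecoverySettings {

    public enum ValueRecovery { KEYS_ONLY, ASYNC_VALUES, SYNC_VALUES }

    public final ValueRecovery valueRecovery;
    public final boolean recoverLruValues;

    public RecoverySettings(Properties props) {
        // Documented defaults: recoverValues=true, the other two false.
        boolean recoverValues = flag(props, "gemfire.disk.recoverValues", true);
        boolean sync = flag(props, "gemfire.disk.recoverValuesSync", false);
        boolean lru = flag(props, "gemfire.disk.recoverLruValues", false);

        if (!recoverValues) {
            // No values are recovered, so the other two flags are ignored.
            valueRecovery = ValueRecovery.KEYS_ONLY;
            recoverLruValues = false;
        } else {
            valueRecovery = sync ? ValueRecovery.SYNC_VALUES : ValueRecovery.ASYNC_VALUES;
            recoverLruValues = lru;
        }
    }

    // Reads a boolean property, falling back to the given default when unset.
    private static boolean flag(Properties props, String name, boolean defaultValue) {
        return Boolean.parseBoolean(props.getProperty(name, Boolean.toString(defaultValue)));
    }

    public static void main(String[] args) {
        RecoverySettings s = new RecoverySettings(System.getProperties());
        System.out.println("value recovery: " + s.valueRecovery
                + ", recover LRU values: " + s.recoverLruValues);
    }
}
```

Running the sketch with, for example, `java -Dgemfire.disk.recoverValuesSync=true RecoverySettings` shows how a JVM system property changes the effective mode. For an actual server the same `-D` settings would be supplied on the member's JVM command line, for instance through the `--J` option of `gfsh start server`.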

