geode-commits mailing list archives

From kmil...@apache.org
Subject [1/2] incubator-geode git commit: GEODE-2047 Document change to enable-network-partition-detection
Date Tue, 01 Nov 2016 20:53:58 GMT
Repository: incubator-geode
Updated Branches:
  refs/heads/develop 3bdd10497 -> 3822c9053


GEODE-2047 Document change to enable-network-partition-detection

- This is a subtask of GEODE-762.
- The default value of property enable-network-partition-detection
changed from false to true, enabling partition detection by
default, so all documentation that discusses partition detection
also needs to change.
- Fixed a minor typo or two encountered in the files that were
being updated.


Project: http://git-wip-us.apache.org/repos/asf/incubator-geode/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-geode/commit/8f14a744
Tree: http://git-wip-us.apache.org/repos/asf/incubator-geode/tree/8f14a744
Diff: http://git-wip-us.apache.org/repos/asf/incubator-geode/diff/8f14a744

Branch: refs/heads/develop
Commit: 8f14a744c6bc51c422e4f292dc67219f740dc7ba
Parents: 820f33e
Author: Karen Miller <kmiller@pivotal.io>
Authored: Mon Oct 31 16:45:29 2016 -0700
Committer: Karen Miller <kmiller@pivotal.io>
Committed: Tue Nov 1 13:52:22 2016 -0700

----------------------------------------------------------------------
 .../handling_network_partitioning.html.md.erb   | 28 +++++++++++---------
 ...rk_partitioning_management_works.html.md.erb |  7 +++--
 ...ring_conflicting_data_exceptions.html.md.erb |  4 +--
 .../recovering_from_network_outages.html.md.erb | 11 ++------
 .../system_failure_and_recovery.html.md.erb     |  6 ++---
 .../topics/gemfire_properties.html.md.erb       |  4 +--
 6 files changed, 27 insertions(+), 33 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb b/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
index 61a2576..a227597 100644
--- a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
+++ b/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb
@@ -19,23 +19,24 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-This section lists the configuration steps for network partition detection.
+This section lists configuration considerations relating to network partition detection.
 
 <a id="handling_network_partitioning__section_EAF1957B6446491A938DEFB06481740F"></a>
 The system uses a combination of member coordinators and system members, designated as lead
members, to detect and resolve network partitioning problems.
 
-1.  Network partition detection works in all environments. Using multiple locators mitigates
the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html).
-2.  Enable partition detection consistently in all system members by setting this in their
`gemfire.properties` file:
+-   Network partition detection works in all environments. Using multiple locators mitigates
the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html).
+
+-   Network partition detection is enabled by default. The default setting in the `gemfire.properties`
file is
 
     ``` pre
     enable-network-partition-detection=true
     ```
 
-    Enable network partition detection in all locators and in any other process that should
be sensitive to network partitioning. Processes that do not have network partition detection
enabled are not eligible to be the lead member, so their failure will not trigger declaration
of a network partition.
+    Processes that do not have network partition detection enabled are not eligible to be
the lead member, so their failure will not trigger declaration of a network partition.
 
-    All system members should have the same setting for `enable-network-partition-detection`.
If they don’t, the system throws a `GemFireConfigException` upon startup.
+    All system members should have the same setting for `enable-network-partition-detection`.
If they do not, the system throws a `GemFireConfigException` upon startup.
 
-3.  You must set `enable-network-partition-detection` to true if you are using persistent
partitioned regions. You **must** set `enable-network-partition-detection` to true if you
are using persistent regions (partitioned or replicated). If you create a persistent region
and `enable-network-partition-detection` to set to false, you will receive the following warning
message:
+-   The property `enable-network-partition-detection` must be true if you are using either
partitioned or persistent regions. If you create a persistent region and `enable-network-partition-detection`
to set to false, you will receive the following warning message:
 
     ``` pre
     Creating persistent region {0}, but enable-network-partition-detection is set to false.
@@ -43,9 +44,9 @@ The system uses a combination of member coordinators and system members, designa
           event of a network split."
     ```
 
-4.  Configure regions you want to protect from network partitioning with `DISTRIBUTED_ACK`
or `GLOBAL` `scope`. Do not use `DISTRIBUTED_NO_ACK` `scope`. The region configurations provided
in the region shortcut settings use `DISTRIBUTED_ACK` scope. This setting prevents operations
from performed throughout the distributed system before a network partition is detected.
+-   Configure regions you want to protect from network partitioning with a scope setting
of `DISTRIBUTED_ACK` or `GLOBAL`. Do not use `DISTRIBUTED_NO_ACK` scope. This prevents operations
from being performed throughout the distributed system before a network partition is detected.
     **Note:**
-    GemFire issues an alert if it detects distributed-no-ack regions when network partition
detection is enabled:
+    GemFire issues an alert if it detects `DISTRIBUTED_NO_ACK` regions when network partition
detection is enabled:
 
     ``` pre
     Region {0} is being created with scope {1} but enable-network-partition-detection is
enabled in the distributed system. 
@@ -53,11 +54,12 @@ The system uses a combination of member coordinators and system members, designa
                                 
     ```
 
-5.  These other configuration parameters affect or interact with network partitioning detection.
Check whether they are appropriate for your installation and modify as needed.
-    -   If you have network partition detection enabled, the threshold percentage value for
allowed membership weight loss is automatically configured to 51. You cannot modify this value.
(**Note:** The weight loss calculation uses standard rounding. Therefore, a value of 50.51
is rounded to 51 and will cause a network partition.)
-    -   Failure detection is initiated if a member's `gemfire.properties` `ack-wait-threshold`
(default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) elapses before receiving
a response to a message. If you modify the `ack-wait-threshold` configuration value, you should
modify `ack-severe-alert-threshold` to match the other configuration value.
-    -   If the system has clients connecting to it, the clients' `cache.xml` `<cache>
<pool> read-timeout` should be set to at least three times the `member-timeout` setting
in the server's `gemfire.properties`. The default `<cache> <pool> read-timeout`
setting is 10000 milliseconds.
+-   These other configuration parameters affect or interact with network partitioning detection.
Check whether they are appropriate for your installation and modify as needed.
+    -   If you have network partition detection enabled, the threshold percentage value for
allowed membership weight loss is automatically configured to 51. You cannot modify this value.
**Note:** The weight loss calculation uses round to nearest. Therefore, a value of 50.51 is
rounded to 51 and will cause a network partition.
+    -   Failure detection is initiated if a member's `ack-wait-threshold` (default is 15
seconds) and `ack-severe-alert-threshold` (15 seconds) properties elapse before receiving
a response to a message. If you modify the `ack-wait-threshold` configuration value, you should
modify `ack-severe-alert-threshold` to match the other configuration value.
+    -   If the system has clients connecting to it, the clients' `cache.xml` pool `read-timeout`
should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties`
file. The default pool `read-timeout` setting is 10000 milliseconds.
     -   You can adjust the default weights of members by specifying the system property `gemfire.member-weight`
upon startup. For example, if you have some VMs that host a needed service, you could assign
them a higher weight upon startup.
-    -   By default, members that are forced out of the distributed system by a network partition
event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize
the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).
+
+-   By default, members that are forced out of the distributed system by a network partition
event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize
the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).
 
 

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb b/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
index e971634..93a14ac 100644
--- a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
+++ b/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb
@@ -24,10 +24,9 @@ Geode handles network outages by using a weighting system to determine whether t
 <a id="how_network_partitioning_management_works__section_548146BB8C24412CB7B43E6640272882"></a>
 Individual members are each assigned a weight, and the quorum is determined by comparing
the total weight of currently responsive members to the previous total weight of responsive
members.
 
-Your distributed system can split into separate running systems when members lose the ability
to see each other. The typical cause of this problem is a failure in the network. When a partitioned
system is detected, Apache Geode only one side of the system keeps running and the other side
automatically shuts down.
+Your distributed system can split into separate running systems when members lose the ability
to see each other. The typical cause of this problem is a failure in the network. When a partitioned
system is detected, only one side of the system keeps running and the other side automatically
shuts down.
 
-**Note:**
-The network partitioning detection feature is only enabled when `enable-network-partition-detection`
is set to true in `gemfire.properties`. By default, this property is set to false. See [Configure
Apache Geode to Handle Network Partitioning](handling_network_partitioning.html#handling_network_partitioning)
for details. Quorum weight calculations are always performed and logged regardless of this
configuration setting.
+The network partitioning detection feature is enabled by default with a true value for the
`enable-network-partition-detection` property. See [Configure Apache Geode to Handle Network
Partitioning](handling_network_partitioning.html#handling_network_partitioning) for details.
Quorum weight calculations are always performed and logged regardless of this configuration
setting.
 
 The overall process for detecting a network partition is as follows:
 
@@ -52,7 +51,7 @@ The overall process for detecting a network partition is as follows:
     -   A new coordinator may have a stale view of membership if it did not see the last
membership view sent by the previous (failed) coordinator. If new members were added during
that failure, then the new members may be ignored when the first new view is sent out.
     -   If members were removed during the fail over to the new coordinator, then the new
coordinator will have to determine these losses during the view preparation step.
 
-6.  With `enable-network-partition-detection` set to true, any member that detects that the
total membership weight has dropped below 51% within a single membership view change (loss
of quorum) declares a network partition event. The coordinator sends a network-partitioned-detected
UDP message to all members (even to the non-responsive ones) and then closes the distributed
system with a `ForcedDisconnectException`. If a member fails to receive the message before
the coordinator closes the system, the member is responsible for detecting the event on its
own.
+6.  With a default value of `enable-network-partition-detection`, any member that detects
that the total membership weight has dropped below 51% within a single membership view change
(loss of quorum) declares a network partition event. The coordinator sends a network-partitioned-detected
UDP message to all members (even to the non-responsive ones) and then closes the distributed
system with a `ForcedDisconnectException`. If a member fails to receive the message before
the coordinator closes the system, the member is responsible for detecting the event on its
own.
 
 The presumption is that when a network partition is declared, the members that comprise a
quorum will continue operations. The surviving members elect a new coordinator, designate
a lead member, and so on.
 

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb b/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
index 38375ae..4eade62 100644
--- a/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
+++ b/geode-docs/managing/troubleshooting/recovering_conflicting_data_exceptions.html.md.erb
@@ -46,7 +46,7 @@ In this case the fix is simply to move aside or delete the persistent files for
 
 ## A Network Failure Occurs and Network Partitioning Detection is Disabled
 
-When `enable-network-partition-detection` is set to true, Geode will detect a network partition
and shut down unreachable members to prevent a network partition ("split brain") from occurring.
No conflicts should occur when the system is healed.
+When `enable-network-partition-detection` is set to the default value of true, Geode will
detect a network partition and shut down unreachable members to prevent a network partition
("split brain") from occurring. No conflicts should occur when the system is healed.
 
 However if `enable-network-partition-detection` is false, Geode will not detect the network
partition. Instead, each side of the network partition will end up recording that the other
side of the partition has stale data. When the partition is healed and persistent members
are restarted, the members will report a conflict because both sides of the partition think
the other members are stale.
 
@@ -54,7 +54,7 @@ In some cases it may be possible to choose between sides of the network partitio
 
 ## Salvaging Data
 
-If you receive a ConflictingPersistentDataException, you will not be able to start all of
your members and have them join the same distributed system. You have some members with conflicting
data.
+If you receive a `ConflictingPersistentDataException`, you will not be able to start all
of your members and have them join the same distributed system. You have some members with
conflicting data.
 
 First, see if there is part of the system that you can recover. For example if you just added
some new members to the system, try to start up without including those members.
 

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb b/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
index 8c23bea..f798b2b 100644
--- a/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
+++ b/geode-docs/managing/troubleshooting/recovering_from_network_outages.html.md.erb
@@ -23,16 +23,9 @@ The safest response to a network outage is to restart all the processes and brin
 
 However, if you know the architecture of your system well, and you are sure you won’t be
resurrecting old data, you can do a selective restart. At the very least, you must restart
all the members on one side of the network failure, because a network outage causes separate
distributed systems that can’t rejoin automatically.
 
--   [What Happens During a Network Outage](recovering_from_network_outages.html#rec_network_crash__section_900657018DC048EE9BE6A8064FAE48FD)
--   [Recovery Procedure](recovering_from_network_outages.html#rec_network_crash__section_F9A0C31AE25C4E7185DF3B1A8486BDFA)
--   [Effect of Network Failure on Partitioned Regions](recovering_from_network_outages.html#rec_network_crash__section_9914A63673E64EA1ADB6B6767879F0FF)
--   [Effect of Network Failure on Distributed Regions](recovering_from_network_outages.html#rec_network_crash__section_7AD5624F3CD748C0BC163562B26B2DCE)
--   [Effect of Network Failure on Persistent Regions](#rec_network_crash__section_arm_pnr_3q)
--   [Effect of Network Failure on Client/Server Installations](recovering_from_network_outages.html#rec_network_crash__section_18AEEB6CC8004C3388CCB01F988B0422)
-
 ## <a id="rec_network_crash__section_900657018DC048EE9BE6A8064FAE48FD" class="no-quick-link"></a>What
Happens During a Network Outage
 
-When the network connecting members of a distributed system goes down, system members treat
this like a machine crash. Members on each side of the network failure respond by removing
the members on the other side from the membership list. If network partitioning detection
is enabled, the partition that contains sufficient quorum (&gt; 51% based on member weight)
will continue to operate, while the other partition with insufficient quorum will shut down.
See [Network Partitioning](../network_partitioning/chapter_overview.html#network_partitioning)
for a detailed explanation on how this detection system operates.
+When the network connecting members of a distributed system goes down, system members treat
this like a machine crash. Members on each side of the network failure respond by removing
the members on the other side from the membership list. If network partitioning detection
is enabled (the default), the partition that contains sufficient quorum (&gt; 51% based
on member weight) will continue to operate, while the other partition with insufficient quorum
will shut down. See [Network Partitioning](../network_partitioning/chapter_overview.html#network_partitioning)
for a detailed explanation on how this detection system operates.
 
 In addition, members that have been disconnected either via network partition or due to unresponsiveness
will automatically try to reconnect to the distributed system unless configured otherwise.
See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html).
 
@@ -62,7 +55,7 @@ When the network recovers, the members may be able to see each other again, but
 
 A network failure when using persistent regions can cause conflicts in your persisted data.
When you recover your system, you will likely encounter `ConflictingPersistentDataException`s
when members start up.
 
-For this reason, you must configure `enable-network-partition-detection` to `true` if you
are using persistent regions.
+For this reason, `enable-network-partition-detection` must be set to true if you are using
persistent regions.
 
 For information on how to recover from `ConflictingPersistentDataException` errors should
they occur, see [Recovering from ConfictingPersistentDataExceptions](recovering_conflicting_data_exceptions.html#topic_ghw_z2m_jq).
 

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
index d94ea60..cce80d0 100644
--- a/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
+++ b/geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb
@@ -181,7 +181,7 @@ There are no processes eligible to be group membership coordinator
 
 Description:
 
-Network partition detection is enabled (enable-network-partition-detection is set to true),
and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 
 Response:
 
@@ -197,7 +197,7 @@ There are no processes eligible to be group membership coordinator
 
 Description:
 
-Network partition detection is enabled (enable-network-partition-detection is set to true),
and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 
 Response:
 
@@ -212,7 +212,7 @@ Unable to contact any locators and network partition detection is enabled
 
 Description:
 
-Network partition detection is enabled (enable-network-partition-detection is set to true),
and there are locator problems.
+Network partition detection is enabled, and there are locator problems.
 
 Response:
 

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/8f14a744/geode-docs/reference/topics/gemfire_properties.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/reference/topics/gemfire_properties.html.md.erb b/geode-docs/reference/topics/gemfire_properties.html.md.erb
index 9882568..ae0f198 100644
--- a/geode-docs/reference/topics/gemfire_properties.html.md.erb
+++ b/geode-docs/reference/topics/gemfire_properties.html.md.erb
@@ -160,8 +160,8 @@ See <a href="../../managing/autoreconnect/member-reconnect.html">Handling Forced
 </tr>
 <tr class="odd">
 <td>enable-network-partition-detection</td>
-<td>Boolean instructing the system to detect and handle splits in the distributed system,
typically caused by a partitioning of the network (split brain) where the distributed system
is running. We recommend setting this property to <code class="ph codeph">true</code>.
You must set this property to the same value across all your distributed system members. In
addition, you must set this property to <code class="ph codeph">true</code> if
you are using persistent regions and configure your regions to use DISTRIBUTED_ACK or GLOBAL
scope to avoid potential data conflicts.</td>
-<td>false</td>
+<td>Boolean instructing the system to detect and handle splits in the distributed system,
typically caused by a partitioning of the network (split brain) where the distributed system
is running. You must set this property to the same value across all your distributed system
members. In addition, this property must be set to <code class="ph codeph">true</code>
if you are using persistent regions and configure your regions to use DISTRIBUTED_ACK or GLOBAL
scope to avoid potential data conflicts.</td>
+<td>true</td>
 </tr>
 <tr class="even">
 <td>enable-cluster-configuration</td>

