geode-issues mailing list archives

From "Dan Smith (JIRA)" <>
Subject [jira] [Commented] (GEODE-4250) Users would like a command to re-establish redundancy without rebalancing
Date Thu, 18 Jan 2018 21:46:00 GMT


Dan Smith commented on GEODE-4250:

In response to Kirk's comment - there is automated rebalancing and automated redundancy
recovery already. Automated redundancy recovery is handled by the recovery-delay and startup-recovery-delay
settings. For rebalancing, there is a separate geode-rebalancer module.

> Users would like a command to re-establish redundancy without rebalancing
> -------------------------------------------------------------------------
>                 Key: GEODE-4250
>                 URL:
>             Project: Geode
>          Issue Type: Improvement
>          Components: docs, regions
>            Reporter: Fred Krone
>            Priority: Major
> Command would only succeed when the system is fully redundant.
> Re-establishing redundancy after the loss of a peer node is typically far more urgent
> and important than achieving better balance. The operational impact of rebalancing is also
> much higher: it forces updates to impacted buckets to be distributed to _redundancy-copies + 1_
> peer processes and can spike p2p connections/threads (and thus load) far beyond
> normal operations. If the system is already close to exhausting available capacity for some
> hardware component, this can be enough to push it over the edge (and may force the original
> fault to recur). This problem is exacerbated when the cluster's overall capacity has been
> reduced due to the loss of a physical server. Without the ability to separate the operational
> tasks of re-establishing full data redundancy and rebalancing bucket partitions (that are
> already safely redundant), system administrators may be forced to provision replacement capacity
> _before_ they can restore full service, thus increasing downtime unnecessarily.
> For these reasons, we must add the option to execute these operational tasks separately.
> It still makes sense for _rebalancing_ ops to first re-establish redundancy, so we can
> keep the existing GFSH command/behavior (it would still be useful to clearly log completion
> of one step before the next one begins). We need a new GFSH command/ResourceManager API to
> execute re-establishment of redundancy _without_ rebalancing.
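
To illustrate the distinction the description draws, here is a minimal toy sketch in Python. It is not Geode code and does not reflect how Geode places buckets: the function names restore_redundancy and rebalance, the member names, and the "least-loaded member first" placement rule are all assumptions made for illustration. It only shows that restoring redundancy creates missing copies without moving existing ones, while rebalancing restores redundancy first and then moves copies between members.

```python
from collections import Counter


def restore_redundancy(buckets, members, copies):
    """Create missing bucket copies in place; never move an existing copy.

    buckets maps bucket id -> set of hosting members (hypothetical model).
    Returns the number of copies created.
    """
    load = Counter({m: 0 for m in members})
    for hosts in buckets.values():
        hosts.intersection_update(members)  # copies on lost members are gone
        load.update(hosts)
    created = 0
    for bucket_id in sorted(buckets):
        hosts = buckets[bucket_id]
        while len(hosts) < copies + 1:
            candidates = [m for m in members if m not in hosts]
            if not candidates:
                break  # not enough members to reach full redundancy
            target = min(candidates, key=lambda m: (load[m], m))
            hosts.add(target)
            load[target] += 1
            created += 1
    return created


def rebalance(buckets, members, copies):
    """Restore redundancy first, then move copies until loads differ by <= 1.

    Returns (copies created, copies moved).
    """
    created = restore_redundancy(buckets, members, copies)
    load = Counter({m: 0 for m in members})
    for hosts in buckets.values():
        load.update(hosts)
    moved = 0
    while True:
        busiest = max(members, key=lambda m: (load[m], m))
        idlest = min(members, key=lambda m: (load[m], m))
        if load[busiest] - load[idlest] <= 1:
            break
        movable = sorted(b for b, hosts in buckets.items()
                         if busiest in hosts and idlest not in hosts)
        if not movable:
            break
        hosts = buckets[movable[0]]
        hosts.remove(busiest)
        hosts.add(idlest)
        load[busiest] -= 1
        load[idlest] += 1
        moved += 1
    return created, moved


# Six buckets with one redundant copy each, spread over members A, B and C.
demo = {0: {"A", "B"}, 1: {"B", "C"}, 2: {"C", "A"},
        3: {"A", "B"}, 4: {"B", "C"}, 5: {"C", "A"}}
# Member C is lost: only missing copies are created on the survivors.
print(restore_redundancy(demo, ["A", "B"], copies=1))
# Replacement member D joins later: now existing copies are moved.
print(rebalance(demo, ["A", "B", "D"], copies=1))
```

In this toy model the first call only adds copies on the surviving members, and copies start moving only once the replacement member is present and rebalance is invoked. That is the separation of operational tasks the issue asks for.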

This message was sent by Atlassian JIRA
