Return-Path: X-Original-To: apmail-helix-commits-archive@minotaur.apache.org Delivered-To: apmail-helix-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8461010524 for ; Tue, 17 Mar 2015 21:49:05 +0000 (UTC) Received: (qmail 7769 invoked by uid 500); 17 Mar 2015 21:49:05 -0000 Delivered-To: apmail-helix-commits-archive@helix.apache.org Received: (qmail 7725 invoked by uid 500); 17 Mar 2015 21:49:05 -0000 Mailing-List: contact commits-help@helix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@helix.apache.org Delivered-To: mailing list commits@helix.apache.org Received: (qmail 7712 invoked by uid 99); 17 Mar 2015 21:49:05 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Mar 2015 21:49:05 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with SMTP; Tue, 17 Mar 2015 21:48:43 +0000 Received: (qmail 6858 invoked by uid 99); 17 Mar 2015 21:48:40 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Mar 2015 21:48:40 +0000 Date: Tue, 17 Mar 2015 21:48:40 +0000 (UTC) From: "Zhen Zhang (JIRA)" To: commits@helix.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HELIX-541) Possible "livelock" in Helix controller MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HELIX-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Zhang updated HELIX-541: ----------------------------- Affects Version/s: 0.6.4 > Possible "livelock" in Helix controller > --------------------------------------- > > Key: HELIX-541 > URL: https://issues.apache.org/jira/browse/HELIX-541 > Project: Apache Helix > Issue Type: Bug > Affects Versions: 0.6.4 > Reporter: Zhen Zhang > Assignee: Zhen Zhang > Fix For: 0.6.5 > > > We discover a "livelock" bug in Helix controller. This leads to HELIX-540. Assuming we have 3 partitions and 2 nodes, using LeaderStandby state model, FULL_AUTO mode, and replica is 2. When reaching stable mapping, we might have the following: > {noformat} > partition_0: > node_0: LEADER > node_1: STANDBY > partition_1: > node_1: LEADER > node_0: STANDBY > partition_2: > node_0: LEADER > node_1: STANDBY > {noformat} > Later we add a new node (node_2) to the cluster and rebalancer decides that node_2 should become LEADER for partition_2. So controller first sends OFFLINE->STANDBY transition to node_2, and the mapping becomes: > {noformat} > partition_0: > node_0: LEADER > node_1: STANDBY > partition_1: > node_1: LEADER > node_0: STANDBY > partition_2: > node_0: LEADER > node_1: STANDBY > node_2: STANDBY > {noformat} > Note that given LEADER state count is 1 and STANDBY state count is R, where R=2, it is implying the following state constraints: > {noformat} > LEADER: upper_bound=1 > STANDBY: upper_bound=2 > {noformat} > Helix controller now enters the "livelock": it can't send STANDBY->LEADER to node_2, since this will violate LEADER upper bound; it can't send LEADER->STANDBY to node_0 either, since will violate STANDBY upper bound. > We can solve the problem in several ways: > 1) State count definition is ambiguous. For some state, like LEADER, when we say state_count=1, that means we can't violate this constraint at any time. However, for some other state, like STANDBY, when we say state_count=R, that means in stable mapping, there should be R-1 STANDBY replicas, but we don't care the count in any transient state. In this case, we can set STANDBY upper_bound to be larger than 2. Note that this doesn't solve the problem in general. We may have some state model that has a restrict requirement on STANDBY state. > 2) Using state transition priority. If we define LEADER->STANDBY transition should have a higher priority than OFFLINE->STANDBY transition, it will solve the "livelock". But this doesn't solve the problem in general either, because when the state model gets complicated, it's hard to define and prove proper transition priorities that avoid "livelock" in any situation. > 3) The root cause of the problem is that Helix controller uses a greedy algorithm that only looks one step ahead. In the example, if Helix controller can look two steps further, then it will find out that sending OFFLINE->STANDBY transition to node_2 will lead to a dead end, therefore it should choose to send LEADER->STANDBY to node_0 instead. In general we might need to do a DFS/BFS and it's hard if state model is complicated and system is large. > In practice, most systems use simple state models with less than 5 states and have strict state constraint on a single state (e.g MASTER, LEADER) only. We can avoid "livelock" by carefully choosing state constraints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)