Subject: Re: Long GC
From: kishore g <g.kishore@gmail.com>
To: user@helix.incubator.apache.org
Date: Sat, 4 May 2013 09:25:28 -0700
In-Reply-To: <1A11C172-6519-493C-A4A9-A66194CC4B8E@mac.com>
References: <28CB11C1-1D3F-4EAE-BCEC-41EC6CA84604@mac.com> <1A11C172-6519-493C-A4A9-A66194CC4B8E@mac.com>

Hi Ming,

I don't see anything wrong with the design. What you need is the ability to validate a few things before reconnecting to the cluster. We do invoke a pre-connect callback before joining the cluster; you can validate for consistency and refuse to join the cluster. You can also disable the node if validation fails.

Will this work?
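[For concreteness, a minimal sketch of the pre-connect validation described above, assuming Helix's org.apache.helix.PreConnectCallback interface and HelixAdmin.enableInstance. The ZooKeeper address, cluster and instance names are hypothetical, the consistency check is application-specific, and throwing from the callback to abort the rejoin is an assumption rather than documented behavior.]

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.PreConnectCallback;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class ValidatingParticipant {
  public static void main(String[] args) throws Exception {
    // Hypothetical names; substitute the real ZK address, cluster, and instance.
    final String zkAddr = "localhost:2181";
    final String cluster = "MY_CLUSTER";
    final String instance = "Node1";

    final ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        cluster, instance, InstanceType.PARTICIPANT, zkAddr);

    // Runs before the participant (re)joins the cluster, e.g. after a
    // ZooKeeper session expiry caused by a long GC pause.
    manager.addPreConnectCallback(new PreConnectCallback() {
      @Override
      public void onPreConnect() {
        if (!localStateIsConsistent()) {
          // Disable this instance so the controller will not hand it
          // MASTER again until an operator re-enables it, then refuse
          // to rejoin (throwing here is one way to abort; an assumption).
          admin.enableInstance(cluster, instance, false);
          throw new IllegalStateException("local state inconsistent; not rejoining");
        }
      }
    });

    manager.connect();
  }

  // Application-specific check, e.g. compare replicated sequence numbers.
  private static boolean localStateIsConsistent() {
    return true;
  }
}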
On May 4, 2013 9:03 AM, "Ming Fang" <mingfang@mac.com> wrote:

> Kishore,
>
> I'm setting _sessionTimeout to 3 seconds.
> That's an aggressive number, but my application needs to detect failures quickly.
> I suppose taking the participant to OFFLINE is acceptable, but I can't have it flip back to MASTER.
>
> I didn't want to bore you with the details before, but I think I need to explain my system more now.
> We are using Helix to manage a MASTER/SLAVE cluster using AUTO mode.
> AUTO mode enables us to place the MASTER and the SLAVE on the correct hosts.
> We name the MASTER Node1 and the SLAVE Node2.
>
> The system processes a high rate of incoming events, thousands per second.
> Node1 consumes the events, generates internal state, and then replicates the events to Node2.
> Node2 consumes the events from Node1 and generates exactly the same internal state.
>
> When Node1 fails, we want Node2 to become the new MASTER and process incoming events.
> This means we cannot restart Node1, since Node2's state has moved beyond the failed MASTER's.
> We keep the failed Node1 down for the rest of the business day.
> Everything works as expected under ideal conditions.
>
> The problem we're experiencing with long GCs is that Node1 transitions to OFFLINE and then back to MASTER.
> This causes Node1 and Node2 to get out of sync.
>
> Ideally I can find a general solution such that whenever Node2 becomes MASTER, it modifies the ideal state so that Node1 can come back as SLAVE (a sketch of this appears after the thread below).
> This solution would address the Node1 failure issue, and I think it should fix the long GC issue too.
> Sorry for the long email.
>
> --ming
>
>
> On May 4, 2013, at 10:29 AM, kishore g <g.kishore@gmail.com> wrote:
>
> Hi Ming,
>
> I need some more details:
> 1. How long was the GC, and what is the session timeout in ZK?
>
> The behavior you are seeing is expected. Because of the GC the ZooKeeper session is lost, and we invoke the transitions so that the partition goes back to the OFFLINE state.
>
> What is the behavior you are looking for when there is a GC?
>
> a. You don't want to lose mastership? or
> b. It's OK to lose mastership, but you don't want to become master again?
>
> One question regarding your application: is it possible for it to recover after a long GC pause?
>
> I don't think this is related to HELIX-79; in that case there were consecutive GCs, and I think we have a patch for that issue.
>
> Thanks,
> Kishore G
>
>
> On Sat, May 4, 2013 at 6:32 AM, Ming Fang <mingfang@mac.com> wrote:
>
>> We're experiencing a potential showstopper issue with how Helix deals with very long GCs.
>> Our system is using the Master-Slave model.
>> A simple test runs just the Master under extreme load, causing GC pauses of several seconds.
>> Under long GC conditions the Master gets transitioned to Slave and then to Offline.
>> After the GC, it gets transitioned back to Slave and then to Master.
>>
>> I found a Jira that may be related: HELIX-79.
>> We're scheduled to go live with our system next week.
>> Are there any quick workarounds for this problem?
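[A rough sketch of the ideal-state modification Ming proposes, assuming the HelixAdmin API (getResourceIdealState/setResourceIdealState) and that in AUTO (semi-auto) mode the first live instance in a partition's preference list is assigned the MASTER replica. The ZooKeeper address, cluster, and resource names are hypothetical, and the instance names follow the thread's Node1/Node2 naming.]

import java.util.Arrays;
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class PromoteNode2 {
  public static void main(String[] args) {
    // Hypothetical names; substitute the real ZK address, cluster, and resource.
    String zkAddr = "localhost:2181";
    String cluster = "MY_CLUSTER";
    String resource = "MyResource";

    HelixAdmin admin = new ZKHelixAdmin(zkAddr);
    IdealState idealState = admin.getResourceIdealState(cluster, resource);

    // In AUTO (semi-auto) mode the preference list order decides placement:
    // the first live instance in the list gets the MASTER replica.
    // Listing Node2 first means a restarted Node1 comes back only as SLAVE.
    for (String partition : idealState.getPartitionSet()) {
      idealState.getRecord().setListField(partition, Arrays.asList("Node2", "Node1"));
    }

    admin.setResourceIdealState(cluster, resource, idealState);
  }
}

[Something like this could be triggered from Node2's SLAVE-to-MASTER transition handler, so the preference flip happens automatically on failover rather than requiring manual intervention.]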