From: Dhaval Shah
To: user@hbase.apache.org
Subject: Re: NoRouteToHostException when zookeeper crashes
Date: Tue, 6 Aug 2013 23:37:44 +0800 (SGT)

I have 4/5 left in the quorum, so this is not a majority issue.

Also, already running services keep running fine for many hours (so this is not an issue with the new leader either). It seems like the HBase client code, when trying to look up the zookeeper quorum to connect to, is not able to handle the NoRouteToHostException and errors out right there (it does not retry the other zookeeper servers because of the unhandled exception).

Regards,
Dhaval


----- Original Message -----
From: Ted Yu
To: user@hbase.apache.org; Dhaval Shah
Sent: Tuesday, 6 August 2013 11:32 AM
Subject: Re: NoRouteToHostException when zookeeper crashes

bq. one of my zookeeper servers goes down

How many servers were left in the quorum? Was the new leader elected
properly afterwards?

Thanks

On Tue, Aug 6, 2013 at 8:18 AM, Dhaval Shah wrote:

> HBase - 0.92.1
> Zookeeper - 3.4.3
>
> Regards,
> Dhaval
>
>
> ----- Original Message -----
> From: Ted Yu
> To: user@hbase.apache.org; Dhaval Shah
> Sent: Tuesday, 6 August 2013 11:08 AM
> Subject: Re: NoRouteToHostException when zookeeper crashes
>
> What HBase / zookeeper versions are you using?
>
> On Tue, Aug 6, 2013 at 7:48 AM, Dhaval Shah wrote:
>
> > I have a weird (and pretty serious) issue on my HBase cluster. Whenever
> > one of my zookeeper servers goes down, already running services work fine
> > for a few hours, but when I try to restart any service (be it region
> > servers or clients), they fail with a NoRouteToHostException while trying
> > to connect to zookeeper, and I cannot restart any service successfully at
> > all. I do realize that "No route to host" is coming from my network
> > infrastructure (ping gives the same error), but why would 1 zookeeper
> > server going down bring down the entire HBase cluster? Why doesn't HBase
> > ride over the exception and try some other zookeeper server?
> >
> > Is this an issue other people face, or is it just me? We are running these
> > on DHCP (but the IPs don't change because we have long leases). Do you
> > guys think it's a DHCP-specific issue? Do you have pointers to avoid this
> > issue with DHCP, or do I have to move to static IPs?
> >
> > Regards,
> > Dhaval
> >
>
>
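The behaviour asked for in the thread ("ride over the exception and try some other zookeeper server") can be sketched in plain Java. This is only an illustration of the idea, not the HBase or ZooKeeper client code, and it makes no claim about what HBase 0.92.1 actually does. The class QuorumFailover, the Connector interface, and the methods parseQuorum, connectToAny, and openSocket are hypothetical names introduced for this sketch. It relies only on the fact that java.net.NoRouteToHostException is a subclass of IOException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of HBase or ZooKeeper.
class QuorumFailover {

    /** Hypothetical hook that opens a connection to one "host:port" entry. */
    interface Connector<T> {
        T connect(String hostPort) throws IOException;
    }

    /** Splits a comma-separated quorum string such as "zk1:2181,zk2:2181". */
    static List<String> parseQuorum(String quorum) {
        List<String> servers = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        return servers;
    }

    /**
     * Tries each server in order and returns the first successful connection.
     * An IOException from one server (NoRouteToHostException included) is
     * remembered and the next server is tried; the last failure is rethrown
     * only if every server fails.
     */
    static <T> T connectToAny(List<String> servers, Connector<T> connector)
            throws IOException {
        IOException lastFailure = null;
        for (String server : servers) {
            try {
                return connector.connect(server);
            } catch (IOException e) {
                lastFailure = e; // unreachable server: move on to the next one
            }
        }
        if (lastFailure == null) {
            throw new IOException("quorum list is empty");
        }
        throw lastFailure;
    }

    /** Plain TCP connect with a timeout, usable as a Connector for probing. */
    static Socket openSocket(String hostPort, int timeoutMs) throws IOException {
        int colon = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket on failure
            throw e;
        }
    }
}
```

A call such as QuorumFailover.connectToAny(QuorumFailover.parseQuorum(quorum), hp -> QuorumFailover.openSocket(hp, 2000)) can also serve as a quick reachability probe from the machine where the restart fails, to confirm which quorum members it can still reach.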