Date: Wed, 13 Mar 2013 21:16:54 -0700 (PDT)
From: lars hofhansl
Subject: Re: Replication hosed after simple cluster restart
To: Himanshu Vashishtha, "dev@hbase.apache.org"

I have proposed some minor changes (including adding the jitter) on HBASE-8099.
Turns out there already is a wait-time to give the cluster a chance to shut down. It defaults to 2s, which was not enough in our case.

Let's do a test (if we think that can be done) in a different jira.


-- Lars
________________________________
From: Himanshu Vashishtha
To: dev@hbase.apache.org; lars hofhansl
Sent: Wednesday, March 13, 2013 8:59 PM
Subject: Re: Replication hosed after simple cluster restart

On Wed, Mar 13, 2013 at 8:48 PM, lars hofhansl wrote:
> Yeah, lemme sink the RC... We do have a fix.
>
>
> Consider it sunk.
>
> In the end there are some more issues to discuss anyway.
> - Can we avoid RSs taking over queues during a clean shutdown/restart?
>   Without multi we can actually lose data to replicate this way (one RS is shut down, another takes over and is itself shut down) - unless I misunderstand.

I agree. Even if they do move, they are not using locality, as
the regionserver which eventually takes it over will remotely read the
log files. One way I can think of is to do a scan on the available
regionservers in the /hbase/rs znodes and then decide whether it
should start the failover processing.

>
> - Should we stagger the attempts to move the queues for example with a random wait between 0 and 10s, so that not all RSs try at the same time?
> - A test for this scenario? (That's probably tricky)

How about adding a jitter (random sleep (0-10 sec]) in the run method
of the NodeFailoverWorker before it actually starts the failover
processing? I will try to come up with a test case.

>
>
> -- Lars
>
>
>
> ________________________________
>  From: Andrew Purtell
> To: "dev@hbase.apache.org"
> Sent: Wednesday, March 13, 2013 8:22 PM
> Subject: Re: Replication hosed after simple cluster restart
>
> If Himanshu (?) can fix it quickly we should try to get it in here IMHO.
>
> On Wednesday, March 13, 2013, Ted Yu wrote:
>
>> This was the JIRA that introduced copyQueuesFromRSUsingMulti():
>> HBASE-2611 Handle RS that fails while processing the failure of another one
>> (Himanshu Vashishtha)
>>
>> It went into 0.94.5
>> And the feature is off by default:
>>
>>    hbase.zookeeper.useMulti
>>    false
>>
>> The fact that Lars first reported the following problem meant that no other
>> user tried this feature.
>>
>> Hence I think 0.94.6 RC1 doesn't need to be sunk.
>>
>> Cheers
>>
>> On Wed, Mar 13, 2013 at 6:45 PM, lars hofhansl wrote:
>>
>> > Hey no problem. It's cool that we found it in a test env.
>> > It's probably quite hard to reproduce.
>> > This is in 0.94.5 but this feature is off by default.
>> >
>> > What's the general thought here, should I kill the current 0.94.6 rc for
>> > this?
>> > My gut says: Yes.
>> >
>> >
>> > I'm also a bit worried about these:
>> > 2013-03-14 01:42:42,271 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication shared-dnds1-12-sfm.ops.sfdc.net%2C60020%2C1363220608780.1363220609572 at 0
>> > 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1 Got:
>> > java.io.EOFException
>> >         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>> >         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
>> >         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>> >         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
>> >         at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:507)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313)
>> > 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
>> > 2013-03-14 01:42:42,358 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 10
>> >
>> > This happens after bouncing the cluster a 2nd time and these messages
>> > repeat every 10s (for hours now). This is a separate problem I think.
>> >
>> > -- Lars
>> >
>> >   ------------------------------
>> > *From:* Himanshu Vashishtha
>> > *To:* dev@hbase.apache.org; lars hofhansl <larsh@apache.org>
>> > *Cc:* Ted Yu
>> > *Sent:* Wednesday, March 13, 2013 6:38 PM
>> > *Subject:* Re: Replication hosed after simple cluster restart
>> >
>> > This is bad. Yes, copyQueuesFromRSUsingMulti returns a list which it
>> > might not be able to move later on, resulting in bogus znodes.
>> > I'll fix this asap.
>> > Weird it didn't happen in my testing earlier.
>> > Sorry about this.
>> >
>> >
>> > On Wed, Mar 13, 2013 at 6:27 PM, lars hofhansl wrote:
>> > > Sorry 0.94.6RC1
>> > > (I complain about folks not reporting the version all the time, and then I do it too)
>> > >
>> > >
>> > >
>> > > ________________________________
>> > >  From: Ted Yu
>> > > To: dev@hbase.apache.org; lars hofhansl <larsh@apache.org>
>> > > Sent: Wednesday, March 13, 2013 6:17 PM
>> > > Subject: Re: Replication hosed after simple cluster restart
>> > >
>> > >
>> > > Did this happen on 0.94.5?
>> > >
>> > > Thanks
>> > >
>> > >
>> > > On Wed, Mar 13, 2013 at 6:12 PM, lars hofhansl wrote:
>> > >
>> > > We just ran into an interesting scenario. We restarted a cluster that was set up as a replication source.
>> > >>The stop went cleanly.
>> > >>
>> > >>Upon restart *all* regionservers aborted within a few seconds with variations of these errors:
>> > >>http://pastebin.com/3iQVuBqS
>> > >>
>> > >>This is scary!
>> > >>
>> > >>-- Lars
>> >
>> >
>> >
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
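
For reference, the jitter idea discussed in this thread (a random sleep in the range (0, 10 sec] before failover processing starts, so that not all regionservers try to claim queues at the same moment) could be sketched roughly as below. This is only an illustration under stated assumptions, not the change proposed on HBASE-8099 and not HBase code: the class FailoverJitter and the methods jitterMillis and runWithJitter are hypothetical names, and the failover work is stood in for by a plain Runnable.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of HBase: delays a task by a random interval.
class FailoverJitter {

    // Returns a random delay in the range (0, maxMillis], i.e. 1..maxMillis.
    static long jitterMillis(long maxMillis) {
        // nextLong(bound) yields [0, bound); adding 1 shifts it to [1, bound].
        return ThreadLocalRandom.current().nextLong(maxMillis) + 1;
    }

    // Sleeps for a random interval, then runs the task.
    // Returns false (without running the task) if interrupted while waiting,
    // for example because the server is shutting down.
    static boolean runWithJitter(Runnable failoverTask, long maxMillis) {
        try {
            Thread.sleep(jitterMillis(maxMillis));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        failoverTask.run();
        return true;
    }

    public static void main(String[] args) {
        // A 10 s upper bound matches the range suggested in the thread.
        runWithJitter(() -> System.out.println("starting failover processing"), 10_000L);
    }
}
```

Skipping the task when the sleep is interrupted is a design choice of this sketch: a server that is itself being stopped during the wait would then not start taking over another server's queues.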