Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Received-SPF: pass (athena.apache.org: local policy)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-RocketYMMF:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding;
  b=H22aJdCW7aNacy/vdaT6q7ZFX2tY7WfqSrGixe0q656/wX6YeDXZ8RmYgmE8HkD5PcGmq3PicsO5BmU9GM0k1Y8On2ZlNf+1gHy8sq0M4Y9ksBjS3buE9MPe/UUyCOHsEydFihXxhdyBbF2qFfsw3zZDq/n72b+N4wW4O/gmtTI=;
References: <1363223561.19602.YahooMailNeo@web140606.mail.bf1.yahoo.com>
 <CALte62xeoqjoaCRoB=eLqEt+LrO6Vb+8CP=cKpd7GLirGNR7fg@mail.gmail.com>
 <1363224475.25762.YahooMailNeo@web140604.mail.bf1.yahoo.com>
 <CAFLnt_o69XtDVWthRBWraszaB7fCTFdq+jSe2em9VFsn+fPbRA@mail.gmail.com>
 <1363225503.81869.YahooMailNeo@web140606.mail.bf1.yahoo.com>
 <CALte62zR3B2wuY3K4UH-1jLpk22Q_1Gi-WiypbNAByydnoLgpg@mail.gmail.com>
 <CA+RK=_B_DhCcnrwE+tJNfDRjZLS34y0TMBFDvCf2uaZm-V2qiA@mail.gmail.com>
 <1363232919.19485.YahooMailNeo@web140606.mail.bf1.yahoo.com>
 <CAFLnt_rsk=S1cJ4CmqVHyV2h+QieMMRf3ubhv7JCn_gSmY2N0w@mail.gmail.com>
Message-ID: <1363234614.19313.YahooMailNeo@web140602.mail.bf1.yahoo.com>
Date: Wed, 13 Mar 2013 21:16:54 -0700 (PDT)
From: lars hofhansl <larsh@apache.org>
Reply-To: lars hofhansl <larsh@apache.org>
Subject: Re: Replication hosed after simple cluster restart
To: Himanshu Vashishtha <hvashish@cs.ualberta.ca>,
  "dev@hbase.apache.org" <dev@hbase.apache.org>
In-Reply-To: 
 <CAFLnt_rsk=S1cJ4CmqVHyV2h+QieMMRf3ubhv7JCn_gSmY2N0w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

I have proposed some minor changes (including adding the jitter) on HBASE-8099.
Turns out there already is a wait-time to give the cluster a chance to shut down. It defaults to 2s, which was not enough in our case.

Let's do a test (if we think that can be done) in a different jira.


-- Lars
________________________________
From: Himanshu Vashishtha <hvashish@cs.ualberta.ca>
To: dev@hbase.apache.org; lars hofhansl <larsh@apache.org>
Sent: Wednesday, March 13, 2013 8:59 PM
Subject: Re: Replication hosed after simple cluster restart

On Wed, Mar 13, 2013 at 8:48 PM, lars hofhansl <larsh@apache.org> wrote:
> Yeah, lemme sink the RC... We do have a fix.
>
>
> Consider it sunk.
>
> In the end there are some more issues to discuss anyway.
> - Can we avoid RSs taking over queues during a clean shutdown/restart? Without multi we can actually lose data to replicate this way (one RS is shut down, another takes over and is itself shut down) - unless I misunderstand.

I agree, because even if they do move, they are not using locality as
the regionserver which eventually takes it over will remotely read the
log files. One way I can think of is to do a scan on the available
regionservers in the /hbase/rs znodes and then decide whether it
should start the failover processing.

>
> - Should we stagger the attempts to move the queues for example with a random wait between 0 and 10s, so that not all RSs try at the same time?
> - A test for this scenario? (That's probably tricky)

How about adding a jitter (random sleep (0-10 sec]) in the run method
of the NodeFailoverWorker before it actually starts the failover
processing? I will try to come up with a test case.

>
>
> -- Lars
>
>
>
> ________________________________
> From: Andrew Purtell <apurtell@apache.org>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Sent: Wednesday, March 13, 2013 8:22 PM
> Subject: Re: Replication hosed after simple cluster restart
>
> If Himanshu (?) can fix it quickly we should try to get it in here IMHO.
>
> On Wednesday, March 13, 2013, Ted Yu wrote:
>
>> This was the JIRA that introduced copyQueuesFromRSUsingMulti():
>> HBASE-2611 Handle RS that fails while processing the failure of another one
>> (Himanshu Vashishtha)
>>
>> It went into 0.94.5
>> And the feature is off by default:
>>
>>     <name>hbase.zookeeper.useMulti</name>
>>     <value>false</value>
>>
>> The fact that Lars first reported the following problem meant that no other
>> user tried this feature.
>>
>> Hence I think 0.94.6 RC1 doesn't need to be sunk.
>>
>> Cheers
>>
>> On Wed, Mar 13, 2013 at 6:45 PM, lars hofhansl <larsh@apache.org> wrote:
>>
>> > Hey no problem. It's cool that we found it in a test env. It's probably
>> > quite hard to reproduce.
>> > This is in 0.94.5 but this feature is off by default.
>> >
>> > What's the general thought here, should I kill the current 0.94.6 rc for
>> > this?
>> > My gut says: Yes.
>> >
>> >
>> > I'm also a bit worried about these:
>> > 2013-03-14 01:42:42,271 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication shared-dnds1-12-sfm.ops.sfdc.net%2C60020%2C1363220608780.1363220609572 at 0
>> > 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1 Got:
>> > java.io.EOFException
>> >         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>> >         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
>> >         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
>> >         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
>> >         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
>> >         at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:507)
>> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313)
>> > 2013-03-14 01:42:42,358 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
>> > 2013-03-14 01:42:42,358 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 10
>> >
>> > This happens after bouncing the cluster a 2nd time and these messages
>> > repeat every 10s (for hours now). This is a separate problem I think.
>> >
>> > -- Lars
>> >
>> >   ------------------------------
>> > *From:* Himanshu Vashishtha <hvashish@cs.ualberta.ca>
>> > *To:* dev@hbase.apache.org; lars hofhansl <larsh@apache.org>
>> > *Cc:* Ted Yu <yuzhihong@gmail.com>
>> > *Sent:* Wednesday, March 13, 2013 6:38 PM
>> > *Subject:* Re: Replication hosed after simple cluster restart
>> >
>> > This is bad. Yes, copyQueuesFromRSUsingMulti returns a list which it
>> > might not be able to move later on, resulting in bogus znodes.
>> > I'll fix this asap. Weird it didn't happen in my testing earlier.
>> > Sorry about this.
>> >
>> >
>> > On Wed, Mar 13, 2013 at 6:27 PM, lars hofhansl <larsh@apache.org> wrote:
>> > > Sorry 0.94.6RC1
>> > > (I complain about folks not reporting the version all the time, and then I do it too)
>> > >
>> > >
>> > >
>> > > ________________________________
>> > > From: Ted Yu <yuzhihong@gmail.com>
>> > > To: dev@hbase.apache.org; lars hofhansl <larsh@apache.org>
>> > > Sent: Wednesday, March 13, 2013 6:17 PM
>> > > Subject: Re: Replication hosed after simple cluster restart
>> > >
>> > >
>> > > Did this happen on 0.94.5?
>> > >
>> > > Thanks
>> > >
>> > >
>> > > On Wed, Mar 13, 2013 at 6:12 PM, lars hofhansl <larsh@apache.org> wrote:
>> > >
>> > >>We just ran into an interesting scenario. We restarted a cluster that was set up as a replication source.
>> > >>The stop went cleanly.
>> > >>
>> > >>Upon restart *all* regionservers aborted within a few seconds with variations of these errors:
>> > >>http://pastebin.com/3iQVuBqS
>> > >>
>> > >>This is scary!
>> > >>
>> > >>-- Lars
>> >
>> >
>> >
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
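
The jitter discussed in this thread (a random sleep in the range (0, 10 s] before a regionserver starts failover processing, so that not all RSs try to move queues at the same time) could look roughly like the sketch below. This is only an illustration, not the change proposed on HBASE-8099: the class FailoverJitter and the methods nextDelayMillis and runAfterJitter are hypothetical names and are not part of HBase.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of HBase. */
final class FailoverJitter {
    private FailoverJitter() {}

    /**
     * Picks a delay in (0, maxMillis], i.e. between 1 and maxMillis
     * milliseconds inclusive, matching the "(0-10 sec]" range from the thread
     * when maxMillis is 10000.
     */
    static long nextDelayMillis(Random rng, int maxMillis) {
        if (maxMillis <= 0) {
            throw new IllegalArgumentException("maxMillis must be positive");
        }
        // nextInt(bound) returns 0..bound-1, so adding 1 gives 1..bound.
        return 1L + rng.nextInt(maxMillis);
    }

    /**
     * Sleeps for a random delay, then runs the given task. The task stands in
     * for whatever the failover worker would do next (an assumption here).
     */
    static void runAfterJitter(Runnable failoverTask, int maxMillis)
            throws InterruptedException {
        Thread.sleep(nextDelayMillis(ThreadLocalRandom.current(), maxMillis));
        failoverTask.run();
    }
}
```

A worker's run method might then call something like FailoverJitter.runAfterJitter(task, 10_000) before touching the dead server's queues. Passing the Random in separately keeps the delay computation testable with a fixed seed.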