References: <1360049545.99284.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID:
 <1360050491.34854.YahooMailNeo@web140602.mail.bf1.yahoo.com>
Date: Mon, 4 Feb 2013 23:48:11 -0800 (PST)
From: lars hofhansl
Subject: Re: Fully qualified path names in distributed log splitting.
To: "dev@hbase.apache.org"

Ah yeah, I pushed this from 0.94.5 to 0.94.6 myself :)  Serves me right.

Thanks.


________________________________
From: Elliott Clark
To: "dev@hbase.apache.org"; lars hofhansl
Sent: Monday, February 4, 2013 11:39 PM
Subject: Re: Fully qualified path names in distributed log splitting.

HBASE-7723 attempts to fix this.  The issue arises when moving from standard nn to HA and back.


On Mon, Feb 4, 2013 at 11:32 PM, lars hofhansl wrote:

> We just found ourselves in an interesting pickle.
>
> We were upgrading one of our clusters from HBase 0.94.0 on Hadoop 1.0.4 to
> HBase 0.94.4 on top of Hadoop 2.
> The cluster was set up a while ago, and the old shutdown script had a
> bug and shut down HBase and HDFS uncleanly.
>
> Assuming that the log would be replayed, we upgraded Hadoop to 2.0.x and
> verified that from a file system view everything is OK.
> The new HDFS runs with an HA NameNode, so the FS changed from hdfs:// host name> to hdfs://
>
>
> Then we brought up HBase and found it stuck in splitting logs forever.
> In the log we see messages like these:
> 2013-02-05 06:22:31,045 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.lang.IllegalArgumentException:
>  Wrong FS:
> hdfs:///.logs/<rs host>,60020,1358540589323-splitting/ host>%2C60020%2C1358540589323.1359962644861,
>  expected: hdfs://
>        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
>        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
>        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
>        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
>        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
>        at java.lang.Thread.run(Thread.java:662)
>
> So it looks like distributed log splitting stores the full HDFS path name
> including the host, which seems unnecessary.
> This path is stored in ZK.
>
> So all in all it seems this can only happen if all of the following are true:
> unclean shutdown, keeping the same ZK ensemble, changed FS.
>
>
> The data is not important, we can just blow it away, but we want to prove
> that we could recover the data if we had to.
> It seems we have three options:
>
> 1. Blow away the data in ZK under "splitlog", and restart HBase. It should
> restart the split process with the correct pathnames.
>
> 2. Temporarily change the config for the region server to set the root dir
> to hdfs://, bounce HBase. The log splitting should now be able
> to succeed.
>
> 3. Downgrade back to the old Hadoop (we kept a copy of the image).
>
> We're trying option #2, to see whether that would fix it. #1 should work
> too.
>
>
> Has anybody else experienced this?
> It seems that would also limit our ability to take a snapshot of a
> filesystem and move it somewhere else, as the hostnames are hardcoded,
> at least in ZK for log splitting.
>
>
> -- Lars
>
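
For illustration only, here is a minimal sketch of why a path stored with its scheme and host trips a "Wrong FS" style check after the filesystem URI changes, while the bare path component does not. It uses only java.net.URI. It is not HBase or Hadoop code and not the HBASE-7723 patch. The class name, method names, host names, nameservice name and the /hbase root dir are all hypothetical, and sameFileSystem is a simplified stand-in for the real FileSystem.checkPath logic.

```java
import java.net.URI;
import java.util.Objects;

public class PathQualifierSketch {

    // Drop scheme and authority (host:port or nameservice), keeping only the
    // raw path so that encoded characters such as %2C are left untouched.
    public static String stripSchemeAndAuthority(String stored) {
        return URI.create(stored).getRawPath();
    }

    // Simplified stand-in for a "Wrong FS" check (NOT Hadoop's actual logic):
    // a qualified path must match the filesystem's scheme and authority,
    // while an unqualified path is accepted as relative to the current FS.
    public static boolean sameFileSystem(String path, String fsUri) {
        URI p = URI.create(path);
        URI fs = URI.create(fsUri);
        if (p.getScheme() == null) {
            return true;
        }
        return p.getScheme().equalsIgnoreCase(fs.getScheme())
                && Objects.equals(p.getAuthority(), fs.getAuthority());
    }

    public static void main(String[] args) {
        // Hypothetical values: old NameNode host, new HA nameservice, /hbase root.
        String stored = "hdfs://old-nn.example.com:8020/hbase/.logs/"
                + "rs1.example.com,60020,1358540589323-splitting/"
                + "rs1.example.com%2C60020%2C1358540589323.1359962644861";
        String newFs = "hdfs://ha-cluster";

        // The fully qualified path no longer matches the new filesystem URI.
        System.out.println("qualified path accepted: " + sameFileSystem(stored, newFs));

        // The bare path carries no host, so it survives the FS change.
        String stripped = stripSchemeAndAuthority(stored);
        System.out.println("stripped path: " + stripped);
        System.out.println("stripped path accepted: " + sameFileSystem(stripped, newFs));
    }
}
```

Under these assumptions, the qualified path is rejected against the new filesystem URI and the stripped path is accepted, which matches the observation in the quoted message that storing the host in the path is what ties the split task to the old filesystem.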