Subject: Re: Recovering from corrupt blocks in HFile
From: Mike Dillon
To: user@hbase.apache.org
Date: Thu, 19 Mar 2015 16:33:52 -0700

Thank you!

On Thu, Mar 19, 2015 at 1:48 PM, Jerry He wrote:

> It is ok to delete the hfile in question with the hadoop file system command; no restart of HBase is needed. You may see some error exceptions if there are things (user scans, compactions) in flight, but it will be ok.
>
> Jerry
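Jerry's suggestion amounts to a plain "hadoop fs -rm" (or "-mv" to a backup location) on the region's store file. A minimal sketch of the same operation through the Hadoop FileSystem API follows; the class name and the table/region/file path are hypothetical placeholders, not values from this cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MoveCorruptHFile {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical 0.94-style layout: /hbase/<table>/<region>/<family>/<hfile>
        Path hfile = new Path("/hbase/mytable/1234abcd5678/cf/0123456789abcdef");
        Path backup = new Path("/hfile-backups/" + hfile.getName());

        // Move rather than delete, so the bytes are still recoverable if needed.
        fs.mkdirs(backup.getParent());
        boolean moved = fs.rename(hfile, backup);
        System.out.println("moved=" + moved);
      }
    }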
> On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon wrote:
>
> > So, it turns out that the client has an archived data source that can recreate the HBase data in question if needed, so the need for me to actually recover this HFile has diminished to the point where it's probably not worth investing my time in creating a custom tool to extract the data.
> >
> > Given that they're willing to lose the data in this region and recreate it if necessary, do I simply need to delete the HFile to make HDFS happy, or is there something I need to do at the HBase level to tell it that the data will be going away?
> >
> > Thanks so much everyone for your help on this issue!
> >
> > -md
> >
> > On Wed, Mar 18, 2015 at 10:46 PM, Jerry He wrote:
> >
> > > From the HBase perspective, since we don't have a ready tool, the general idea is that you would need access to the HBase source code and write your own tool. At a high level, the tool would read/scan the KVs from the hfile, much as the HFile tool does, while opening an HFileWriter to dump the good data until you are no longer able to do so. Then you would close the HFileWriter with the necessary meta file info. There are APIs in HBase to do this, but they may not be external public APIs.
> > >
> > > Jerry
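A rough sketch of the salvage tool Jerry describes might look like the following, written against 0.94-era HFile APIs. The class name and arguments are placeholders, and the exact reader/writer factory signatures are an assumption that may differ between HBase versions; in particular, the writer probably needs the same comparator, compression, and block size as the source file, which is omitted here.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileScanner;

    public class SalvageHFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        CacheConfig cacheConf = new CacheConfig(conf);
        Path src = new Path(args[0]);   // corrupt hfile
        Path dst = new Path(args[1]);   // where to write the salvaged copy

        HFile.Reader reader = HFile.createReader(fs, src, cacheConf);
        HFile.Writer writer = HFile.getWriterFactory(conf, cacheConf)
            .withPath(fs, dst)
            .create();
        long copied = 0;
        try {
          HFileScanner scanner = reader.getScanner(false, false);  // no cache, no pread
          if (scanner.seekTo()) {
            do {
              writer.append(scanner.getKeyValue());   // copy good KVs...
              copied++;
            } while (scanner.next());                 // ...until a read throws
          }
        } catch (Exception e) {
          // First corruption-caused exception: stop and keep what we have so far.
          System.err.println("Stopping after " + copied + " KVs: " + e);
        } finally {
          writer.close();   // writes the trailer/meta so the new hfile is well-formed
          reader.close();
        }
        System.out.println("Salvaged " + copied + " KVs to " + dst);
      }
    }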
> > > On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon wrote:
> > >
> > > > I've had a chance to try out the suggestion Stack passed along, HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat, and managed to get this: https://gist.github.com/md5/d42e97ab7a0bd656f09a
> > > >
> > > > After knowing what to look for, I was able to find the same checksum failures in the logs during the major compaction failures.
> > > >
> > > > I'm willing to accept that all the data after that point in the corrupt block is lost, so any specific advice for how to replace that block with a partial one containing only the good data would be appreciated. I'm aware that there may be other checksum failures in the subsequent blocks as well, since nothing is currently able to read past the first corruption point, but I'll just have to wash, rinse, and repeat to see how much good data is left in the file as a whole.
> > > >
> > > > -md
> > > >
> > > > On Wed, Mar 18, 2015 at 2:41 PM, Jerry He wrote:
> > > >
> > > > > For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy thing to recover is probably the data (KVs) up to the point where we hit the first corruption-caused exception. After that, it will not be as easy. For example, if the current key length or value length is bad, there is no way to skip to the next KV. We would probably need to skip the whole current hblock and go to the next block for KVs, assuming the hblock index is still good.
> > > > >
> > > > > HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> does an incremental improvement to make sure we do get a corruption-caused exception so that the scan/read will not go into an infinite loop.
> > > > >
> > > > > Jerry
> > > > >
> > > > > On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mike.dillon@synctree.com> wrote:
> > > > >
> > > > > > I haven't filed one myself, but I can do so if my investigation ends up finding something bug-worthy as opposed to just random failures due to out-of-disk scenarios.
> > > > > >
> > > > > > Unfortunately, I had to prioritize some other work this morning, so I haven't made it back to the bad node yet.
> > > > > >
> > > > > > I did attempt restarting the datanode to see if I could make hadoop fsck happy, but that didn't have any noticeable effect. I'm hoping to have more time this afternoon to investigate the other suggestions from this thread.
> > > > > >
> > > > > > -md
> > > > > >
> > > > > > On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <apurtell@apache.org> wrote:
> > > > > >
> > > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
> > > > > > >
> > > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > > >
> > > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure how to skip the bad section also).
> > > > > > >
> > > > > > > I was just getting caught up on this thread and had the same thought. Is there an issue filed for this?
> > > > > > >
> > > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
> > > > > > >
> > > > > > > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mike.dillon@synctree.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all-
> > > > > > > > >
> > > > > > > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was hoping to get some advice on recovering as much data as possible.
> > > > > > > > >
> > > > > > > > > When I examined the blk-* file on the three data nodes that have a replica of the affected block, I saw that the replicas on two of the datanodes had the same SHA-1 checksum and that the replica on the other datanode was a truncated version of the replica found on the other nodes (as reported by a difference at EOF by "cmp"). The size of the two identical blocks is 67108864, the same as most of the other blocks in the file.
> > > > > > > > >
> > > > > > > > > Given that there were two datanodes with the same data and another with truncated data, I made a backup of the truncated file and dropped the full-length copy of the block in its place directly on the data mount, hoping that this would cause HDFS to no longer report the file as corrupt. Unfortunately, this didn't seem to have any effect.
> > > > > > > >
> > > > > > > > That seems like a reasonable thing to do.
> > > > > > > >
> > > > > > > > Did you restart the DN that was serving this block before you ran fsck? (Fsck asks the namenode what blocks are bad; it is likely still reporting off old info.)
> > > > > > > > > Looking through the Hadoop source code, it looks like there is a CorruptReplicasMap internally that tracks which nodes have "corrupt" copies of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663>, a "-blockId" parameter was added to "hadoop fsck" to allow dumping the reason that a block id is considered corrupt, but that wasn't added until Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > > > > > >
> > > > > > > > Good digging.
> > > > > > > >
> > > > > > > > > I also had a look at running the "HFile" tool on the affected file (cf. section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html). When I did that, I was able to see the data up to the corrupted block as far as I could tell, but then it started repeatedly looping back to the first row and starting over. I believe this is related to the behavior described in https://issues.apache.org/jira/browse/HBASE-12949.
> > > > > > > >
> > > > > > > > So, your file is 3G and your blocks are 128M?
> > > > > > > >
> > > > > > > > The dfsclient should just pass over the bad replica and move on to the good one, so it would seem to indicate all replicas are bad for you.
> > > > > > > >
> > > > > > > > If you enable DFSClient DEBUG-level logging it should report which blocks it is reading from. For example, here I am reading the start of the index blocks with DFSClient DEBUG enabled, but I grep out the DFSClient emissions only:
> > > > > > > >
> > > > > > > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase org.apache.hadoop.hbase.io.hfile.HFile -h -f /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362 | grep DFSClient
> > > > > > > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 available
> > > > > > > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C available
> > > > > > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > > > > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > > > > > > 2015-03-17 21:42:58,082 INFO [main] hfile.CacheConfig: CacheConfig:disabled
> > > > > > > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > > > fileLength=108633903
> > > > > > > > underConstruction=false
> > > > > > > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > > > isLastBlockComplete=true}
> > > > > > > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > > > fileLength=108633903
> > > > > > > > underConstruction=false
> > > > > > > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > > > isLastBlockComplete=true}
> > > > > > > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.30:50011
> > > > > > > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > >
> > > > > > > > Do you see it reading from 'good' or 'bad' blocks?
> > > > > > > >
> > > > > > > > I added this line to the hbase log4j.properties to enable DFSClient DEBUG:
> > > > > > > >
> > > > > > > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > > > > > >
> > > > > > > > On HBASE-12949, what exception is coming up? Dump it in here.
> > > > > > > >
> > > > > > > > > My goal is to determine whether the block in question is actually corrupt and, if so, in what way.
> > > > > > > >
> > > > > > > > What happens if you just try to copy the file local or elsewhere in the filesystem using the dfs shell? Do you get a pure dfs exception unhampered by hbaseyness?
> > > > > > > >
> > > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > > >
> > > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure how to skip the bad section also).
> > > > > > > >
> > > > > > > > > I just don't want to be in the position of having to lose all 3 gigs of data in this particular region, given that most of it appears to be intact. I just can't find the right low-level tools to let me diagnose the exact state and structure of the block data I have for this file.
> > > > > > > >
> > > > > > > > Nod.
> > > > > > > >
> > > > > > > > > Any help or direction that someone could provide would be much appreciated. For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0 and add that the HBase version is 0.94.15-cdh4.6.0.
> > > > > > > >
> > > > > > > > See if any of the above helps. I'll try and dig up some more tools in the meantime.
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > -md
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > >
> > > > > > >    - Andy
> > > > > > >
> > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
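Stack's "copy the file local" check can also be done programmatically. A rough sketch follows that simply streams the suspect hfile out of HDFS and reports how far it got before the first pure-DFS error (typically a ChecksumException or BlockMissingException); the class name is hypothetical and the path is whatever file is under suspicion. From the shell, "hadoop fs -copyToLocal" on the same path exercises the same code path.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadThrough {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path hfile = new Path(args[0]);        // the suspect hfile

        byte[] buf = new byte[1024 * 1024];
        long offset = 0;
        FSDataInputStream in = fs.open(hfile);
        try {
          int n;
          while ((n = in.read(buf)) > 0) {
            offset += n;                       // bytes successfully read so far
          }
          System.out.println("Read the whole file cleanly: " + offset + " bytes");
        } catch (IOException e) {
          // The offset is roughly where the first bad block starts.
          System.err.println("Read failed after " + offset + " bytes: " + e);
        } finally {
          in.close();
        }
      }
    }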