hbase-user mailing list archives

From Mike Dillon <mike.dil...@synctree.com>
Subject Re: Recovering from corrupt blocks in HFile
Date Wed, 18 Mar 2015 06:42:23 GMT
Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
short-circuit reads are not turned on, but I'll double check when I follow
up on this.
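
If it helps, the quick check I have in mind is just to grep the client
configs for the short-circuit property (the paths below are the usual CDH
locations, so adjust as needed):

  grep -A1 dfs.client.read.shortcircuit /etc/hadoop/conf/hdfs-site.xml
  grep -A1 dfs.client.read.shortcircuit /etc/hbase/conf/hbase-site.xml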

The problem that actually led to me being asked to look into this was that
the cluster had a datanode running at 100% disk usage on all its mounts.
Since that node was already in a compromised state and I didn't fully
understand what restarting it would do, I haven't restarted it yet.

It turned out that at least part of the reason that the node got to 100%
capacity was that major compactions had been silently failing for a couple
weeks due to the aforementioned corrupt block. When I looked into the logs
of the node at capacity, I was seeing "compaction failed" error messages
for a particular region, caused by BlockMissingExceptions for a particular
block. That's what led me to fsck that block file and start digging into
the underlying data. The weird thing is that the at-capacity node actually
had one of the good copies of the failed block and it was a different node
that had the broken one.
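
For reference, the fsck invocation I've been using to map the file to its
blocks and replica locations is roughly the following (the real HFile path
is shortened here):

  hadoop fsck /hbase/<table>/<region>/<family>/<hfile> -files -blocks -locations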

And of course, the logs for when this broken HFile was created have already
been aged out, so I'm left to chase shadows to some extent.

-md


On Tue, Mar 17, 2015 at 10:35 PM, Stack <stack@duboce.net> wrote:

> On Tue, Mar 17, 2015 at 9:47 PM, Stack <stack@duboce.net> wrote:
>
> > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mike.dillon@synctree.com>
> > wrote:
> >
> >> Hi all-
> >>
> >> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
> >> was
> >> hoping to get some advice on recovering as much data as possible.
> >>
> >> When I examined the blk-* file on the three data nodes that have a
> replica
> >> of the affected block, I saw that the replicas on two of the datanodes
> had
> >> the same SHA-1 checksum and that the replica on the other datanode was a
> >> truncated version of the replica found on the other nodes (as reported
> by
> >> a
> >> difference at EOF by "cmp"). The size of the two identical blocks is
> >> 67108864, the same as most of the other blocks in the file.
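> >>
> >> The comparison itself was nothing fancy, roughly the following on each
> >> node (with the blk-* paths under the datanode data directories
> >> abbreviated):
> >>
> >>   sha1sum blk_<id>
> >>   cmp blk_<id>.copy-from-dn2 blk_<id>.copy-from-dn3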
> >>
> >> Given that there were two datanodes with the same data and another with
> >> truncated data, I made a backup of the truncated file and dropped the
> >> full-length copy of the block in its place directly on the data mount,
> >> hoping that this would cause HDFS to no longer report the file as
> corrupt.
> >> Unfortunately, this didn't seem to have any effect.
> >>
> >>
> > That seems like a reasonable thing to do.
> >
> > Did you restart the DN that was serving this block before you ran fsck?
> > (Fsck asks namenode what blocks are bad; it likely is still reporting off
> > old info).
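> >
> > On a CDH4 package install, restarting the DN should just be something
> > like this, and then fsck can be re-run once the NN has picked up the new
> > block report:
> >
> >   sudo service hadoop-hdfs-datanode restart
> >   hadoop fsck /path/to/the/hfile -files -blocks -locations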
> >
> >
> >
> >> Looking through the Hadoop source code, it looks like there is a
> >> CorruptReplicasMap internally that tracks which nodes have "corrupt"
> >> copies
> >> of a block. In HDFS-6663 <
> https://issues.apache.org/jira/browse/HDFS-6663
> >> >,
> >> a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
> >> reason that a block id is considered corrupt, but that wasn't added
> until
> >> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
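> >>
> >> For anyone who hits this thread on 2.7.0 or later, the invocation would
> >> be something like:
> >>
> >>   hdfs fsck -blockId blk_<id>
> >>
> >> but that option just isn't there in 2.0.0-cdh4.6.0.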
> >>
> >>
> > Good digging.
> >
> >
> >
> >> I also had a look at running the "HFile" tool on the affected file (cf.
> >> section 9.7.5.2.2 at
> http://hbase.apache.org/0.94/book/regions.arch.html
> >> ).
> >> When I did that, I was able to see the data up to the corrupted block as
> >> far as I could tell, but then it started repeatedly looping back to the
> >> first row and starting over. I believe this is related to the behavior
> >> described in https://issues.apache.org/jira/browse/HBASE-12949
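> >>
> >> The invocation I was using was along the lines of:
> >>
> >>   hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f /hbase/<table>/<region>/<family>/<hfile>
> >>
> >> with -p to print the key/values, which is where the looping showed up.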
> >
> >
> >
> > So, your file is 3G and your blocks are 128M?
> >
> > The DFSClient should just pass over the bad replica and move on to the
> > good one, so this would seem to indicate that all replicas are bad for you.
> >
> > If you enable DFSClient DEBUG level logging it should report which blocks
> > it is reading from. For example, here I am reading the start of the index
> > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> emissions
> > only:
> >
> > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > DFSClient
> > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32 available
> > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32C available
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > CacheConfig:disabled
> > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.30:50011
> > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> >
> > Do you see it reading from 'good' or 'bad' blocks?
> >
> > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> >
> > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> >
> > On HBASE-12949, what exception is coming up?  Dump it in here.
> >
> >
> >
> >> My goal is to determine whether the block in question is actually
> corrupt
> >> and, if so, in what way.
> >
> >
> > What happens if you just try to copy the file locally or elsewhere in the
> > filesystem using the dfs shell? Do you get a pure DFS exception unhampered by
> > hbaseyness?
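> >
> > Something like:
> >
> >   hdfs dfs -get /hbase/<table>/<region>/<family>/<hfile> /tmp/hfile.copy
> >
> > should surface a raw BlockMissingException or ChecksumException if the
> > replicas really are bad.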
> >
> >
> >
> >> If it's possible to recover all of the file except
> >> a portion of the affected block, that would be OK too.
> >
> >
> > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > add it so you can recover all but the bad block (we should figure out how
> > to skip the bad section also).
> >
> >
> >
> >> I just don't want to
> >> be in the position of having to lose all 3 gigs of data in this
> particular
> >> region, given that most of it appears to be intact. I just can't find
> the
> >> right low-level tools to let me diagnose the exact state
> and
> >> structure of the block data I have for this file.
> >>
> >>
> > Nod.
> >
> >
> >
> >> Any help or direction that someone could provide would be much
> >> appreciated.
> >> For reference, I'll repeat that our client is running Hadoop
> >> 2.0.0-cdh4.6.0
> >> and add that the HBase version is 0.94.15-cdh4.6.0.
> >>
> >>
> > See if any of the above helps. I'll try and dig up some more tools in the
> > meantime.
> >
>
> I asked some folks who know better, and they suggested and asked the following:
>
> + Are you doing short-circuit reads?  If so, this may be keeping the
> DFSClient from moving on to a good block.
> + In later versions of Hadoop (cdh5.2.1, for example), you could do "hdfs
> dfsadmin -triggerBlockReport DN:PORT". This is probably of no use to you,
> so you might have to restart the DN to have the NN notice the change in
> blocks.
> + This might be better than what I suggested above:
> HADOOP_ROOT_LOGGER="TRACE,console"  hdfs dfs -cat /interesting_file
>
> St.Ack
>
