Subject: Re: Recovering from corrupt blocks in HFile
From: Mike Dillon
To: user@hbase.apache.org
Date: Thu, 19 Mar 2015 16:33:52 -0700

Thank you!

On Thu, Mar 19, 2015 at 1:48 PM, Jerry He wrote:

> It is ok to delete the hfile in question with the hadoop file system command; no restart of HBase is needed. You may see some error exceptions if there are things (user scans, compactions) in flight, but it will be ok.
>
> Jerry
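Jerry's suggestion amounts to a plain "hadoop fs -rm" (or "-mv" to a backup location) on the region's store file. A minimal sketch of the same operation through the Hadoop FileSystem API follows; the class name and the table/region/file path are hypothetical placeholders, not values from this cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MoveCorruptHFile {
      public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical 0.94-style layout: /hbase/<table>/<region>/<family>/<hfile>
        Path hfile = new Path("/hbase/mytable/1234abcd5678/cf/0123456789abcdef");
        Path backup = new Path("/hfile-backups/" + hfile.getName());

        // Move rather than delete, so the bytes are still recoverable if needed.
        fs.mkdirs(backup.getParent());
        boolean moved = fs.rename(hfile, backup);
        System.out.println("moved=" + moved);
      }
    }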
> On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon wrote:
>
> > So, it turns out that the client has an archived data source that can recreate the HBase data in question if needed, so the need for me to actually recover this HFile has diminished to the point where it's probably not worth investing my time in creating a custom tool to extract the data.
> >
> > Given that they're willing to lose the data in this region and recreate it if necessary, do I simply need to delete the HFile to make HDFS happy, or is there something I need to do at the HBase level to tell it that the data will be going away?
> >
> > Thanks so much everyone for your help on this issue!
> >
> > -md
> >
> > On Wed, Mar 18, 2015 at 10:46 PM, Jerry He wrote:
> >
> > > From the HBase perspective, since we don't have a ready tool, the general idea is that you would need access to the HBase source code and write your own tool. At a high level, the tool would read/scan the KVs from the hfile, much as the HFile tool does, while opening an HFileWriter to dump the good data until you are no longer able to do so. Then you would close the HFileWriter with the necessary meta file info. There are APIs in HBase to do this, but they may not be external public APIs.
> > >
> > > Jerry
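A rough sketch of the salvage tool Jerry describes might look like the following, written against 0.94-era HFile APIs. The class name and arguments are placeholders, and the exact reader/writer factory signatures are an assumption that may differ between HBase versions; in particular, the writer probably needs the same comparator, compression, and block size as the source file, which is omitted here.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileScanner;

    public class SalvageHFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        CacheConfig cacheConf = new CacheConfig(conf);
        Path src = new Path(args[0]);   // corrupt hfile
        Path dst = new Path(args[1]);   // where to write the salvaged copy

        HFile.Reader reader = HFile.createReader(fs, src, cacheConf);
        HFile.Writer writer = HFile.getWriterFactory(conf, cacheConf)
            .withPath(fs, dst)
            .create();
        long copied = 0;
        try {
          HFileScanner scanner = reader.getScanner(false, false);  // no cache, no pread
          if (scanner.seekTo()) {
            do {
              writer.append(scanner.getKeyValue());   // copy good KVs...
              copied++;
            } while (scanner.next());                 // ...until a read throws
          }
        } catch (Exception e) {
          // First corruption-caused exception: stop and keep what we have so far.
          System.err.println("Stopping after " + copied + " KVs: " + e);
        } finally {
          writer.close();   // writes the trailer/meta so the new hfile is well-formed
          reader.close();
        }
        System.out.println("Salvaged " + copied + " KVs to " + dst);
      }
    }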
> > > On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon wrote:
> > >
> > > > I've had a chance to try out the suggestion Stack passed along, HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat, and managed to get this: https://gist.github.com/md5/d42e97ab7a0bd656f09a
> > > >
> > > > After knowing what to look for, I was able to find the same checksum failures in the logs during the major compaction failures.
> > > >
> > > > I'm willing to accept that all the data after that point in the corrupt block is lost, so any specific advice for how to replace that block with a partial one containing only the good data would be appreciated. I'm aware that there may be other checksum failures in the subsequent blocks as well, since nothing is currently able to read past the first corruption point, but I'll just have to wash, rinse, and repeat to see how much good data is left in the file as a whole.
> > > >
> > > > -md
> > > >
> > > > On Wed, Mar 18, 2015 at 2:41 PM, Jerry He wrote:
> > > >
> > > > > For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy thing to recover is probably the data (KVs) up to the point where we hit the first corruption-caused exception. After that, it will not be as easy. For example, if the current key length or value length is bad, there is no way to skip to the next KV. We would probably need to skip the whole current hblock and go to the next block for KVs, assuming the hblock index is still good.
> > > > >
> > > > > HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> does an incremental improvement to make sure we do get a corruption-caused exception so that the scan/read will not go into an infinite loop.
> > > > >
> > > > > Jerry
> > > > >
> > > > > On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mike.dillon@synctree.com> wrote:
> > > > >
> > > > > > I haven't filed one myself, but I can do so if my investigation ends up finding something bug-worthy as opposed to just random failures due to out-of-disk scenarios.
> > > > > >
> > > > > > Unfortunately, I had to prioritize some other work this morning, so I haven't made it back to the bad node yet.
> > > > > >
> > > > > > I did attempt restarting the datanode to see if I could make hadoop fsck happy, but that didn't have any noticeable effect. I'm hoping to have more time this afternoon to investigate the other suggestions from this thread.
> > > > > >
> > > > > > -md
> > > > > >
> > > > > > On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <apurtell@apache.org> wrote:
> > > > > >
> > > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
> > > > > > >
> > > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > > >
> > > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure how to skip the bad section also).
> > > > > > >
> > > > > > > I was just getting caught up on this thread and had the same thought. Is there an issue filed for this?
> > > > > > >
> > > > > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack wrote:
> > > > > > >
> > > > > > > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mike.dillon@synctree.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all-
> > > > > > > > >
> > > > > > > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was hoping to get some advice on recovering as much data as possible.
> > > > > > > > >
> > > > > > > > > When I examined the blk-* file on the three data nodes that have a replica of the affected block, I saw that the replicas on two of the datanodes had the same SHA-1 checksum and that the replica on the other datanode was a truncated version of the replica found on the other nodes (as reported by a difference at EOF by "cmp"). The size of the two identical blocks is 67108864, the same as most of the other blocks in the file.
> > > > > > > > >
> > > > > > > > > Given that there were two datanodes with the same data and another with truncated data, I made a backup of the truncated file and dropped the full-length copy of the block in its place directly on the data mount, hoping that this would cause HDFS to no longer report the file as corrupt. Unfortunately, this didn't seem to have any effect.
> > > > > > > >
> > > > > > > > That seems like a reasonable thing to do.
> > > > > > > >
> > > > > > > > Did you restart the DN that was serving this block before you ran fsck? (Fsck asks the namenode what blocks are bad; it is likely still reporting off old info.)
> > > > > > > > > Looking through the Hadoop source code, it looks like there is a CorruptReplicasMap internally that tracks which nodes have "corrupt" copies of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663>, a "-blockId" parameter was added to "hadoop fsck" to allow dumping the reason that a block id is considered corrupt, but that wasn't added until Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > > > > > >
> > > > > > > > Good digging.
> > > > > > > >
> > > > > > > > > I also had a look at running the "HFile" tool on the affected file (cf. section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html). When I did that, I was able to see the data up to the corrupted block as far as I could tell, but then it started repeatedly looping back to the first row and starting over. I believe this is related to the behavior described in https://issues.apache.org/jira/browse/HBASE-12949.
> > > > > > > >
> > > > > > > > So, your file is 3G and your blocks are 128M?
> > > > > > > >
> > > > > > > > The dfsclient should just pass over the bad replica and move on to the good one, so it would seem to indicate all replicas are bad for you.
> > > > > > > >
> > > > > > > > If you enable DFSClient DEBUG-level logging it should report which blocks it is reading from. For example, here I am reading the start of the index blocks with DFSClient DEBUG enabled, but I grep out the DFSClient emissions only:
> > > > > > > >
> > > > > > > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase org.apache.hadoop.hbase.io.hfile.HFile -h -f /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362 | grep DFSClient
> > > > > > > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 available
> > > > > > > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C available
> > > > > > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > > SLF4J: Found binding in [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > > > > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > > > > > > 2015-03-17 21:42:58,082 INFO [main] hfile.CacheConfig: CacheConfig:disabled
> > > > > > > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > > > fileLength=108633903
> > > > > > > > underConstruction=false
> > > > > > > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > > > isLastBlockComplete=true}
> > > > > > > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo = LocatedBlocks{
> > > > > > > > fileLength=108633903
> > > > > > > > underConstruction=false
> > > > > > > > blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.27:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > > > > > lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201; getBlockSize()=108633903; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK], DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK], DatanodeInfoWithStorage[10.20.84.30:50011,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > > > > > isLastBlockComplete=true}
> > > > > > > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.30:50011
> > > > > > > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode 10.20.84.27:50011
> > > > > > > >
> > > > > > > > Do you see it reading from 'good' or 'bad' blocks?
> > > > > > > >
> > > > > > > > I added this line to the hbase log4j.properties to enable DFSClient DEBUG:
> > > > > > > >
> > > > > > > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > > > > > >
> > > > > > > > On HBASE-12949, what exception is coming up? Dump it in here.
> > > > > > > >
> > > > > > > > > My goal is to determine whether the block in question is actually corrupt and, if so, in what way.
> > > > > > > >
> > > > > > > > What happens if you just try to copy the file local or elsewhere in the filesystem using the dfs shell? Do you get a pure dfs exception unhampered by hbaseyness?
> > > > > > > >
> > > > > > > > > If it's possible to recover all of the file except a portion of the affected block, that would be OK too.
> > > > > > > >
> > > > > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to add it so you can recover all but the bad block (we should figure how to skip the bad section also).
> > > > > > > >
> > > > > > > > > I just don't want to be in the position of having to lose all 3 gigs of data in this particular region, given that most of it appears to be intact. I just can't find the right low-level tools to let me diagnose the exact state and structure of the block data I have for this file.
> > > > > > > >
> > > > > > > > Nod.
> > > > > > > >
> > > > > > > > > Any help or direction that someone could provide would be much appreciated. For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0 and add that the HBase version is 0.94.15-cdh4.6.0.
> > > > > > > >
> > > > > > > > See if any of the above helps. I'll try and dig up some more tools in the meantime.
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > -md
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > >
> > > > > > >    - Andy
> > > > > > >
> > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
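Stack's "copy the file local" check can also be done programmatically. A rough sketch follows that simply streams the suspect hfile out of HDFS and reports how far it got before the first pure-DFS error (typically a ChecksumException or BlockMissingException); the class name is hypothetical and the path is whatever file is under suspicion. From the shell, "hadoop fs -copyToLocal" on the same path exercises the same code path.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadThrough {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path hfile = new Path(args[0]);        // the suspect hfile

        byte[] buf = new byte[1024 * 1024];
        long offset = 0;
        FSDataInputStream in = fs.open(hfile);
        try {
          int n;
          while ((n = in.read(buf)) > 0) {
            offset += n;                       // bytes successfully read so far
          }
          System.out.println("Read the whole file cleanly: " + offset + " bytes");
        } catch (IOException e) {
          // The offset is roughly where the first bad block starts.
          System.err.println("Read failed after " + offset + " bytes: " + e);
        } finally {
          in.close();
        }
      }
    }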