hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Recovering from corrupt blocks in HFile
Date Wed, 18 Mar 2015 17:05:40 GMT
On Tue, Mar 17, 2015 at 11:42 PM, Mike Dillon <mike.dillon@synctree.com> wrote:

> Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
> short-circuit reads are not turned on, but I'll double check when I follow
> up on this.
> The main issue that actually led to me being asked to look into this issue
> was that the cluster had a datanode running at 100% disk usage on all its
> mounts. Since it was already in a compromised state and I didn't fully
> understand what restarting it would do, I haven't done that yet.

> It turned out that at least part of the reason that the node got to 100%
> capacity was that major compactions had been silently failing for a couple
> weeks due to the aforementioned corrupt block. When I looked into the logs
> of the node at capacity, I was seeing "compaction failed" error messages
> for a particular region, caused by BlockMissingExceptions for a particular
> block. That's what led me to fsck that block file and start digging into
> the underlying data. The weird thing is that the at-capacity node actually
> had one of the good copies of the failed block and it was a different node
> that had the broken one.
Ok. HDFS gets a little unpredictable when full or, to put it another way,
it has not been well tested at this extreme.
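As a reference point, the corrupt block and the datanodes holding its
replicas can be enumerated with `hdfs fsck`. This is a sketch; the table
and region path below are hypothetical, so substitute the actual path
from your NameNode logs or HBase metadata:

```shell
# List the files, blocks, and replica locations under the affected
# region's directory (path is a hypothetical example):
hdfs fsck /hbase/data/default/mytable -files -blocks -locations

# Narrow the output to corrupt file/block pairs only:
hdfs fsck /hbase/data/default/mytable -list-corruptfileblocks
```

The `-locations` output is what lets you confirm which datanode holds the
good replica versus the broken one, as described above.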

Please paste the exceptions in here when you get a chance. That will help
with the diagnosis.
> And of course, the logs for when this broken HFile was created have already
> been aged out, so I'm left to chase shadows to some extent.

Of course.

Let us try and help out.
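In the meantime, the capacity situation on the full datanode can be
confirmed from the NameNode's point of view with `dfsadmin` (a sketch,
assuming shell access to a node with the HDFS client configured):

```shell
# Per-datanode capacity, DFS used, and remaining space as the
# NameNode sees it:
hdfs dfsadmin -report
```

Comparing the reported "DFS Remaining" against what `df` shows locally on
the full node can also reveal whether non-DFS data (e.g. logs) is eating
the disk.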

