hadoop-common-user mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: datanode directory structure mess-up
Date Sat, 05 Mar 2016 22:13:23 GMT
I am so sorry to hear this, but I don’t think we have any tool at this point that can fix
that layout issue, and I don’t know enough about the volume-balancer tool to comment on
other options.
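
For what it is worth, here is a minimal, read-only sketch of how one might at least list which block files sit in the wrong directory. It rests on an assumption that should be checked against DatanodeUtil.idToBlockDir in the exact release: that 2.6.0 uses the block ID-based layout, where a finalized block lives under subdir<(id >> 16) & 0xff>/subdir<(id >> 8) & 0xff>. The function names and the `finalized_dir` argument are hypothetical, and the script only reports paths; it moves nothing.

```python
import os

# ASSUMPTION: two levels of 256 "subdirN" directories derived from the block ID.
# Verify the mask and shifts against DatanodeUtil.idToBlockDir for your release.
MASK = 0xFF


def expected_subdir(block_id):
    """Relative directory a finalized block is expected in, under the assumption above."""
    d1 = (block_id >> 16) & MASK
    d2 = (block_id >> 8) & MASK
    return os.path.join("subdir%d" % d1, "subdir%d" % d2)


def block_id_from_filename(name):
    """Parse 'blk_<id>' or 'blk_<id>_<genstamp>.meta'; return None for other files."""
    if not name.startswith("blk_"):
        return None
    stem = name[len("blk_"):]
    if stem.endswith(".meta"):
        # Drop the '.meta' suffix, then the trailing '_<genstamp>' part.
        stem = stem[:-len(".meta")].rsplit("_", 1)[0]
    try:
        return int(stem)
    except ValueError:
        return None


def find_misplaced(finalized_dir):
    """Yield (current_path, expected_path) for block files outside their expected directory."""
    for dirpath, _dirs, files in os.walk(finalized_dir):
        for name in files:
            block_id = block_id_from_filename(name)
            if block_id is None:
                continue
            expected_dir = os.path.join(finalized_dir, expected_subdir(block_id))
            if os.path.normpath(dirpath) != os.path.normpath(expected_dir):
                yield os.path.join(dirpath, name), os.path.join(expected_dir, name)
```

You would point find_misplaced at a .../current/<block pool id>/current/finalized directory with the datanode stopped, and review the reported pairs by hand before moving anything, ideally on a copy of the data first.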

If you are okay with losing some of your blocks (since other nodes are in a bad state too),
you can decommission the node, re-add it, and wait for the cluster to heal itself.
We have been working on a tool to address the disk balancing issue; if you are interested,
you can follow the progress of that tool in HDFS-1312.


P.S. Just out of curiosity, can I ask what prompted you to run this tool? Did you replace
a disk, or were you running out of space on one disk on that node?

From: David Watzke <david@watzke.cz>
Date: Saturday, March 5, 2016 at 6:47 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: datanode directory structure mess-up

Hi list,

I ran into trouble because I accidentally used this tool https://github.com/killerwhile/volume-balancer
with Hadoop 2.6.0 (just like that page warns you not to; I had used it successfully before and
didn't think to check that page before using it again), and it messed up my datadirs because,
as I understand it, that software now makes invalid assumptions about what directory moves
it can do. Now the datanode logs are filled with these:

WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume /xyz/dfs/dn

What can I do to fix this? I don't know which files/dirs were moved or from where, but is there
a reasonable way out of this? For example, would editing the VERSION file to a previous version
while the DN is down make it fix the layout by itself?

Please note that I've lost the other replica due to a filesystem error so I can't just ignore
it. This is literally my only option to recover some missing blocks.


David Watzke