hadoop-common-user mailing list archives

From David Watzke <da...@watzke.cz>
Subject Re: datanode directory structure mess-up
Date Sat, 05 Mar 2016 22:36:02 GMT
It's not that big a deal: the source data were available on our other
cluster in another DC, and the rest could be recomputed from that. I
just wanted to know.

Thanks for the reply, good to know.

The reason I ran the tool was that we recently added more datadirs
(disks) to each datanode, and the new datadirs were empty while the others
were almost full. It's a shame that native HDFS tools (such as the balancer)
can't rebalance between volumes within a single node; the balancer only
moves blocks between datanodes.

We tried to add the new disks as ARCHIVE storage so we could mark
some old data as COLD, but when I did that and ran the mover, we hit bugs
that made me decide to add the new datadirs as regular DISK storage instead,
for the time being...

In the meantime, Cloudera has a fix for the NN crash, and I already
know I'm able to patch the DN deadlock, so we might try that again soon.


David Watzke

On 5.3.2016 at 23:13, Anu Engineer wrote:
> I am so sorry to hear this, but I don’t think we have any tool at this
> point in time that can fix that layout issue, and I don’t know enough
> about the volume-balancer tool to comment on other options.
> If you are okay with losing some of your blocks (since other nodes
> are in a bad state too), you can decommission the node, re-add
> it, and wait for the cluster to heal itself.
> We have been working on a tool to address the disk balancing issue; if you
> are interested, you can follow its progress in HDFS-1312.
> —Anu
> PS. Just out of curiosity, may I ask what prompted you to run this
> tool? Did you replace a disk, or were you running out of space on one
> disk on that node?
> From: David Watzke <david@watzke.cz>
> Date: Saturday, March 5, 2016 at 6:47 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: datanode directory structure mess-up
> Hi list,
> I ran into trouble because I accidentally used this tool
> https://github.com/killerwhile/volume-balancer with Hadoop 2.6.0 (just
> like that page warns you not to -- I had used it successfully before and
> didn't think to check that page before using it again). It messed up
> my datadirs because, as I understand it, that software now makes
> invalid assumptions about which directory moves it can do. Now the
> datanode logs are filled with these:
> WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume /xyz/dfs/dn
> What can I do to fix this? I don't know which files/dirs were moved or
> from where, but is there a reasonable way out of this? For example,
> would editing the VERSION file back to a previous layout version while
> the DN is down make it fix the layout by itself?
> Please note that I've lost the other replica due to a filesystem error,
> so I can't just ignore it. This is literally my only option to recover
> some missing blocks.
> Thanks,
> -- 
> David Watzke
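A minimal, read-only sketch for locating misplaced block files, assuming the block-ID-based datanode layout introduced by HDFS-6482 in Hadoop 2.6.0: under `<datadir>/current/<block pool ID>/current/finalized`, the two `subdirN` levels of a finalized block are taken from bits 16-23 and 8-15 of its block ID. This is offered as one plausible reason why a tool that moves subdirectories between volumes would break the layout; the thread does not confirm it. The functions `expected_block_dir` and `find_misplaced_blocks` are hypothetical helpers, not Hadoop APIs, and the script only reports mismatches; it moves nothing.

```python
import os


# Assumption: Hadoop 2.6.0 block-ID-based layout (HDFS-6482), where
# finalized/subdir<d1>/subdir<d2> is derived from the block ID.
def expected_block_dir(finalized_dir: str, block_id: int) -> str:
    """Return the directory where a finalized block file is expected to live."""
    d1 = (block_id >> 16) & 0xFF  # bits 16-23 pick the first subdir level
    d2 = (block_id >> 8) & 0xFF   # bits 8-15 pick the second subdir level
    return os.path.join(finalized_dir, f"subdir{d1}", f"subdir{d2}")


def find_misplaced_blocks(finalized_dir: str):
    """Walk finalized_dir and yield (actual_dir, expected_dir, filename) for
    every blk_* file that is not in the directory its block ID maps to."""
    for dirpath, _dirnames, filenames in os.walk(finalized_dir):
        for name in filenames:
            if not name.startswith("blk_"):
                continue
            # Handles both "blk_<id>" and "blk_<id>_<genstamp>.meta";
            # legacy block IDs may be negative, which int() accepts.
            id_part = name[len("blk_"):].split("_")[0].split(".")[0]
            try:
                block_id = int(id_part)
            except ValueError:
                continue  # not a block or meta file we understand
            expected = expected_block_dir(finalized_dir, block_id)
            if os.path.normpath(dirpath) != os.path.normpath(expected):
                yield dirpath, expected, name
```

For the volume in the log above, the sketch would be pointed at a path such as `/xyz/dfs/dn/current/BP-680964103-A.B.C.D-1375882473930/current/finalized` (directory structure assumed), and each yielded tuple names a file that sits outside its expected `subdir` pair.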
