hadoop-common-user mailing list archives

From David Watzke <da...@watzke.cz>
Subject Re: datanode directory structure mess-up
Date Sat, 05 Mar 2016 22:36:02 GMT
It's not that big of a deal; the source data were available on our other
cluster in another DC and the rest could be recomputed from that, but I
just wanted to know.

Thanks for the reply, good to know.

The reason I ran the tool was that we recently added more datadirs
(disks) to each datanode, and the new datadirs were empty while the
others were almost full. It's a shame that the native HDFS tools (such
as the balancer) aren't able to do intra-node volume rebalancing.

We tried adding the new disks as ARCHIVE storage so we could mark some
old data as COLD, but when I did that and ran the mover we hit these
bugs:
https://issues.apache.org/jira/browse/HDFS-8770
https://issues.apache.org/jira/browse/HDFS-9661
That made me decide to add the new datadirs as regular DISK storage
instead for the time being.

In the meantime Cloudera's got the fix for the NN crash, and I already
know I'm able to patch the DN deadlock, so we might try that again soon.

Thanks,

David Watzke

On 5.3.2016 at 23:13, Anu Engineer wrote:
> I am so sorry to hear this, but I don’t think we have any tool at this
> point in time that can fix that layout issue, and I don’t know enough
> about the volume-balancer tool to comment on other options.
>
> If you are okay with losing some of your blocks (since other nodes
> are in a bad state too), you can decommission the node, re-add it,
> and wait for the cluster to heal itself.
> We have been working on a tool to address the disk balancing issue; if
> you are interested, you can follow its progress in HDFS-1312.
>
> —Anu
>
> P.S. Just out of curiosity, can I ask what prompted you to run this
> tool? Did you replace a disk, or were you running out of space on one
> disk on that node?
>
> From: David Watzke <david@watzke.cz>
> Date: Saturday, March 5, 2016 at 6:47 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: datanode directory structure mess-up
>
> Hi list,
>
> I ran into trouble because I accidentally used this tool
> https://github.com/killerwhile/volume-balancer with Hadoop 2.6.0 (just
> like that page warns you not to -- I used it successfully before and
> didn't think to check that page before using it again) and it messed
> up my datadirs because, as I understand it, that software now makes
> invalid assumptions about what directory moves it can do. Now the
> datanode logs are filled with these:
>
> WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error 
> while finding block 
> BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume 
> /xyz/dfs/dn
>
> What can I do to fix this? I don't know what files/dirs were moved or
> from where, but is there a reasonable way out of this? For example,
> would editing the VERSION file to a previous version while the DN is
> down make it fix the layout by itself?
>
> Please note that I've lost the other replica due to a filesystem error 
> so I can't just ignore it. This is literally my only option to recover 
> some missing blocks.
>
> Thanks,
>
> -- 
> David Watzke
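
Note on the VolumeScanner warnings quoted above: a minimal sketch of how a
datanode using the block-ID-based layout (introduced by HDFS-6482) might
derive the directory it expects a finalized block to live in. The function
names here are hypothetical, and the 0xFF mask is an assumption based on
DatanodeUtil.idToBlockDir as I recall it from Hadoop 2.6.0; later releases
changed the fan-out, so check the source of the exact version in use. If a
tool moves block files into a subdirectory other than the one computed from
the block ID, lookups of this kind may fail, which is one plausible reading
of the warnings.

```python
SUBDIR_PREFIX = "subdir"


def block_id_from_name(block_name: str) -> int:
    """Extract the numeric block ID from a name like 'blk_<id>_<genstamp>'."""
    parts = block_name.split("_")
    if len(parts) < 2 or parts[0] != "blk":
        raise ValueError(f"not a block name: {block_name!r}")
    return int(parts[1])


def expected_block_subdir(block_id: int, mask: int = 0xFF) -> str:
    """Return the directory, relative to 'finalized', expected for a block.

    Assumption: two levels taken from bits 16-23 and bits 8-15 of the
    block ID, as in the 2.6.0 layout; the mask differs in later versions.
    """
    d1 = (block_id >> 16) & mask
    d2 = (block_id >> 8) & mask
    return f"{SUBDIR_PREFIX}{d1}/{SUBDIR_PREFIX}{d2}"


if __name__ == "__main__":
    # Block name taken from the log line in the quoted message.
    name = "blk_5822441067008155275_0"
    print(expected_block_subdir(block_id_from_name(name)))
```

Comparing this computed path with where a block file actually sits on disk
could help identify misplaced blocks, but any manual move should be tried on
a copy with the datanode stopped.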

