hadoop-hdfs-user mailing list archives

From Tom Wilberding <...@wilberding.com>
Subject Disk on data node full
Date Sat, 17 Mar 2012 12:57:19 GMT
Hi there,

Our data nodes all have 2 disks, one which is nearly full and one which is nearly empty:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
                      120G   11G  104G   9% /
/dev/cciss/c0d0p1      99M   35M   60M  37% /boot
tmpfs                 7.9G     0  7.9G   0% /dev/shm
/dev/cciss/c0d1       1.8T  1.7T  103G  95% /data
/dev/cciss/c0d2       1.8T   76G  1.8T   5% /data2

Reading through the docs and mailing list archives, my understanding is that HDFS will continue
to round robin to both disks until /data is completely full and then only write to /data2.
Is this correct? Does it really write until the disk is 100% full (or as close to full as possible)?
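The round-robin behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not the DataNode's actual Java code. Each new block goes to the next volume in rotation, and a volume is skipped once its free space, minus any reserved space, can no longer hold a whole block. The `Volume` and `RoundRobinChooser` classes, the `reserved` field, and the 64 MB block size are assumptions for illustration. In real deployments, per-volume reserved space is configured with the `dfs.datanode.du.reserved` property.

```python
from dataclasses import dataclass
from typing import Dict, List

BLOCK_SIZE = 64 * 1024**2  # assumed block size in bytes (64 MB)


@dataclass
class Volume:
    """Hypothetical model of one data directory (disk) on a data node."""
    path: str
    capacity: int      # total bytes on the volume
    used: int          # bytes already used
    reserved: int = 0  # bytes held back from HDFS (assumption, cf. dfs.datanode.du.reserved)

    def available(self) -> int:
        # Space HDFS may still write to; never negative.
        return max(0, self.capacity - self.reserved - self.used)


class RoundRobinChooser:
    """Pick volumes in rotation, skipping any that cannot fit one more block."""

    def __init__(self, volumes: List[Volume]):
        self.volumes = volumes
        self.next_index = 0

    def choose(self, block_size: int = BLOCK_SIZE) -> Volume:
        n = len(self.volumes)
        for step in range(n):
            idx = (self.next_index + step) % n
            vol = self.volumes[idx]
            if vol.available() >= block_size:
                # Continue the rotation from the volume after the one chosen.
                self.next_index = (idx + 1) % n
                return vol
        raise OSError("no volume has room for another block")


def write_blocks(chooser: RoundRobinChooser, count: int,
                 block_size: int = BLOCK_SIZE) -> Dict[str, int]:
    """Simulate writing `count` blocks; return how many landed on each volume."""
    placed = {v.path: 0 for v in chooser.volumes}
    for _ in range(count):
        vol = chooser.choose(block_size)
        vol.used += block_size
        placed[vol.path] += 1
    return placed


if __name__ == "__main__":
    # /data has room for one more block, /data2 is empty.
    data = Volume("/data", capacity=10 * BLOCK_SIZE, used=9 * BLOCK_SIZE)
    data2 = Volume("/data2", capacity=10 * BLOCK_SIZE, used=0)
    print(write_blocks(RoundRobinChooser([data, data2]), 5))
    # -> {'/data': 1, '/data2': 4}
```

In this model the nearly full disk takes blocks until it cannot fit another one, and after that every write goes to the other disk. If the reservation is set, the full volume stops short of 100% by that amount.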

Ignoring the performance implications and the monitoring hassles of having a full disk, I
just want to be sure that nothing bad is going to happen over the next couple of days as we
fill up the /data partition.

I understand that my two best options for rebalancing each data node would be either to:
1) bring down HDFS, manually move ~50% of the /data/dfs/dn/current/subdir* directories
over to /data2, and then bring HDFS back up; or
2) take the data nodes down one at a time, clean out /data and /data2, put the node back into
rotation, and let the balancer move block replicas back onto it. Since the node will round robin
across both (now empty) disks, I should wind up with a nicely balanced data node.
Repeat this process for the remaining nodes.
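Option 1 comes down to choosing which subdir* directories to move so that both disks end up holding roughly the same amount. Below is a minimal greedy sketch. It assumes the size of each directory has already been measured, for example with `du -sb /data/dfs/dn/current/subdir*`. The `plan_moves` helper is hypothetical and only produces a plan. It moves nothing, and any actual move would happen while HDFS is down, as the option describes.

```python
from typing import Dict, List, Tuple


def plan_moves(subdir_sizes: Dict[str, int], used_src: int,
               used_dst: int) -> Tuple[List[str], int]:
    """Greedily pick directories to move from the fuller volume to the emptier one.

    subdir_sizes: bytes in each candidate directory on the source volume
    used_src, used_dst: bytes currently used on the source and destination volumes
    Returns (directories to move, total bytes they contain).
    """
    # Moving half the difference leaves both volumes roughly equal.
    target = (used_src - used_dst) // 2
    chosen: List[str] = []
    moved = 0
    # Consider the largest directories first; skip any that would overshoot the target.
    for name, size in sorted(subdir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if moved + size <= target:
            chosen.append(name)
            moved += size
    return chosen, moved


if __name__ == "__main__":
    sizes = {"subdir0": 40, "subdir1": 30, "subdir2": 20, "subdir3": 10}
    print(plan_moves(sizes, used_src=100, used_dst=0))
    # -> (['subdir0', 'subdir3'], 50)
```

With the df figures above, half the difference is about (1.7T - 76G) / 2, or roughly 0.8 TB. That is close to the ~50% estimate. Moving whole subdir* directories, rather than individual files, keeps each block file next to its .meta file.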

I'm relatively new to HDFS, so can someone please confirm whether what I'm saying is correct?
Any tips, tricks or things to watch out for would also be greatly appreciated.
