hadoop-hdfs-dev mailing list archives

From Doug Balog <doug.hdphdfs...@dugos.com>
Subject Rebalancing data across partitions on a datanode.
Date Wed, 25 Aug 2010 15:13:58 GMT
We've just added a couple of new drives to our datanodes.
Each new drive has a single filesystem, which we added to dfs.data.dir and mapred.{local,tmp}.dir.
Now I want to rebalance the data across the new filesystems so that they are equally utilized.
My plan is to write a script that does the following:

- Calculate how much data each filesystem should have.
- While the filesystems are not balanced:
	- Randomly pick a block file and its .meta file from an over-utilized filesystem.
	- Copy them to temporary names on an under-utilized filesystem.
	- Rename the files from their temporary names to the proper location on the under-utilized filesystem.
	- Remove the files from the over-utilized filesystem.
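One iteration of the loop above might be sketched as the shell function below. This is only a sketch under assumptions, not a tested tool: the directory layout (flat `blk_*` files under a `current/` dir), the example paths, and the function name are all hypothetical, and it assumes the datanode is stopped so nothing races the move.

```shell
#!/bin/sh
# move_one_block SRC DST
#   Move one randomly chosen block file and its .meta file from the
#   over-utilized data dir SRC to the under-utilized data dir DST.
#   Hypothetical sketch; run only with the datanode stopped.
move_one_block() {
	src=$1
	dst=$2

	# Pick a random block file; HDFS block IDs may be negative,
	# so allow an optional leading minus sign. (sort -R is GNU coreutils.)
	blk=$(ls "$src" | grep '^blk_-\{0,1\}[0-9][0-9]*$' | sort -R | head -n 1)
	[ -n "$blk" ] || return 1

	# The matching metadata file is named blk_<id>_<genstamp>.meta.
	meta=$(ls "$src" | grep "^${blk}_.*\.meta$" | head -n 1)
	[ -n "$meta" ] || return 1

	# Copy to temporary names first, then rename, so a crash mid-copy
	# never leaves a half-written file under a real block name.
	cp "$src/$blk"  "$dst/$blk.tmp"  && mv "$dst/$blk.tmp"  "$dst/$blk"  || return 1
	cp "$src/$meta" "$dst/$meta.tmp" && mv "$dst/$meta.tmp" "$dst/$meta" || return 1

	# Remove the originals only after both copies are in place.
	rm "$src/$blk" "$src/$meta"
}
```

The outer loop would call this repeatedly, comparing `df` output for each data dir against the target utilization, until the filesystems are within some tolerance of each other.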

I think this will work because I believe the datanode tries to open a block file
on each of its filesystems until it succeeds, i.e. it doesn't keep track in memory
of which filesystem each block lives on.

Will this work?
What gotchas do I have to watch out for?


