hadoop-common-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: How to redistribute files on HDFS after adding new machines to cluster?
Date Fri, 07 Aug 2009 17:58:21 GMT
Make sure you rebalance soon after adding the new node.  Otherwise, you will
have an age bias in file distribution.  This can, in some applications, lead
to some strange effects.  For example, if you have log files that you delete
when they get too old, disk space will be freed non-uniformly.  This
shouldn't much affect performance, but it can lead to a need to rebalance
again (and again) later.  Normal file churn combined with occasional
rebalancing should eventually fix this, but it is nicer not to need the fix.
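
The age bias can be illustrated with a toy model. This is only a sketch under
stated assumptions, not how the HDFS balancer actually selects blocks: data is
tracked as fractional "units" labelled old or new, writes are spread evenly
over all nodes, and the hypothetical rebalance() below moves old and new data
in proportion to each node's mix (that is, it is blind to file age). The node
counts and data sizes are made up for illustration.

```python
def add_data(nodes, kind, amount):
    """Spread `amount` units of `kind` data evenly over all nodes."""
    share = amount / len(nodes)
    for node in nodes:
        node[kind] += share


def rebalance(nodes):
    """Toy age-blind rebalance: equalize total usage per node.

    Over-full nodes give up old and new data in proportion to what they
    hold; the pooled excess goes to under-full nodes by deficit.
    """
    target = sum(n["old"] + n["new"] for n in nodes) / len(nodes)
    pool = {"old": 0.0, "new": 0.0}
    deficits = []
    for node in nodes:
        used = node["old"] + node["new"]
        if used > target:
            frac = (used - target) / used
            for kind in pool:
                moved = node[kind] * frac
                node[kind] -= moved
                pool[kind] += moved
            deficits.append(0.0)
        else:
            deficits.append(target - used)
    total_deficit = sum(deficits)
    if total_deficit == 0:
        return
    for node, deficit in zip(nodes, deficits):
        for kind in pool:
            node[kind] += pool[kind] * deficit / total_deficit


def spread_after_expiry(rebalance_early):
    """Return max-min usage across nodes once all old data is deleted."""
    nodes = [{"old": 0.0, "new": 0.0} for _ in range(8)]
    add_data(nodes, "old", 80)           # history written before expansion
    nodes += [{"old": 0.0, "new": 0.0} for _ in range(2)]  # two new nodes
    if rebalance_early:
        rebalance(nodes)                 # rebalance right after adding nodes
    add_data(nodes, "new", 100)          # data written after expansion
    if not rebalance_early:
        rebalance(nodes)                 # rebalance only later
    for node in nodes:
        node["old"] = 0.0                # old files expire and are deleted
    used = [node["new"] for node in nodes]
    return max(used) - min(used)


if __name__ == "__main__":
    print("spread, early rebalance:", spread_after_expiry(True))
    print("spread, late rebalance: ", spread_after_expiry(False))
```

In this model both cases are balanced just before the old data expires. With
the early rebalance the nodes stay level afterwards; with the late one the
original nodes free more space than the new ones and the cluster is uneven
again.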

On Fri, Aug 7, 2009 at 10:48 AM, Ravi Phulari <rphulari@gmail.com> wrote:

> Use Rebalancer
>
>
> http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer
> -
> Ravi
>
> On 8/7/09 10:38 AM, "prashant ullegaddi" <prashullegaddi@gmail.com> wrote:
>
> > Hi,
> >
> > We had a cluster of 9 machines with one name node, and 8 data nodes (2 had
> > 220GB hard disk space, rest had 450GB).
> > Most of the space on first machines with 250GB disk space was consumed.
> > Now we added two new machines each with 450GB hard disk space as data nodes.
> >
> > Is there any way to redistribute files on HDFS so that there will be
> > considerable free space left on the first two machines, without
> > downloading the files to one local machine and then uploading them back to
> > HDFS?
> >
> > ~
> > Prashant,
> > SIEL,
> > IIIT-Hyderabad.
> >
>
>


-- 
Ted Dunning, CTO
DeepDyve
