hadoop-common-user mailing list archives

From: prashant ullegaddi <prashullega...@gmail.com>
Subject: Re: How to redistribute files on HDFS after adding new machines to cluster?
Date: Sat, 08 Aug 2009 05:14:17 GMT
Sorry for the mistake in the previous mail. I meant that I ran the balancer
with the default threshold.
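
For context: the balancer's default threshold is 10, i.e. 10% of disk
capacity. A different threshold can be passed explicitly; a minimal sketch
using the 0.20-era command line (the value 5 is illustrative, not from this
thread):

    # Rebalance until every datanode's utilization is within 5 percentage
    # points of the cluster-wide average utilization.
    bin/hadoop balancer -threshold 5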


On Sat, Aug 8, 2009 at 10:40 AM, prashant ullegaddi <prashullegaddi@gmail.com> wrote:

> Thank you Ravi and Ted.
>
> I ran hadoop balancer without the default threshold. It's been running for
> the last 8 hours! How long does it take, given the following DFS stats:
>
> 3140 files and directories, 10295 blocks = 13435 total. Heap Size is
> 17.88 MB / 963 MB (1%)
> Capacity: 3.93 TB | DFS Remaining: 2.11 TB | DFS Used: 1.31 TB
> DFS Used%: 33.44% | Live Nodes: 10 | Dead Nodes: 0
>
> If I interrupt it now, what will happen? I have to run a job now, and I
> don't think balancing and running a job should happen together, since each
> will slow the other down.
>
> Thanks,
> Prashant.
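
For what it's worth, the balancer is designed to be safe to interrupt at any
point: blocks that have already been moved stay where they are, and HDFS
remains consistent. A minimal sketch of stopping it, assuming it was started
through the stock 0.20-era scripts:

    # Stop a running balancer daemon; rebalancing already done is kept.
    bin/stop-balancer.sh

To keep a balancer that is left running from starving jobs, the bandwidth it
may use per datanode can be capped via the dfs.balance.bandwidthPerSec
property in hdfs-site.xml (default 1048576 bytes/sec, i.e. 1 MB/s).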
>
>
> On Fri, Aug 7, 2009 at 11:28 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> Make sure you rebalance soon after adding the new nodes. Otherwise, you
>> will have an age bias in the file distribution, which can lead to strange
>> effects in some applications. For example, if you have log files that you
>> delete when they get too old, disk space will be freed non-uniformly. This
>> shouldn't affect performance much, but it can lead to a need to rebalance
>> again (and again) later. Normal file churn combined with occasional
>> rebalancing should eventually fix this, but it is nicer not to need it.
>>
>> On Fri, Aug 7, 2009 at 10:48 AM, Ravi Phulari <rphulari@gmail.com> wrote:
>>
>> > Use the Rebalancer:
>> >
>> > http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer
>> >
>> > -
>> > Ravi
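
A minimal invocation, following the guide linked above (0.20-era layout, run
from the Hadoop installation directory):

    # Start the balancer as a background daemon with the default threshold.
    bin/start-balancer.sh

    # Or run it in the foreground to watch its progress on the console.
    bin/hadoop balancer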
>> >
>> > On 8/7/09 10:38 AM, "prashant ullegaddi" <prashullegaddi@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > We had a cluster of 9 machines: one name node and 8 data nodes (2 of
>> > > which had 220GB of disk space; the rest had 450GB). Most of the space
>> > > on the two machines with 220GB disks has been consumed. Now we have
>> > > added two new machines, each with 450GB of disk space, as data nodes.
>> > >
>> > > Is there any way to redistribute files on HDFS so that there will be
>> > > considerable free space left on the first two machines, without
>> > > downloading the files to a local machine and then uploading them back
>> > > to HDFS?
>> > >
>> > > ~
>> > > Prashant,
>> > > SIEL,
>> > > IIIT-Hyderabad.
>> > >
>> >
>> >
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>
>
