hadoop-mapreduce-user mailing list archives

From Jamal B <jm151...@gmail.com>
Subject RE: HDFS balance
Date Thu, 04 Sep 2014 10:42:39 GMT
Yes.  We do it all the time.

The node you move this cron job to only needs to have the Hadoop
environment set up, and proper connectivity to the cluster to which it is
writing.
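For reference, a minimal sketch of such an edge-node setup (the paths and
file names are hypothetical; it assumes a standard Hadoop 2.x client
install with the cluster's config files copied to the edge node):

```shell
# On the edge node (NOT a DataNode): point the Hadoop client at the
# cluster's configuration. /opt/hadoop and /etc/hadoop/conf are
# hypothetical paths -- use wherever your client install and copies of
# core-site.xml / hdfs-site.xml actually live.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH=/opt/hadoop/bin:$PATH

# Hourly crontab entry: the same plain fs -put, just run from here.
# 0 * * * * hadoop fs -put /data/export/hourly.dat /ingest/
```

Because the client is not itself a DataNode, the NameNode places the first
replica on a DataNode of its own choosing instead of pinning it to the
local node, so no single host becomes a hotspot.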
On Sep 3, 2014 10:51 AM, "John Lilley" <john.lilley@redpoint.net> wrote:

> Can you run the load from an "edge node" that is not a DataNode?
> john
>
> John Lilley
> Chief Architect, RedPoint Global Inc.
> 1515 Walnut Street | Suite 300 | Boulder, CO 80302
> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
>
> -----Original Message-----
> From: Georgi Ivanov [mailto:ivanov@vesseltracker.com]
> Sent: Wednesday, September 03, 2014 1:56 AM
> To: user@hadoop.apache.org
> Subject: HDFS balance
>
> Hi,
> We have an 11-node cluster.
> Every hour a cron job is started to upload one file (~1GB) to Hadoop on
> node1 (a plain hadoop fs -put).
>
> This way node1 is getting full, because the first replica is always
> stored on the node where the command is executed.
> Every day I run a re-balance, but this does not seem to be enough.
> The effect of this is:
> host1: 4.7TB/5.3TB
> host[2-10]: 4.1/5.3
>
> So I am always out of space on host1.
>
> What I can do is spread the job across all the nodes and execute it on a
> random host.
> I don't really like this solution, as it involves NFS mounts, security
> issues, etc.
>
> Is there any better solution ?
>
> Thanks in advance.
> George
>
>
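As an aside, the daily re-balance George mentions can be made more
aggressive by lowering the balancer's threshold, the allowed deviation of
each DataNode from the cluster's mean utilization, in percent. A sketch,
assuming the stock Hadoop 2.x `hdfs` CLI:

```shell
# Move blocks until every DataNode's utilization is within 5% of the
# cluster average (the default threshold is 10%).
hdfs balancer -threshold 5
```

Note that the balancer only relocates existing blocks; as long as the
upload runs on a DataNode, each new first replica still lands there, so
moving the job to a non-DataNode edge node addresses the cause rather
than the symptom.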
