hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: HDFS balance
Date Wed, 03 Sep 2014 12:05:58 GMT
Can you run the load from an "edge node" that is not a DataNode?
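A minimal sketch of that suggestion, assuming an edge host that has the Hadoop client binaries and the cluster's client configs but runs no DataNode daemon. When the writer is not a DataNode, HDFS places the first replica on a (more or less) random DataNode instead of always on the local one. The paths below are hypothetical, and the command is shown as a dry run:

```shell
# Hypothetical hourly upload, run from an edge node (no local DataNode).
# SRC and DEST are made-up illustration paths.
SRC=/data/export/latest.csv
DEST=/ingest/$(date +%Y%m%d%H).csv

# Dry run: prints the command. Drop the leading 'echo' to actually upload.
echo hadoop fs -put "$SRC" "$DEST"
```

The cron entry stays the same; only the host it runs on changes.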

John Lilley
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 300 | Boulder, CO 80302
T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781 705 2077
Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net

-----Original Message-----
From: Georgi Ivanov [mailto:ivanov@vesseltracker.com] 
Sent: Wednesday, September 03, 2014 1:56 AM
To: user@hadoop.apache.org
Subject: HDFS balance

We have an 11-node cluster.
Every hour a cron job on node1 uploads one file (~1 GB) to Hadoop (a plain
hadoop fs -put).

This fills up node1, because HDFS always stores the first replica on the node
where the write command is executed.
I run a re-balance every day, but that does not seem to be enough.
The effect is:
host1: 4.7 TB / 5.3 TB used
host[2-10]: 4.1 TB / 5.3 TB used

So I am always running out of space on host1.
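One hedged mitigation for the imbalance described above: the HDFS balancer accepts a -threshold percentage (the default is 10), and a tighter value makes it keep moving blocks until each DataNode's utilization is closer to the cluster-wide average. A sketch, shown as a dry run since the value that is safe for a given cluster is an assumption:

```shell
# Sketch: run the daily balancer with a tighter threshold than the default 10%.
# With -threshold 5, the balancer moves blocks until each DataNode's usage is
# within 5 percentage points of the cluster average.
THRESHOLD=5

# Dry run: prints the command. Drop the leading 'echo' to actually run it.
echo hdfs balancer -threshold "$THRESHOLD"
```

This narrows the spread but does not remove the cause; as long as every put runs on node1, the first replicas will keep landing there between balancer runs.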

One option is to spread the job across all the nodes and execute the upload on
a random host. I don't really like this solution, as it involves NFS mounts,
security issues, etc.

Is there a better solution?

Thanks in advance.
