hadoop-user mailing list archives

From Georgi Ivanov <iva...@vesseltracker.com>
Subject HDFS balance
Date Wed, 03 Sep 2014 07:55:31 GMT
We have an 11-node cluster.
Every hour a cron job uploads one file (~1 GB) to Hadoop from
node1 (plain hadoop fs -put).

As a result, node1 keeps filling up, because the first replica is always
stored on the node where the command is executed.
I run the balancer every day, but this does not seem to be enough.
The current disk usage is:
host1: 4.7 TB / 5.3 TB
host[2-10]: 4.1 TB / 5.3 TB

So I am always running out of space on host1.
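
To put rough numbers on the skew described above, here is a minimal sketch of the daily write volume per node. It assumes a replication factor of 3 (the HDFS default, not confirmed for this cluster) and that the remaining replicas spread evenly over the other nodes; the function names are only illustrative.

```shell
#!/bin/sh
# Rough estimate of daily data landing on the upload node vs. each other node.
# Assumptions (not measured on the cluster): replication factor 3, and the
# non-local replicas spread evenly across the remaining nodes.
FILES_PER_DAY=24     # one upload per hour
FILE_GB=1            # ~1 GB per file
NODES=11
REPLICATION=3        # assumed HDFS default

daily_gb_on_upload_node() {
    # The first replica of every block stays on the node running "hadoop fs -put".
    echo $(( FILES_PER_DAY * FILE_GB ))
}

daily_gb_on_other_node() {
    # The remaining (REPLICATION - 1) replicas are shared by the other nodes.
    # Scaled by 10 to keep one decimal place in integer shell arithmetic.
    tenths=$(( FILES_PER_DAY * FILE_GB * (REPLICATION - 1) * 10 / (NODES - 1) ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

echo "upload node: $(daily_gb_on_upload_node) GB/day"
echo "each other node: ~$(daily_gb_on_other_node) GB/day"
```

Under these assumptions the upload node receives about five times as much new data per day as any other node, which would explain why it keeps pulling ahead between balancer runs.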

What I could do is spread the job across all the nodes and run it
on a random host each time.
I don't really like this solution, as it involves NFS mounts,
security issues, etc.

Is there a better solution?

Thanks in advance.
