hadoop-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: HDFS balance
Date Wed, 03 Sep 2014 08:01:33 GMT
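It is better to set up a dedicated client/gateway node (one where no
DataNode is running) and schedule your cron job from that machine.
With no local DataNode on the writing host, HDFS has to pick a DataNode
in the cluster for the first replica instead of always using node1.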
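
A minimal sketch of that setup, assuming a gateway host that has the
Hadoop client and the cluster configuration installed but runs no
DataNode process. The script path, SRC_DIR, DEST_DIR and the
dump-<hour>.dat file naming are hypothetical placeholders; only
hadoop fs -put comes from this thread.

```shell
#!/bin/sh
# Hourly HDFS upload, meant to run from a gateway node (no DataNode).
# All paths and file names here are illustrative assumptions.

SRC_DIR="${SRC_DIR:-/data/export}"          # assumed local staging directory
DEST_DIR="${DEST_DIR:-/user/etl/incoming}"  # assumed HDFS target directory
HADOOP="${HADOOP:-hadoop}"                  # client command; set to "echo" for a dry run

# Print the local path of the file for an hour stamp (YYYYMMDDHH).
local_file() {
    printf '%s/dump-%s.dat\n' "$SRC_DIR" "$1"
}

# Upload the file for one hour stamp to HDFS.
upload_hour() {
    "$HADOOP" fs -put "$(local_file "$1")" "$DEST_DIR/"
}

# Only upload when called with "run", so the functions can be tested alone.
# Example crontab entry on the gateway node (minute 5 of every hour):
#   5 * * * * /usr/local/bin/hdfs_hourly_put.sh run
case "${1:-}" in
    run) upload_hour "$(date +%Y%m%d%H)" ;;
esac
```

Setting HADOOP=echo prints the command instead of running it, which is
a cheap way to check the cron wiring before touching the cluster.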

Thanks & Regards,
B Anil Kumar.


On Wed, Sep 3, 2014 at 1:25 PM, Georgi Ivanov <ivanov@vesseltracker.com>
wrote:

> Hi,
> We have an 11-node cluster.
> Every hour a cron job starts on node1 and uploads one file (~1 GB) to
> Hadoop (plain hadoop fs -put).
>
> As a result, node1 keeps filling up, because the first replica is always
> stored on the node where the command is executed.
> I run a rebalance every day, but this does not seem to be enough.
> The resulting disk usage is:
> host1: 4.7 TB / 5.3 TB
> host[2-10]: 4.1 TB / 5.3 TB
>
> So I am always out of space on host1.
>
> What I could do is spread the job across all the nodes and run it on a
> random host each time.
> I don't really like this solution, as it involves NFS mounts,
> security issues, etc.
>
> Is there a better solution?
>
> Thanks in advance.
> George
>
>
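
One possible reason the daily rebalance looks ineffective, worked out
from the usage figures quoted above: the balancer treats a DataNode as
balanced when its utilization is within the threshold of the cluster
average, and the default threshold is 10 percent. The sketch below only
redoes the arithmetic; the utilization helper and the threshold value 5
are illustrative assumptions, not settings from this cluster.

```shell
# Utilization in percent: used / capacity * 100, rounded to one decimal.
utilization() {
    awk -v used="$1" -v cap="$2" 'BEGIN { printf "%.1f\n", used / cap * 100 }'
}

utilization 4.7 5.3   # host1: prints 88.7
utilization 4.1 5.3   # each of the other hosts: prints 77.4

# With a tighter threshold the balancer would move more blocks off host1.
# The value 5 is only an example:
#   hdfs balancer -threshold 5
```

Since the cluster average sits close to the figure for the other hosts,
host1 is only about 10 points above it, which is right at the default
threshold, so the balancer may find little to move.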
