Subject: Re: HDFS balance
From: AnilKumar B <akumarb2010@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 3 Sep 2014 13:31:33 +0530

It's better to create one client/gateway node (where no DataNode is running) and schedule your cron job from that machine.
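As a sketch, the hourly crontab entry on that gateway node could look like the following (the local path, HDFS target directory, and hadoop binary location are placeholders, not taken from your setup):

    # Runs at the top of every hour; note that % must be escaped as \% in crontab entries.
    0 * * * * /usr/bin/hadoop fs -put /data/export/hourly.dat /ingest/hourly-$(date +\%Y\%m\%d\%H).dat

Since no DataNode runs on the gateway, HDFS no longer pins the first replica to the uploading machine, and the blocks get spread across the cluster.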

Thanks & Regards,
B Anil Kumar.


On Wed, Sep 3, 2014 at 1:25 PM, Georgi Ivanov <ivanov@vesseltracker.com> wrote:
> Hi,
> We have an 11-node cluster.
> Every hour a cron job uploads one file (~1 GB) to Hadoop from node1
> (a plain hadoop fs -put).
>
> Because the first replica is always stored on the node where the
> command is executed, node1 keeps filling up.
> I run a re-balance every day, but that does not seem to be enough.
> The effect of this is:
> host1: 4.7 TB / 5.3 TB used
> host[2-10]: 4.1 TB / 5.3 TB used
>
> So I am always running out of space on host1.
>
> What I could do is spread the job across all the nodes and execute it
> on a random host. I don't really like that solution, as it involves
> NFS mounts, security issues, etc.
>
> Is there a better solution?
>
> Thanks in advance.
> George
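On the daily re-balance mentioned above: the balancer only moves blocks until every DataNode is within a threshold of the cluster-average utilization, so with one heavily skewed writer it can lag behind. As an interim measure you can run it with a tighter threshold (the 5% value below is just an illustration, not something tuned for your cluster):

    # Keep every DataNode within 5 percentage points of the average utilization
    hdfs balancer -threshold 5

That said, the gateway-node approach fixes the cause rather than the symptom.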

