hadoop-hdfs-user mailing list archives

From Vitaliy Semochkin <vitaliy...@gmail.com>
Subject Re: hdfs system crashes when loading files bigger than local space left
Date Fri, 16 Jul 2010 10:15:59 GMT
On Thu, Jul 15, 2010 at 9:26 PM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:

>
> On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:
>
> > >a) Have you set a reserved size for hdfs?
> > Yes. I set 128Mb as reserved size.
>
> That is likely way too small.

Would setting 512 MB be better, given that the whole volume is only 190 GB?
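
For a sense of scale, here is a minimal sketch of the arithmetic behind the two candidate values. It assumes the reserved size is configured in bytes per volume (in stock Hadoop this is the `dfs.datanode.du.reserved` property, which is not named in this thread); the helper names are made up for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def reserved_bytes(megabytes):
    """Convert a reserved size in MB to bytes (the assumed config unit)."""
    return megabytes * MB


def reserved_fraction(reserved_mb, volume_gb):
    """Fraction of a volume of volume_gb gigabytes that reserved_mb sets aside."""
    return reserved_bytes(reserved_mb) / (volume_gb * GB)


if __name__ == "__main__":
    for mb in (128, 512):
        print(mb, "MB =", reserved_bytes(mb), "bytes,",
              f"{reserved_fraction(mb, 190):.4%} of a 190 GB volume")
```

Either value is well under one percent of a 190 GB volume, which is consistent with the remark that 128 MB is likely far too small.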


> > b) Are you loading data from the datanode?
> > Yes. But the datanode is running on the same node as the namenode (I have a
> > very small cluster, only 5 servers, and wasting one node only for the
> > namenode/jobtracker seemed unreasonable to me).
>
> Where the NN is running is irrelevant to this particular problem.
>
> The problem is that if you start your data load on a machine also running a
> datanode process, the data will get put onto that node first.  This will
> cause your DFS to be majorly unbalanced.
>
> It is much better to load the data from another host outside the grid.
>
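
The imbalance described above can be illustrated with a small simulation. This is only a sketch under a simplifying assumption: it models just the first replica of each block, placing it on the client's own host when that host is a datanode and on a randomly chosen datanode otherwise. The function names and host names are hypothetical, and this is not Hadoop's actual placement code.

```python
import random


def choose_first_replica(client_host, datanodes, rng):
    """Pick the node that receives the first replica of a block.

    Assumption: a client running on a datanode writes locally first;
    any other client gets a randomly chosen datanode.
    """
    if client_host in datanodes:
        return client_host
    return rng.choice(datanodes)


def simulate_upload(client_host, datanodes, num_blocks, seed=0):
    """Count how many first replicas land on each datanode."""
    rng = random.Random(seed)
    counts = {dn: 0 for dn in datanodes}
    for _ in range(num_blocks):
        counts[choose_first_replica(client_host, datanodes, rng)] += 1
    return counts


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    print("from dn1:    ", simulate_upload("dn1", nodes, 1000))
    print("from outside:", simulate_upload("laptop", nodes, 1000))
```

In the first case every first replica lands on dn1, so that node's disk fills long before the others; in the second case the first replicas are spread across all five nodes.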

Does Hadoop distinguish between a client that uploads data from a datanode
and one that uploads from a machine that is not a datanode?
Let's say I execute

hadoop fs -put someFile hdfs://namenode.mycompany.com/

from namenode.mycompany.com and from some other PC. Will Hadoop treat the two
cases differently, and will it distribute the data more evenly in the latter case?

Thank you very much for the replies,
Vitaliy S
