hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: hdfs space problem.
Date Mon, 09 Aug 2010 09:37:18 GMT
On 05/08/10 19:28, Raj V wrote:
> Thank you. I realized that I was running the datanode on the namenode and
> stopped it, but did not know that the first copy went to the local node.
>
> Raj

It's a placement decision that makes sense for code running as MR jobs, 
ensuring that the output of work goes to the local machine and not 
somewhere random, but on big imports like yours you get penalised.
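
To make the penalty concrete, here is a rough sketch of the effect, not HDFS's actual placement code. It assumes a simplified policy: the first replica goes to the writer's own node when the writer is a datanode, and the remaining replicas go to distinct random nodes. Rack awareness is ignored, and the node names and function names are made up for the illustration.

```python
import random
from collections import Counter


def place_replicas(writer, datanodes, replication, rng):
    """Pick nodes for one block under a simplified local-first policy.

    Assumption: real HDFS placement is also rack-aware; this sketch is not.
    """
    # First replica: the writer's own node if it runs a datanode,
    # otherwise a randomly chosen datanode.
    first = writer if writer in datanodes else rng.choice(datanodes)
    # Remaining replicas: distinct random nodes other than the first.
    others = [n for n in datanodes if n != first]
    return [first] + rng.sample(others, replication - 1)


def simulate_import(writer, datanodes, blocks, replication=3, seed=0):
    """Count how many block replicas land on each datanode."""
    rng = random.Random(seed)
    usage = Counter({n: 0 for n in datanodes})
    for _ in range(blocks):
        for node in place_replicas(writer, datanodes, replication, rng):
            usage[node] += 1
    return usage


if __name__ == "__main__":
    nodes = ["dn%d" % i for i in range(1, 11)]  # hypothetical 10-node cluster
    # Import run on a machine that is also a datanode: it gets every block.
    print(simulate_import("dn1", nodes, blocks=1000))
    # Import run from a machine outside HDFS: blocks spread across the cluster.
    print(simulate_import("client", nodes, blocks=1000))
```

In the first run the importing node holds a replica of every block, so its disks fill long before the rest of the cluster; in the second the load is spread roughly evenly.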

Some datacentres have one or two IO nodes in the cluster that aren't 
running Hadoop HDFS or task trackers, but let you get at the data at 
full datacentre rates, just to help with this kind of problem. 
Otherwise, if you can implement your import as a MapReduce job, Hadoop 
can do the work for you.

-steve
