hadoop-common-user mailing list archives

From Gopal Gandhi <gopal.gandhi2...@yahoo.com>
Subject Re: question on HDFS
Date Wed, 23 Jul 2008 01:30:49 GMT
That's interesting. Why does the reducer fetch the map output over HTTP rather than SSH?



----- Original Message ----
From: Arun C Murthy <acm@yahoo-inc.com>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 22, 2008 2:19:36 PM
Subject: Re: question on HDFS

Mori,

On Jul 22, 2008, at 12:22 PM, Mori Bellamy wrote:

> hey all,
> let us say that i have 3 boxes, A B and C. initially, map tasks are  
> running on all 3. after most of the mapping is done, C is 32% done  
> with reduce (so still copying stuff to its local disk) and A is  
> stuck on a particularly long map-task (it got an ill-behaved record  
> from the input splits). does A's intermediate map output data go  
> directly to C's local disk, or is it still written to HDFS and  
> therefore distributed amongst all the machines? also, will A's disk  
> be a favored target for A's output bytes, or is the target volume  
> independent of the corresponding mapper?
>

Intermediate outputs (i.e. map outputs) are written to the local disk  
and not to HDFS. The reduce fetches the intermediate outputs via HTTP.

hth,
Arun

> Thanks! The answer to this question should clear a lot of things up  
> for me.
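
To make Arun's answer concrete, here is a minimal toy sketch of the same data flow: the "map" side writes its partitioned output to a local directory (not to a distributed filesystem), and the "reduce" side pulls its partition over HTTP. This is not Hadoop code. The function names, the file naming scheme (map_A_part0.out) and the toy partitioner are all assumptions made up for illustration; in Hadoop the serving side is handled by the framework itself.

```python
import http.server
import os
import tempfile
import threading
import urllib.request
from functools import partial


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def write_map_output(local_dir, map_id, records, num_reduces):
    """Partition (key, value) records and write one local file per reduce."""
    partitions = {r: [] for r in range(num_reduces)}
    for key, value in records:
        # Toy deterministic partitioner (hypothetical, for illustration only).
        partitions[sum(key.encode()) % num_reduces].append((key, value))
    for r, recs in partitions.items():
        # Hypothetical file layout: <map_id>_part<reduce_id>.out
        path = os.path.join(local_dir, f"{map_id}_part{r}.out")
        with open(path, "w", newline="\n") as f:
            for key, value in sorted(recs):
                f.write(f"{key}\t{value}\n")


def serve_local_dir(local_dir):
    """Serve the map node's local directory over HTTP on a free local port."""
    handler = partial(QuietHandler, directory=local_dir)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_partition(host, port, map_id, reduce_id):
    """Reduce side: fetch one map's output for this reduce via HTTP GET."""
    url = f"http://{host}:{port}/{map_id}_part{reduce_id}.out"
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read().decode()


def demo():
    """Run one toy map and let two toy reduces fetch their partitions."""
    records = [("apple", 1), ("banana", 2), ("apple", 3)]
    with tempfile.TemporaryDirectory() as local_dir:
        write_map_output(local_dir, "map_A", records, num_reduces=2)
        server = serve_local_dir(local_dir)
        try:
            host, port = server.server_address
            return [fetch_partition(host, port, "map_A", r) for r in range(2)]
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    for reduce_id, data in enumerate(demo()):
        print(f"reduce {reduce_id} fetched: {data!r}")
```

In terms of the original question: under this model, A's map output stays on A's local disk until the map finishes, and C's reduce then copies its share directly from A over HTTP. Nothing is spread across the other machines along the way.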


      