hadoop-user mailing list archives

From Sai Sai <saigr...@yahoo.in>
Subject Re: 100K Maps scenario
Date Sat, 13 Apr 2013 01:45:28 GMT

Just a follow-up to see if anyone can shed some light on this:
My understanding is that after each block is replicated 3 times, a map task is run on
each of the replicas in parallel.
What I am trying to verify is this: in a scenario where a file is split into 10K,
100K, or more blocks, this would result in at least 300K map tasks being run (for 100K
blocks), which looks like overkill from a performance or even just a logical perspective.
I would appreciate any thoughts on this.
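
To make the arithmetic behind the question concrete, here is a minimal sketch comparing the two counting models. The function names are hypothetical and only for illustration; the block counts and the replication factor of 3 come from the question. It assumes one input split per block. The second function reflects the usual MapReduce behavior, where a map task is created per input split and the replicas only give the scheduler a choice of nodes to run it on.

```python
# Hypothetical helpers for illustration; not part of any Hadoop API.
REPLICATION_FACTOR = 3  # replication factor assumed in the question


def map_tasks_if_one_per_replica(num_blocks, replication=REPLICATION_FACTOR):
    """Task count under the assumption being questioned: one map task per replica."""
    return num_blocks * replication


def map_tasks_if_one_per_split(num_blocks):
    """Task count when one map task is created per input split.

    Assumes one split per block; replication does not change the count.
    """
    return num_blocks


if __name__ == "__main__":
    for blocks in (10_000, 100_000):
        print(
            blocks,
            map_tasks_if_one_per_replica(blocks),
            map_tasks_if_one_per_split(blocks),
        )
```

Under the per-split model the task count tracks the number of blocks, so 100K blocks means 100K map tasks rather than 300K.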

From: Sai Sai <saigraph@yahoo.in>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>; Sai Sai <saigraph@yahoo.in>
Sent: Friday, 12 April 2013 1:37 PM
Subject: Re: Does a Map task run 3 times on 3 TTs or just once

Just wondering if it is right to assume that a map task is run 3 times on 3 different TTs
(TaskTrackers) in parallel, and that the output of whichever one completes first is picked
up and written to the intermediate location.
Or is it true that a map task, even though its data is replicated 3 times, will run only once,
with the other 2 on standby, so that if it fails the second one will run, followed
by the 3rd one if the 2nd mapper fails?
Please shed some light.