hadoop-hdfs-user mailing list archives

From Kai Voigt...@123.org>
Subject Re: 100K Maps scenario
Date Sat, 13 Apr 2013 01:48:38 GMT
No, only one copy of each block will be processed.

If a task fails, it will be retried, typically against another replica of the block. Also, if
speculative execution is enabled, a slow task may be run a second time in parallel, and whichever
attempt finishes first wins. This happens only rarely.
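
To make the counting concrete, here is a minimal sketch in Python. It is not Hadoop code: the function name, the node count, and the random replica placement are assumptions made purely for illustration, and it assumes one input split per block. It shows that replicas only widen the choice of where a task can run; they do not multiply the number of tasks.

```python
import random


def schedule_map_tasks(num_blocks, replication=3, num_nodes=20, seed=0):
    """Toy model (hypothetical, not the Hadoop scheduler).

    Each block has `replication` replicas on distinct nodes, and exactly
    one of those replicas is chosen as the input for that block's map task.
    Returns a dict mapping block id -> node chosen to run its map task.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    assignments = {}
    for block in range(num_blocks):
        # Replicas are candidate locations for the task, nothing more.
        replica_nodes = rng.sample(nodes, replication)
        # One replica is picked; the others are only used on retry.
        assignments[block] = rng.choice(replica_nodes)
    return assignments


assignments = schedule_map_tasks(num_blocks=100_000, replication=3)
print(len(assignments))  # number of map tasks equals number of blocks
```

Under these assumptions, 100K blocks yield 100K map tasks, not 300K. Failed or speculative attempts would add a small number of extra attempts on top of that, not a second or third full pass.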


On 12.04.2013 at 18:45, Sai Sai <saigraph@yahoo.in> wrote:

> Just a follow-up to see if anyone can shed some light on this:
> My understanding is that after each block is replicated 3 times, a map task is run on each of the replicas in parallel.
> What I am trying to verify is that in a scenario where a file is split into 10K or 100K or more blocks, this will result in at least 300K map tasks being performed, which looks like overkill from a performance or just a logical perspective.
> I would appreciate any thoughts on this.
> Thanks
> Sai
> From: Sai Sai <saigraph@yahoo.in>
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>; Sai Sai <saigraph@yahoo.in>
> Sent: Friday, 12 April 2013 1:37 PM
> Subject: Re: Does a Map task run 3 times on 3 TTs or just once
> Just wondering if it is right to assume that a map task is run 3 times on 3 different TaskTrackers (TTs) in parallel, and the output of whichever completes first is picked up and written to the intermediate location.
> Or is it true that a map task, even though its data is replicated 3 times, will run only once, with the other 2 replicas on standby: if it fails, the second one will run, followed by the third if the second mapper fails?
> Please shed some light.
> Thanks
> Sai

Kai Voigt
