hadoop-common-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Non local mapper .. Is it worth it?
Date Thu, 06 Dec 2012 11:09:22 GMT
Hmm, but how can the scheduler affect the performance of a mapper if there are no competing
jobs?

I thought the scheduler only affected how resources are shared between separate jobs.
In my example, there are 2 mappers, 2+n files, and 1 job.

Jay Vyas 
http://jayunit100.blogspot.com
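
For the single-job case asked about above, delay scheduling can still matter, because it governs which node each map task of that one job lands on. The following is a minimal sketch of the idea only, not Hadoop's actual scheduler code: the function name `assign_task`, the `max_skips` knob, and the task and node names are all hypothetical and chosen for illustration.

```python
def assign_task(node, pending, skipped, max_skips):
    """Decide what to run on a node that has a free map slot.

    node:      name of the node offering the slot
    pending:   dict mapping task name -> set of nodes holding its input
    skipped:   how many slot offers this job has already passed up
    max_skips: delay threshold (hypothetical knob for this sketch)

    Returns (task_or_None, new_skipped). None means "leave the slot idle".
    """
    # Prefer a task whose input data lives on the offering node.
    for task, hosts in pending.items():
        if node in hosts:
            return task, 0  # local launch resets the skip counter

    # No local task. Once the job has waited long enough, accept a
    # non-local launch instead of leaving the slot empty.
    if pending and skipped >= max_skips:
        return next(iter(pending)), skipped

    # Otherwise pass on this offer and remember that we did.
    return None, skipped + 1


# Both files live on m2; node m1 keeps offering a free slot.
pending = {"map_f1": {"m2"}, "map_f2": {"m2"}}
first_offer = assign_task("m1", pending, skipped=0, max_skips=1)
second_offer = assign_task("m1", pending, skipped=1, max_skips=1)
local_offer = assign_task("m2", pending, skipped=1, max_skips=1)
```

With `max_skips=0` the job never waits and m1 immediately reads remotely; a very large value approximates "local mappers only".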

On Dec 6, 2012, at 4:39 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:

> The short answer is yes, it can be worth it: your job can finish
> faster if you allow non-local mappers as well as local ones. But this is of
> course a trade-off. The best throughput (but not the best latency) is obtained
> when using only local mappers. You should read about delay scheduling, which
> lets the user choose what is 'best'. The Fair Scheduler has it in
> Hadoop 1; the Capacity Scheduler has it too, but only in Hadoop 2.
> 
> Regards
> 
> Bertrand
> 
> On Thu, Dec 6, 2012 at 6:14 AM, <jayunit100@gmail.com> wrote:
> 
>> If there is a job with files f1 and f2, and a mapper on machine m1 is
>> running against a file (f2) that is stored far from m1, will the
>> overhead of copying f2 over to m1 be worth it?
>> 
>> That is, is the cost of reading data off a remote machine (m2)
>> worth it? Or would it be better if that remote machine (m2)
>> simply processed both files (f1, f2) in turn?
>> 
>> Jay Vyas
>> http://jayunit100.blogspot.com
>
> -- 
> Bertrand Dechoux
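
The trade-off in the original question can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes both files take the same time to process, that a remote read adds a fixed transfer cost that is not overlapped with processing, and that nothing else competes for the machines. All function names and the example timings are hypothetical.

```python
def local_only_time(num_files, proc_time):
    """m2 holds all the data and processes every file in turn."""
    return num_files * proc_time


def with_remote_mapper_time(proc_time, transfer_time):
    """m2 processes f1 locally while m1 reads f2 over the network.

    The two mappers run in parallel, so the job finishes when the
    slower (remote) one does.
    """
    local = proc_time
    remote = proc_time + transfer_time
    return max(local, remote)


def remote_is_worth_it(proc_time, transfer_time):
    """True if the non-local mapper lets the two-file job finish sooner."""
    return with_remote_mapper_time(proc_time, transfer_time) < local_only_time(2, proc_time)


# Example timings in seconds (made up for illustration).
cheap_transfer = remote_is_worth_it(proc_time=60, transfer_time=20)
costly_transfer = remote_is_worth_it(proc_time=60, transfer_time=90)
```

Under these assumptions the non-local mapper shortens the job exactly when the transfer cost is smaller than the per-file processing time. The total work done across the cluster is still higher, which is the throughput-versus-latency trade-off described in the reply above.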
