hadoop-common-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Non local mapper .. Is it worth it?
Date Thu, 06 Dec 2012 09:39:29 GMT
The short answer is yes, it can be worth it: your job can finish faster
if you allow non-local mappers as well as local ones. But this is of
course a trade-off. The best overall performance (throughput, but not
latency) is obtained when using only local mappers. You should read about
delay scheduling, which lets the user choose where the 'best' balance
lies. The Fair Scheduler has it in Hadoop 1; the Capacity Scheduler also
has it, but only in Hadoop 2.
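
To make the trade-off concrete, here is a minimal sketch of the idea behind
delay scheduling. It is a simplified illustration, not the code of either
Hadoop scheduler: the names Job, offer_slot and max_skips are hypothetical,
and I am assuming the delay is counted in skipped slot offers rather than
in elapsed time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Maps a pending task id to the set of nodes holding its input block.
    pending: dict
    # Number of slot offers declined since the last local launch.
    skipped: int = 0


def offer_slot(job, node, max_skips):
    """Decide what to run when `node` has a free map slot.

    Returns (task_id, is_local), or None if the job prefers to wait.
    All names here are illustrative, not Hadoop APIs.
    """
    # Prefer a task whose input is stored on the offering node.
    for task_id, replicas in job.pending.items():
        if node in replicas:
            del job.pending[task_id]
            job.skipped = 0
            return task_id, True
    # No local task: accept a remote read only after waiting long enough.
    if job.pending and job.skipped >= max_skips:
        task_id = next(iter(job.pending))
        del job.pending[task_id]
        return task_id, False
    # Otherwise decline this offer and hope for a better-placed slot.
    job.skipped += 1
    return None
```

With max_skips set to 0 the job takes any free slot right away (better
latency, more remote reads); with a very large value it effectively runs
local mappers only.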

Regards

Bertrand

On Thu, Dec 6, 2012 at 6:14 AM, <jayunit100@gmail.com> wrote:

> If there is a job with files f1 and f2, and a Mapper (m1) is running
> against a file (f2) which is far from the local machine (m1), will the
> overhead of copying f2 over to m1 be worth it?
>
> That is, is the amount of resources required to read data off a
> remote machine (m2) worth it? Or would it be better if that remote (m2)
> now simply processed both files (f1, f2) in turn?
>
> Jay Vyas
> http://jayunit100.blogspot.com




-- 
Bertrand Dechoux
