From: Arun C Murthy <acm@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org
Subject: Re: How do hadoop work in details
Date: Wed, 12 Jan 2011 13:50:02 -0800
Message-Id: <5AE5DE5F-397C-4742-923E-C54446D65FC7@yahoo-inc.com>

The approach I run at Yahoo, for pretty much the same use case, is to use the CapacityScheduler and define two queues:

  production
  adhoc

Let's say you want 70% of the capacity for production and the rest for adhoc:

  production could have 70% capacity, max-limit of 100
  adhoc can have 30% capacity, max-limit of 70/80

This way adhoc jobs can take up to 70-80% of the cluster, while always saving some capacity for 'production' jobs. You get the idea?

I'm sure there are similar tricks for the FairScheduler; I'm just not familiar enough with it. I'll warn you that I only run Yahoo clusters, where we use the CapacityScheduler everywhere.

One other note: I'm bang in the middle of releasing extensive enhancements to the CapacityScheduler via hadoop-0.20.100 or whatever we decide to call it:

http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html

Arun

On Jan 12, 2011, at 9:40 AM, felix gao wrote:

> Arun,
>
> The information is very helpful. Which scheduler do you suggest when we have a mix of production and ad-hoc jobs running at the same time using Pig, and we would like to guarantee the SLA for production tasks?
>
> Thanks,
>
> Felix
>
> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <acm@yahoo-inc.com> wrote:
>
> > On Dec 29, 2010, at 2:43 PM, felix gao wrote:
> >
> > > Hi all,
> > >
> > > I am trying to figure out what exactly happens inside a job.
> > > 1) When the jobtracker launches a task, how does it impact the currently running jobs if the currently running jobs have higher, the same, or lower priorities, using the default queue?
> > >
> > > 2) What if a low-priority job that is holding all the reducer slots is running, its mappers are halfway done, and a high-priority job comes in and takes all the mapper slots, but cannot complete because all the reducer slots are taken by the low-priority job?
> >
> > Both 1) and 2) really depend on the scheduler you are using: Default, FairScheduler, or CapacityScheduler.
> >
> > With the Default scheduler, 2) is a real problem. The CapacityScheduler doesn't allow priorities within the same queue for precisely this reason, since it doesn't have preemption. I'm not sure whether the FairScheduler handles it.
> >
> > > 3) When are mappers allocated on the slaves, and when are reducers allocated?
> >
> > Usually, reduces are allocated only after a certain percentage of the maps are complete (5% by default). Use mapred.reduce.slowstart.completed.maps to control this. Look at JobInProgress.java.
> >
> > > 4) Do mappers pass all the data to the reducers using RPC, or do they write their output to HDFS and the reducers pick it up?
> >
> > Maps sort/combine their output and write it to local disk. The reduces then copy it over HTTP (we call this the 'shuffle' phase). The TT on which a map ran serves that map's output via an embedded webserver. Look at ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
> >
> > > 5) Within a job, when and where does all the I/O occur?
> >
> > Typically the input to the map, i.e. InputFormat, and the output of the reduce, i.e. OutputFormat. Look at MapTask.java and ReduceTask.java.
> >
> > hope this helps,
> > Arun
>
> > > I know this seems like a lot of low-level questions; if you can point me to the right place to look, that should be enough.
> > > Thanks,
> > >
> > > Felix
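For reference, the production/adhoc split described above could be sketched as a conf/capacity-scheduler.xml fragment. This is only an illustration: the property names below are the 0.20-era CapacityScheduler ones as commonly documented, the queues themselves must also be declared via mapred.queue.names in mapred-site.xml, and the exact names should be checked against your release.

```xml
<!-- Sketch of the two-queue setup from the thread (0.20-era property
     names; verify against your release). Assumes mapred-site.xml also
     declares: mapred.queue.names = production,adhoc -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.production.capacity</name>
    <value>70</value>   <!-- guaranteed share for production -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.production.maximum-capacity</name>
    <value>100</value>  <!-- production may grow to the whole cluster -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>30</value>   <!-- guaranteed share for adhoc -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.maximum-capacity</name>
    <value>80</value>   <!-- adhoc capped at ~70-80% of the cluster -->
  </property>
</configuration>
```

The key idea is the asymmetry: production's max-limit of 100 lets it absorb idle adhoc capacity, while adhoc's lower max-limit guarantees production slots are always obtainable.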
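The slow-start knob mentioned in the answer to question 3) is a float between 0 and 1. A minimal sketch, using the classic mapred property name from the thread, which would go in mapred-site.xml or a per-job configuration:

```xml
<!-- Don't schedule any reduces until 80% of the maps have finished
     (the default is 0.05, i.e. 5%). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

Raising it keeps reduce slots free for longer, which can also soften the reduce-slot starvation scenario in question 2).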
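To make the shuffle description in the answer to question 4) concrete, here is a toy model. This is not Hadoop code: the class and method names are invented, the real shuffle fetches sorted runs over HTTP and does a streaming k-way merge rather than a re-sort, but the partition-then-sort-then-merge shape is the same.

```java
// Toy model of the map-side partition/sort and reduce-side merge that
// the thread describes. Illustrative only; names are not Hadoop's.
import java.util.*;

public class ShuffleSketch {

    // Default HashPartitioner logic: hash of the key modulo #reduces.
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // "Map side": split the map's output keys into per-reduce partitions
    // and sort each, as a map task does before spilling to local disk.
    static List<List<String>> mapSide(List<String> keys, int numReduces) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduces; i++) partitions.add(new ArrayList<>());
        for (String k : keys) partitions.get(partitionFor(k, numReduces)).add(k);
        for (List<String> p : partitions) Collections.sort(p);
        return partitions;
    }

    // "Reduce side": merge the sorted runs fetched (over HTTP, in real
    // Hadoop) from every map for this reduce's partition.
    static List<String> reduceSide(List<List<String>> runsFromMaps) {
        List<String> merged = new ArrayList<>();
        for (List<String> run : runsFromMaps) merged.addAll(run);
        Collections.sort(merged); // stands in for the real k-way merge
        return merged;
    }

    public static void main(String[] args) {
        int numReduces = 2;
        List<List<String>> map1 = mapSide(Arrays.asList("b", "a", "d"), numReduces);
        List<List<String>> map2 = mapSide(Arrays.asList("c", "a"), numReduces);
        for (int r = 0; r < numReduces; r++) {
            List<String> out = reduceSide(Arrays.asList(map1.get(r), map2.get(r)));
            System.out.println("reduce " + r + " sees " + out);
        }
    }
}
```

Each reduce thus sees a fully sorted view of its partition's keys across all maps, without any map output ever touching HDFS.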