From: Arun C Murthy <acm@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org
Subject: Re: How do hadoop work in details
Date: Wed, 12 Jan 2011 13:50:02 -0800
Message-Id: <5AE5DE5F-397C-4742-923E-C54446D65FC7@yahoo-inc.com>

The approach I run at Yahoo, for pretty much the same use case, is to use the CapacityScheduler and define two queues:

  production
  adhoc

Let's say you want 70% of the capacity for production and the rest for adhoc:

  production could have 70% capacity, max-limit of 100
  adhoc can have 30% capacity, max-limit of 70/80

This way adhoc jobs can take up to 70-80% of the cluster, while always saving some capacity for 'production' jobs. You get the idea?

I'm sure there are similar tricks for the FairScheduler; I'm just not familiar enough with it. I'll warn you that I only run Yahoo clusters, where we use the CapacityScheduler everywhere.

One other note: I'm bang in the middle of releasing extensive enhancements to the CapacityScheduler via hadoop-0.20.100 or whatever we decide to call it:

http://www.mail-archive.com/general@hadoop.apache.org/msg02670.html

Arun

On Jan 12, 2011, at 9:40 AM, felix gao wrote:

> Arun,
>
> The information is very helpful. Which scheduler do you suggest when we have a mix of production and ad-hoc jobs running at the same time using Pig, and we would like to guarantee the SLA for production tasks?
>
> Thanks,
>
> Felix
>
> On Sun, Jan 9, 2011 at 12:35 AM, Arun C Murthy <acm@yahoo-inc.com> wrote:
>
> > On Dec 29, 2010, at 2:43 PM, felix gao wrote:
> >
> > > Hi all,
> > >
> > > I am trying to figure out what exactly happens inside a job.
> > > 1) When the jobtracker launches a task, how does it impact the currently running jobs if the currently running jobs have higher, the same, or lower priorities, using the default queue?
> > >
> > > 2) What if a low-priority job that is holding all the reducer slots is running, its mappers are halfway done, and a high-priority job comes in and takes all the mapper slots, but cannot complete because all the reducer slots are taken by the low-priority job?
> >
> > Both 1) and 2) really depend on the scheduler you are using: Default, FairScheduler, or CapacityScheduler.
> >
> > With the Default scheduler, 2) is a real problem. The CapacityScheduler doesn't allow priorities within the same queue for precisely this reason, since it doesn't have preemption. I'm not sure whether the FairScheduler handles it.
> >
> > > 3) When are mappers allocated on the slaves, and when are reducers allocated?
> >
> > Usually, reduces are allocated only after a certain percentage of the maps are complete (5% by default). Use mapred.reduce.slowstart.completed.maps to control this. Look at JobInProgress.java.
> >
> > > 4) Do mappers pass all the data to the reducers using RPC, or do they write their output to HDFS and the reducers pick it up?
> >
> > Maps sort/combine their output and write it to local disk. The reduces then copy it over HTTP (we call this the 'shuffle' phase). The TT on which a map ran serves that map's output via an embedded webserver. Look at ReduceTask.java and TaskTracker.MapOutputServlet.doGet.
> >
> > > 5) Within a job, when and where does all the I/O occur?
> >
> > Typically the input to the map, i.e. InputFormat, and the output of the reduce, i.e. OutputFormat. Look at MapTask.java and ReduceTask.java.
> >
> > hope this helps,
> > Arun
>
> > > I know this seems like a lot of low-level questions; if you can point me to the right place to look, that should be enough.
> > > Thanks,
> > >
> > > Felix
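For reference, the production/adhoc split described above could be sketched as a conf/capacity-scheduler.xml fragment. This is only an illustration: the property names below are the 0.20-era CapacityScheduler ones as commonly documented, the queues themselves must also be declared via mapred.queue.names in mapred-site.xml, and the exact names should be checked against your release.

```xml
<!-- Sketch of the two-queue setup from the thread (0.20-era property
     names; verify against your release). Assumes mapred-site.xml also
     declares: mapred.queue.names = production,adhoc -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.production.capacity</name>
    <value>70</value>   <!-- guaranteed share for production -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.production.maximum-capacity</name>
    <value>100</value>  <!-- production may grow to the whole cluster -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
    <value>30</value>   <!-- guaranteed share for adhoc -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.adhoc.maximum-capacity</name>
    <value>80</value>   <!-- adhoc capped at ~70-80% of the cluster -->
  </property>
</configuration>
```

The key idea is the asymmetry: production's max-limit of 100 lets it absorb idle adhoc capacity, while adhoc's lower max-limit guarantees production slots are always obtainable.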
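The slow-start knob mentioned in the answer to question 3) is a float between 0 and 1. A minimal sketch, using the classic mapred property name from the thread, which would go in mapred-site.xml or a per-job configuration:

```xml
<!-- Don't schedule any reduces until 80% of the maps have finished
     (the default is 0.05, i.e. 5%). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

Raising it keeps reduce slots free for longer, which can also soften the reduce-slot starvation scenario in question 2).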
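To make the shuffle description in the answer to question 4) concrete, here is a toy model. This is not Hadoop code: the class and method names are invented, the real shuffle fetches sorted runs over HTTP and does a streaming k-way merge rather than a re-sort, but the partition-then-sort-then-merge shape is the same.

```java
// Toy model of the map-side partition/sort and reduce-side merge that
// the thread describes. Illustrative only; names are not Hadoop's.
import java.util.*;

public class ShuffleSketch {

    // Default HashPartitioner logic: hash of the key modulo #reduces.
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // "Map side": split the map's output keys into per-reduce partitions
    // and sort each, as a map task does before spilling to local disk.
    static List<List<String>> mapSide(List<String> keys, int numReduces) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduces; i++) partitions.add(new ArrayList<>());
        for (String k : keys) partitions.get(partitionFor(k, numReduces)).add(k);
        for (List<String> p : partitions) Collections.sort(p);
        return partitions;
    }

    // "Reduce side": merge the sorted runs fetched (over HTTP, in real
    // Hadoop) from every map for this reduce's partition.
    static List<String> reduceSide(List<List<String>> runsFromMaps) {
        List<String> merged = new ArrayList<>();
        for (List<String> run : runsFromMaps) merged.addAll(run);
        Collections.sort(merged); // stands in for the real k-way merge
        return merged;
    }

    public static void main(String[] args) {
        int numReduces = 2;
        List<List<String>> map1 = mapSide(Arrays.asList("b", "a", "d"), numReduces);
        List<List<String>> map2 = mapSide(Arrays.asList("c", "a"), numReduces);
        for (int r = 0; r < numReduces; r++) {
            List<String> out = reduceSide(Arrays.asList(map1.get(r), map2.get(r)));
            System.out.println("reduce " + r + " sees " + out);
        }
    }
}
```

Each reduce thus sees a fully sorted view of its partition's keys across all maps, without any map output ever touching HDFS.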