From: John Lilley
To: "user@hadoop.apache.org"
Subject: RE: In YARN, how does a task tracker knows the address of a job tracker?
Date: Sat, 23 Nov 2013 19:16:40 +0000

Ricky,

What you are doing sounds familiar. We are in the process of implementing, not exactly MapReduce, but a system that has to do many of the things that MapReduce does (find data splits, define tasks, choose execution affinity, launch an app master, etc.).

There is another special thing that MapReduce under YARN does that a normal YARN app cannot easily access: “auxiliary services”. MapReduce sets up a YARN auxiliary service to serve up the results of mapper outputs. I think it is based on netty or jetty and HTTP. The point is that the MR aux service is part of the Hadoop distro, so all MR has to do is tell the NM to run it. Regular YARN apps don’t have this luxury without installing jars on each node and adding them to the hadoop stack’s CLASSPATH. There doesn’t appear to be any standard or documented way to inject extra jars into the hadoop install. As they say, that exercise is left to the reader.
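
To make the aux-service point concrete: the NodeManager is told which auxiliary services to run through `yarn-site.xml`. The sketch below renders the two properties that, to the best of my knowledge, Hadoop 2.2 uses for the MapReduce shuffle service (`yarn.nodemanager.aux-services` and `yarn.nodemanager.aux-services.mapreduce_shuffle.class`). The helper name `aux_service_properties` is made up for illustration, and earlier 2.x alphas spelled the service name `mapreduce.shuffle`, so check your own distribution before copying anything.

```python
import xml.etree.ElementTree as ET


def aux_service_properties(service="mapreduce_shuffle",
                           handler="org.apache.hadoop.mapred.ShuffleHandler"):
    """Build a <configuration> element holding the two NodeManager properties.

    `service` and `handler` default to the MapReduce shuffle values; a custom
    aux service would also need its jar on the NodeManager classpath.
    """
    props = {
        "yarn.nodemanager.aux-services": service,
        f"yarn.nodemanager.aux-services.{service}.class": handler,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    # Print the fragment that would go into yarn-site.xml on every node.
    print(ET.tostring(aux_service_properties(), encoding="unicode"))
```

A custom service would be registered the same way under its own name, which is exactly where the "install jars on each node" problem described above comes in.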


john

From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 3:40 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

 

Hi John, thanks for your reply. I suspect there will be some external communication between the AM and container tasks. I am trying to implement a Hadoop-like system on YARN, and I wanted to sketch the high-level steps before starting the work. Thanks,

 

 

On Thu, Nov 21, 2013 at 3:27 PM, John Lilley <john.lilley@redpoint.net> wrote:

MapReduce also communicates outside of what is directly supported by YARN.

In a YARN application, there is very little direct communication between the client and the AM, and between the AM and container tasks.

I think that an AM can report two pieces of information to the client: “state” and “percent complete”.

However, at launch time an AM can open up a protocol port and tell the client and the container tasks how to connect back.
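
The "open a port and tell the tasks how to connect back" pattern can be sketched without any YARN code at all. The example below is a plain-socket illustration, not the MapReduce AM's actual protocol: the "AM" binds to an ephemeral port, hands its address to each "task" through an environment-style dictionary, and each task connects back with a status line. The variable names `AM_HOST` and `AM_PORT` and all function names are hypothetical. In a real YARN application the address would typically travel in the container launch context (environment or command line), and the client would learn the AM's host and RPC port from the values the AM supplies when it registers with the ResourceManager.

```python
import socket
import threading


def start_am_listener():
    """AM side: bind to an ephemeral port and return (server, host, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    host, port = server.getsockname()
    return server, host, port


def container_env(host, port):
    """Environment the AM would hand to each container (names are hypothetical)."""
    return {"AM_HOST": host, "AM_PORT": str(port)}


def run_task(env, task_id):
    """Task side: read the AM address from the environment and report back."""
    address = (env["AM_HOST"], int(env["AM_PORT"]))
    with socket.create_connection(address) as conn:
        conn.sendall(f"{task_id} RUNNING\n".encode())


def collect_reports(server, expected):
    """AM side: accept one connection per task and read its status line."""
    reports = []
    for _ in range(expected):
        conn, _addr = server.accept()
        with conn, conn.makefile("r") as lines:
            reports.append(lines.readline().strip())
    return sorted(reports)


def demo(num_tasks=2):
    """Run the AM listener and the tasks in one process, for illustration only."""
    server, host, port = start_am_listener()
    env = container_env(host, port)
    tasks = [threading.Thread(target=run_task, args=(env, f"task-{i}"))
             for i in range(num_tasks)]
    for t in tasks:
        t.start()
    with server:
        reports = collect_reports(server, num_tasks)
    for t in tasks:
        t.join()
    return reports


if __name__ == "__main__":
    print(demo())
```

The only YARN-specific part of this pattern is how the address reaches the containers; everything after that is ordinary application-level networking.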

I don’t know the details, but I believe that the MapReduce AM communicates directly with all mapper and reducer tasks as well as the client.

John

 

 

From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 12:36 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

 

Thank you for the answer, Omkar.

 

I read the links, which were helpful. Though the concept of job tracker/task tracker does not exist in YARN MapReduce, doesn't it use the job/task tracker binaries? I thought the application master runs the job tracker binary and the containers on the node run the task tracker binary. thx

 

On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <ojoshi@hortonworks.com> wrote:

Hi,

 

Starting with YARN there is no notion of job tracker and task tracker. Here is a quick summary:

JobTracker :- 

1) Resource management :- Now done by the Resource Manager (it does all scheduling work)

2) Application state management :- managing and launching new map/reduce tasks (done by the Application Master, which is per job, not one single entity in the cluster for all jobs as in MRv1).

TaskTracker :- replaced by Node Manager 
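
The split of responsibilities in the summary above can be pictured with a toy model. This is only an illustration of who does what: none of these classes are the real YARN classes, and the round-robin allocation is a stand-in for whatever the actual scheduler does.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent: takes over the TaskTracker's role of running work."""
    name: str
    running: list = field(default_factory=list)

    def launch(self, task):
        self.running.append(task)


@dataclass
class ResourceManager:
    """Cluster-wide scheduler: the JobTracker's resource-management half."""
    nodes: list

    def allocate(self, count):
        # Toy policy: hand out nodes round-robin; real schedulers are far richer.
        return [self.nodes[i % len(self.nodes)] for i in range(count)]


@dataclass
class ApplicationMaster:
    """One instance per job: the JobTracker's application-state half."""
    job: str
    rm: ResourceManager

    def run(self, tasks):
        # Ask the RM for containers, then launch each task on its node.
        for task, node in zip(tasks, self.rm.allocate(len(tasks))):
            node.launch(f"{self.job}/{task}")


if __name__ == "__main__":
    nodes = [NodeManager("nm1"), NodeManager("nm2")]
    rm = ResourceManager(nodes)
    # Two jobs means two separate application masters sharing one RM.
    ApplicationMaster("job-a", rm).run(["map-0", "map-1", "reduce-0"])
    ApplicationMaster("job-b", rm).run(["map-0"])
    for node in nodes:
        print(node.name, node.running)
```

The point to notice is that there is one `ResourceManager` for the cluster but a separate `ApplicationMaster` per job, which is the main structural difference from MRv1's single JobTracker.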

 

I would suggest you read the YARN blog post. This will answer most of your questions. Plus read this (slide 12) for how a job actually gets executed.


Thanks,

Omkar Joshi

 

On Thu, Nov 21, 2013 at 7:52 AM, ricky l <rickylee0815@gmail.com> wrote:

Hi all,

 

I have a question about how a task tracker identifies the job tracker address when I submit an MR job through YARN. As far as I know, both the job tracker and the task trackers are launched through the application master, and I am curious about the details of the job and task tracker launch sequence.

 

thanks. 

 



 

 
