hadoop-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: In YARN, how does a task tracker knows the address of a job tracker?
Date Sat, 23 Nov 2013 19:16:40 GMT
Ricky,

What you are doing sounds familiar.  We are in the process of implementing a system that is
not exactly MapReduce, but that has to do many of the things MapReduce does (find data splits,
define tasks, choose execution affinity, launch an app master, etc.).

There is another special thing that MapReduce under YARN does that a normal YARN app cannot
easily do, which is to use "auxiliary services".  MapReduce sets up a YARN auxiliary service
to serve mapper outputs to the reducers.  I think it is based on Netty or Jetty and HTTP.
The point is that the MR aux service is part of the Hadoop distro, so all MR has to do is
tell the NodeManager (NM) to run it.  Regular YARN apps don't have this luxury without
installing jars on each node and adding them to the Hadoop stack's CLASSPATH.  There doesn't
appear to be any standard or documented way to inject extra jars into the Hadoop install.
As they say, that exercise is left to the reader.

john

From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 3:40 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Hi John, thanks for your reply. I suspect there will be some external communication between
the AM and the container tasks. I am trying to implement a Hadoop-like system on YARN and I
wanted to outline the high-level steps before starting the work. Thanks,


On Thu, Nov 21, 2013 at 3:27 PM, John Lilley <john.lilley@redpoint.net>
wrote:
MapReduce also communicates outside of what is directly supported by YARN.
In a YARN application, there is very little direct communication between the client and the
AM, and between the AM and container tasks.
I think that an AM can report two pieces of information to the client -- "state" and "percent
complete".
However, at launch time an AM can open up a protocol port and tell the client and the container
tasks how to connect back.
I don't know the details, but I believe that the MapReduce AM communicates directly with all
mapper and reducer tasks as well as the client.
John
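
A minimal sketch of the "open a port and tell the tasks how to connect back" pattern
described above, using plain Java sockets rather than any YARN API. The class name, the
SKETCH_MASTER_ADDRESS variable, and the REGISTER/ACK messages are all hypothetical and exist
only for illustration. In a real YARN application the address would more likely be handed
to each task through the environment or command line set on its ContainerLaunchContext, and
the client could look up the AM's host and RPC port from the ApplicationReport; this sketch
only imitates that with an in-process map.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ConnectBackSketch {
    // Hypothetical variable name; YARN does not define it.
    static final String MASTER_ADDR_ENV = "SKETCH_MASTER_ADDRESS";

    /** "AM" side: bind an ephemeral port chosen by the OS. */
    static ServerSocket openMasterPort() throws IOException {
        return new ServerSocket(0);
    }

    /** Build the environment a task would be launched with (loopback only in this sketch). */
    static Map<String, String> taskEnvironment(ServerSocket master) {
        Map<String, String> env = new HashMap<>();
        env.put(MASTER_ADDR_ENV, "127.0.0.1:" + master.getLocalPort());
        return env;
    }

    /** "AM" side: accept one task connection and acknowledge its message. */
    static void serveOneTask(ServerSocket master) throws IOException {
        try (Socket s = master.accept();
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            out.println("ACK " + in.readLine());
        }
    }

    /** "Task" side: read the address from the environment, connect back, register. */
    static String registerTask(Map<String, String> env, String taskId) throws IOException {
        String[] hostPort = env.get(MASTER_ADDR_ENV).split(":");
        try (Socket s = new Socket(hostPort[0], Integer.parseInt(hostPort[1]));
             PrintWriter out = writer(s);
             BufferedReader in = reader(s)) {
            out.println("REGISTER " + taskId);
            return in.readLine();
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
            new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true so println() sends the line immediately
        return new PrintWriter(
            new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Run one "AM" thread and one "task" in the same process and return the reply. */
    static String roundTrip(String taskId) throws IOException, InterruptedException {
        try (ServerSocket master = openMasterPort()) {
            Thread am = new Thread(() -> {
                try {
                    serveOneTask(master);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            am.start();
            String reply = registerTask(taskEnvironment(master), taskId);
            am.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("task_0001")); // prints "ACK REGISTER task_0001"
    }
}
```

Binding to port 0 lets the OS pick a free port, which matters when several app masters may
land on the same node; the chosen port is only known after binding, so it has to be passed
to the tasks at launch time.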


From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 12:36 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Thank you for the answer, Omkar.

I read the links, which were helpful. Though the concept of job tracker/task tracker does not
exist in YARN MapReduce, doesn't it use the job/task tracker binaries? I thought the application
master runs the job tracker binary and the containers on each node run the task tracker binary.
Thanks

On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <ojoshi@hortonworks.com>
wrote:
Hi,

Starting with YARN there is no notion of job tracker and task tracker. Here is a quick summary
of what replaced them.
JobTracker:
1) Resource management: now done by the Resource Manager (it does all the scheduling work).
2) Application state management (managing and launching new map/reduce tasks): now done by the
Application Master, which is per job, not a single entity in the cluster for all jobs as in MRv1.
TaskTracker: replaced by the Node Manager.

I would suggest you read the YARN blog post<http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>.
This will answer most of your questions. Also read this<http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond>
(slide 12) for how a job actually gets executed.

Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Thu, Nov 21, 2013 at 7:52 AM, ricky l <rickylee0815@gmail.com>
wrote:
Hi all,

I have a question about how a task tracker identifies the job tracker address when I submit an
MR job through YARN. As far as I know, both the job tracker and the task trackers are launched
through the application master, and I am curious about the details of the job and task tracker
launch sequence.

thanks.




