Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of nmaillard@hortonworks.com
 designates 209.85.220.179 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CACLTFx6yggyHYzghsoMkf_75u8PrDrMevW35FVa8L0TJgu6JZg@mail.gmail.com>
References: 
 <CACLTFx6yggyHYzghsoMkf_75u8PrDrMevW35FVa8L0TJgu6JZg@mail.gmail.com>
Date: Mon, 11 Aug 2014 12:39:52 +0200
Message-ID: 
 <CAJTKPK3z68EmnzokxHx_SpX4TxFyzeTLmqJj9YCWHBHiOUmyow@mail.gmail.com>
Subject: Re: Yarn, MRv1, MRv2 lots of newbie doubts and questions
From: Nicolas Maillard <nmaillard@hortonworks.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=20cf307ac789cbfc9a0500582ca8

--20cf307ac789cbfc9a0500582ca8
Content-Type: text/plain; charset=UTF-8

Hello

The Hadoop ecosystem moves fast, and YARN was a small revolution, so I
understand your confusion.
To keep it simple: in Hadoop 1 there were two main things, Hadoop MapReduce
and Hadoop HDFS.
Hadoop MR was actually two things: a compute paradigm, map-reduce, and a
distribution process for that paradigm. So MR had to run the map and reduce
phases but also talk to all the machines to get compute slots in the right
places. This meant that to use that distribution process you had to go
through the map-reduce paradigm, since the two were bundled.
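
To make the "paradigm" half concrete, here is a rough single-process sketch
in plain Java of what the grep example computes: a map step that emits
(match, 1) pairs and a reduce step that sums them per key. It uses no Hadoop
classes at all, and the class and method names are made up for illustration;
it only shows the paradigm, not how Hadoop distributes it.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy, in-memory illustration of the map-reduce paradigm only.
// No Hadoop classes are used; every name here is hypothetical.
final class GrepParadigmSketch {

    // Map phase: emit one (match, 1) pair per regex match found in a line.
    static List<Map.Entry<String, Integer>> map(String line, Pattern pattern) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            out.add(new AbstractMap.SimpleEntry<>(m.group(), 1));
        }
        return out;
    }

    // Shuffle + reduce phase: group the pairs by key and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs both phases in one process; a real cluster would spread the map
    // calls over many machines, which is the "distribution" half.
    static Map<String, Integer> run(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line, pattern));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "dfs.replication is set",
            "see dfs.replication and dfs.namenode.name.dir");
        // prints {dfs.namenode.name.dir=1, dfs.replication=2}
        System.out.println(run(lines, "dfs[a-z.]+"));
    }
}
```

Everything that is missing from this sketch (finding machines, getting
compute slots, moving data between map and reduce) is the distribution half
that Hadoop 1 bundled into MR.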

In Hadoop 2 you have MapReduce 2, which is the paradigm, and YARN, which
does the distribution. The added bonus is that you can now use the paradigm
you want and talk to YARN to get the distribution. So you can still write
MapReduce code if you want, but you can also use other engines such as Tez,
Spark, Giraph, etc., and they all use YARN to get distributed cleanly on
the cluster.
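
As a very rough mental model of that split, the toy Java below has one
object that only hands out compute slots and several "engines" that ask it
for slots without knowing anything else about the cluster. None of these
classes exist in YARN and this is not YARN's API; the names and the
container counts are invented purely to show the separation of roles.

```java
// Hypothetical toy model; these are not YARN classes and not YARN's API.
final class SharedClusterSketch {

    // Stand-in for the distribution layer: it only hands out compute slots.
    static final class ToyResourceManager {
        private int freeContainers;

        ToyResourceManager(int total) {
            this.freeContainers = total;
        }

        // Grants up to `wanted` containers, never more than are still free.
        int allocate(int wanted) {
            int granted = Math.min(wanted, freeContainers);
            freeContainers -= granted;
            return granted;
        }

        int free() {
            return freeContainers;
        }
    }

    // Stand-in for an engine: it knows its own paradigm, not the cluster.
    interface ToyEngine {
        String name();
        int containersNeeded();
    }

    // Small factory so the example stays short.
    static ToyEngine engine(final String name, final int needed) {
        return new ToyEngine() {
            public String name() { return name; }
            public int containersNeeded() { return needed; }
        };
    }

    // Any engine goes through the same resource manager to get slots.
    static String submit(ToyResourceManager rm, ToyEngine engine) {
        int granted = rm.allocate(engine.containersNeeded());
        return engine.name() + " got " + granted + " container(s)";
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager(10);
        // prints "map-reduce job got 6 container(s)"
        System.out.println(submit(rm, engine("map-reduce job", 6)));
        // prints "DAG job got 4 container(s)"
        System.out.println(submit(rm, engine("DAG job", 6)));
    }
}
```

The point of the sketch is only that the slot-granting code never needs to
know which paradigm the engine implements.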

On the API question, YARN has also changed the game: you now want to pick
the paradigm or engine that best fits your calculations (DAG or not,
in-memory or not, graph or not, etc.).
I would advise going through higher-level APIs that let you write your
logic first and choose the engine afterwards; Cascading, for example, is a
nice fit for that. Hive likewise lets you write SQL and decide later which
engine runs it: MapReduce, Tez, or, in the near future, Spark.

I hope this helps


On Sun, Aug 10, 2014 at 7:23 PM, Sebastiano Di Paola <
sebastiano.dipaola@gmail.com> wrote:

> Hi all,
> I'm a newbie hadoop user, and I started using hadoop 2.4.1 as my first
> installation.
> So now I'm struggling with mapred, mapreduce, yarn....MRv1, MRv2, yarn.
> I tried to read the documentation, but I couldn't find a clear
> answer...sometimes it seems that the documentation assumes you know all
> the history of the Hadoop framework... :(
>
> I started with a standalone node of course, but I have also deployed a
> cluster with 10 machines.
>
> I started with the example in the documentation.
>
> Cluster installed...dfs running with
> start-dfs.sh
>
> when I run
>
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
>
> What am I using? MRv1, MRv2?
> The job executes successfully and I can get the output in the HDFS output
> directory.
>
>
> Then on the same installation I start yarn with start-yarn.sh
> I run the same command after starting yarn
>
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
>
> So what am I using in this case?
>
> I'm not sure what the difference is between mapreduce and
> yarn....probably mapreduce is running on top of yarn? How does mapreduce
> interact with yarn? Is it completely transparent?
>
> What's the difference between a mapreduce and a yarn application? (Forgive
> me if it's not correct to talk about mapreduce application)
>
> Besides that...when writing a completely new mapreduce application, which
> API should be used so as not to write deprecated/old hadoop style code:
> mapred or mapreduce?
> Thanks a lot.
> Kind regards.
> Seba
>
>
>


--20cf307ac789cbfc9a0500582ca8--