From: Nicolas Maillard
To: user@hadoop.apache.org
Date: Mon, 11 Aug 2014 12:39:52 +0200
Subject: Re: Yarn, MRv1, MRv2 lots of newbie doubts and questions

Hello

The Hadoop ecosystem moves fast, and the YARN part was a mini revolution, so I understand your confusion.
To make it simple: in Hadoop 1 there were two main things, Hadoop MapReduce and Hadoop HDFS.
Hadoop MR was actually two things: a compute paradigm, map-reduce, and a distribution process for that paradigm. So MR had to run the map and reduce phases but also talk to all the machines to get compute slots in the right places. This meant that to use that distribution process you had to go through the map-reduce paradigm, since the two were bundled.

In Hadoop 2 you have MapReduce 2, which is only the paradigm, and YARN, which does the distribution. The added bonus is that you can now use the paradigm you want and talk to YARN to get the distribution. So you can still write MapReduce code if you want, but you can also use other engines such as Tez, Spark, Giraph, etc., and they all use YARN to get distributed cleanly on the cluster.
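
To make the split between paradigm and distribution a bit more concrete, here is a toy Python sketch. It is only an illustration under my own assumptions: it uses no Hadoop or YARN classes, and the names `map_fn`, `reduce_fn`, `run_sequential`, `run_in_pool` and `map_reduce` are made up for this example. The map and reduce functions stay the same, and only the runner that decides where the tasks execute is swapped, which is roughly the role YARN plays for MapReduce 2 and the other engines. The regular expression is the one from the grep example quoted below.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Same pattern as in the quoted grep example command.
PATTERN = re.compile(r"dfs[a-z.]+")


def map_fn(line):
    """Map phase: emit (match, 1) for every regex match in one input line."""
    return [(match, 1) for match in PATTERN.findall(line)]


def reduce_fn(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_sequential(tasks):
    """Toy 'distribution' layer: run every task in this process, one by one."""
    return [task() for task in tasks]


def run_in_pool(tasks):
    """Another toy 'distribution' layer: hand the tasks to a small worker pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def map_reduce(lines, runner):
    """The paradigm: map, group by key, reduce. The runner decides where tasks run."""
    mapped = runner([lambda line=line: map_fn(line) for line in lines])

    # Group the emitted values by key (a stand-in for the shuffle step).
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    reduced = runner([lambda k=k, v=v: reduce_fn(k, v) for k, v in groups.items()])
    return dict(reduced)


lines = ["dfs.replication and dfs.namenode.name.dir", "dfs.replication again"]
result_local = map_reduce(lines, run_sequential)
result_pool = map_reduce(lines, run_in_pool)
```

Both runners produce the same counts, because the job logic never changes; only the layer that places the work does.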

On the API question, YARN has also changed the game. You now want to pick the paradigm or engine according to what best fits your calculations: DAG or not, in memory or not, graph or not, etc.
I would advise going through higher-level APIs that let you write your logic first and choose the engine afterwards; Cascading, for example, is a nice fit for that. Hive as well lets you write SQL code and decide later which engine you need: MapReduce, Tez, or, in the near future, Spark.

I hope this helps

On Sun, Aug 10, 2014 at 7:23 PM, Sebastiano Di Paola <sebastiano.dipaola@gmail.com> wrote:
> Hi all,
> I'm a newbie Hadoop user, and I started using Hadoop 2.4.1 as my first installation.
> So now I'm struggling with mapred, mapreduce, yarn... MRv1, MRv2, YARN.
> I tried to read the documentation, but I couldn't find a clear answer... Sometimes it seems that the documentation assumes you know the whole history of the Hadoop framework... :(
>
> I started with a standalone node of course, but I have also deployed a cluster with 10 machines.
>
> I started with the example in the documentation.
>
> Cluster installed... DFS running with
> start-dfs.sh
>
> When I run
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
> what am I using? MRv1, MRv2?
> The job executes successfully and I can get the output in the HDFS output directory.
>
> Then, on the same installation, I start YARN with start-yarn.sh.
> I run the same command after starting YARN:
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
> So what am I using in this case?
>
> I'm not sure what the difference is between mapreduce and yarn... Probably mapreduce is running on top of yarn? How does mapreduce interact with yarn? Is it completely transparent?
>
> What's the difference between a mapreduce and a yarn application? (Forgive me if it's not correct to talk about a mapreduce application.)
>
> Besides that, when writing a completely new mapreduce application, which API should be used so as not to write deprecated/old Hadoop-style code?
> mapred or mapreduce?
> Thanks a lot.
> Kind regards.
> Seba



