From: Mike Wenzel
To: user@hadoop.apache.org
Subject: Looking for documentation/guides on Hadoop 2.7.2
Date: Thu, 9 Jun 2016 09:15:27 +0000

Hey everyone. A few weeks ago I started learning about Hadoop. I was given the task of understanding the Hadoop ecosystem and being able to answer questions about it. I started by reading the O'Reilly book "Hadoop: The Definitive Guide". After reading it I had a first idea of how the components work together, but the book still didn't help me understand what's actually going on: it goes deep into the details of the individual components, which didn't help me see the ecosystem as a whole.

So I started working with it hands-on. I installed a VM (SUSE Leap 42.1) and followed the https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html guide.

After that I started working with files on it. I wrote my first simple mapper and reducer and analyzed my Apache log as a test; that worked well so far.

But let's get to my problems:

1) Everything I know about installing Hadoop right now amounts to: unpack a .tar.gz, run some shell scripts, and everything runs fine. I have no clue at all which components are now installed on the VM, or where they are located.

2) Furthermore, I'm missing all kinds of information about setting those components up. At one point the Apache guide says "Now check that you can ssh to the localhost without a passphrase" and "If you cannot ssh to localhost without a passphrase, execute the following commands:". I'd like to know what I'm actually doing there. WHY do I need ssh to localhost at all, WHY does it have to work without a passphrase, and what other ways are there to configure this?

3) Same with the next step: "The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node." and "Format the filesystem: $ bin/hdfs namenode -format". I have no clue how HDFS works internally. To me, a filesystem is something I set up on partitions and mount onto directories, so how am I supposed to explain HDFS to someone else?

I understand the basic storage model: files are split into blocks, the blocks are spread across the cluster, and metadata is stored separately. But if someone asks me "How can this be called a filesystem if you install it by unpacking a .tar.gz?", I simply can't answer the question.

So I'm now looking for documentation/guides that cover:
- What requirements do I have?
-- Do I have to use a specific (local) filesystem? Either way: why, and what would you recommend?
-- How should I partition my VM?
-- On which partition should I install which components?
- Setting up a VM with Hadoop
- Configuring Hadoop step by step (a minimal example of what I mean is sketched after this list)
- Setting up all the daemons/nodes manually, explaining where they live, how they work, and how they should be configured

I'm currently reading https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html, but after a first pass this guide tells you what to write into which configuration file, not why you should (or shouldn't) do so. Having just gotten an idea of what Hadoop is, I feel a bit left alone in the dark, and I hope some of you can show me a way back onto the road.

For me it's very important not to just write some configuration somewhere. I need to understand what's going on, because once I have a running cluster I must be sure I can handle all of this before it goes into production use.

Best Regards

Mike