Subject: hadoop installation in pseudo distributed mode regular user vs dedicated user
From: Arvind Sundararajan
To: user@hadoop.apache.org
Date: Wed, 5 Aug 2015 23:02:46 +0530

Hi All,

I have a laptop running Ubuntu 14.04 LTS and am trying to install Hadoop 2.7.1 (the current stable version) in pseudo-distributed mode.
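
For reference, my install follows the pseudo-distributed steps in [1], roughly as below (the install directory is simply wherever I unpacked the tarball):

    # Pseudo-distributed setup per the SingleCluster doc [1]; the two
    # config files only need these properties:
    #   etc/hadoop/core-site.xml : fs.defaultFS    = hdfs://localhost:9000
    #   etc/hadoop/hdfs-site.xml : dfs.replication = 1
    # Then, from the Hadoop install directory, format and start HDFS:
    bin/hdfs namenode -format
    sbin/start-dfs.sh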

I have a regular user account on my laptop, but am unsure whether I should instead install Hadoop under a dedicated hadoop user.
NOTE: By 'regular user', I mean the Linux user account that I use for day-to-day personal work.

The current Hadoop documentation at [1] does not mention setting up a dedicated user for the Hadoop installation.

However, the Hadoop installation tutorial at [2] recommends setting up a dedicated user for a pseudo-distributed installation on a single machine. That tutorial references an outdated tutorial [3], which likewise recommends a dedicated user for a pseudo-distributed installation on a single machine.

I found several other tutorials online that all recommend a dedicated user for a pseudo-distributed Hadoop installation on a single machine, without explaining why one should be set up.

My questions are as follows:

a) Is it possible for me to run Hadoop programs as my regular user even if Hadoop is installed in pseudo-distributed mode under a dedicated 'hadoop' user?
If yes, what Linux filesystem permissions and HDFS permissions do I need to give the regular user to run Hadoop programs?
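
To make (a) concrete, I imagine the permissions would look something like the sketch below, though I am guessing here ('arvind' stands in for my regular account, and /usr/local/hadoop for the install directory):

    # Linux side: let the regular user traverse/read the install,
    # e.g. via the dedicated user's group (names are placeholders):
    sudo usermod -aG hadoop arvind
    sudo chown -R hadoop:hadoop /usr/local/hadoop
    sudo chmod -R g+rX /usr/local/hadoop

    # HDFS side: as the dedicated user (the HDFS superuser in
    # pseudo-distributed mode), give the regular user its own
    # HDFS home directory:
    hdfs dfs -mkdir -p /user/arvind
    hdfs dfs -chown arvind:arvind /user/arvind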

b) Quoting from the outdated Hadoop installation tutorial [3]:

    "We will use a dedicated Hadoop user account for runnin=
g Hadoop.
     While that's not required it is recommended because it helps to se=
parate=20
     the Hadoop installation from other software applications and
     user accounts running on the same machine=20
     (think: security, permissions, backups, etc)."

Can someone elaborate on this? What are the issues regarding security, permissions, and backups when running Hadoop in pseudo-distributed mode on a single laptop that will most likely have only one user account (my current account)?

c) Can someone please elaborate on the pros and cons of running Hadoop in pseudo-distributed mode on a single machine as the regular user versus creating a dedicated user?

My thoughts on the cons so far are:

    i) If Hadoop cannot be run by a 'regular user' and only works
    from the dedicated hadoop user account, then I will have to edit
    my Hadoop Java programs in my 'regular user' account, where my
    development environment and IDE/text editor are set up, copy the
    .jar files over to the dedicated hadoop user account, and run
    them there. If any error occurs, I have to go back to the
    'regular user' account, edit, copy the new .jar files over, and
    run again. This moving back and forth between accounts is a
    definite pain when working in pseudo-distributed mode, and I
    experienced it while working with Hadoop 1.x (see the sudo -u
    sketch after this list).

    ii) If Hadoop cannot be run by a 'regular user' and only works
    from the dedicated hadoop user account, then the Hadoop
    operations copyFromLocal and copyToLocal will require a folder
    shared by both user accounts (see the shared-folder sketch
    after this list).
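
For what it's worth, the workarounds I can picture for (i) and (ii) look roughly like this (again, names and paths are placeholders, and I have not verified any of it):

    # (i) run a freshly built jar as the dedicated user without a
    # full account switch (still clumsy, but no logging in and out):
    sudo -u hadoop hadoop jar /home/arvind/dev/wordcount.jar \
        WordCount /user/hadoop/input /user/hadoop/output

    # (ii) a shared staging folder both accounts can write to, so
    # copyFromLocal/copyToLocal have somewhere to meet; the setgid
    # bit (2775) keeps new files in the shared group:
    sudo mkdir -p /srv/hadoop-share
    sudo chgrp hadoop /srv/hadoop-share
    sudo chmod 2775 /srv/hadoop-share
    sudo -u hadoop hadoop fs -copyFromLocal \
        /srv/hadoop-share/input.txt /user/hadoop/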

P.S. I also referred to [4] and [5] before asking this question.

References:

[1] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html
[2] http://dogdogfish.com/big-data/installing-hadoop-2-4-on-ubuntu-14-04/
[3] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
[4] http://stackoverflow.com/questions/20192140/hadoop-pseudo-distributed-mode-for-multiple-users
[5] http://stackoverflow.com/questions/23807486/hadoop-development-dedicated-user-in-ubuntu-how-to-access-hadoop-node-running
