hadoop-common-user mailing list archives

From Arvind Sundararajan <arvindappu...@gmail.com>
Subject hadoop installation in pseudo distributed mode regular user vs dedicated user
Date Wed, 05 Aug 2015 17:32:46 GMT
Hi All,

I have a laptop running Ubuntu 14.04 LTS and am trying to install Hadoop
2.7.1 (the current stable version) in pseudo-distributed mode.

I have a regular user account on my laptop, but I am unsure whether I
should install Hadoop under a dedicated hadoop user instead.
NOTE: By 'regular user', I mean the Linux user account that I use for
day-to-day personal work.

The current Hadoop documentation at [1] does not mention setting up a
dedicated user for the Hadoop installation.

However, the Hadoop installation tutorial at [2] mentions setting up a
dedicated user for a Hadoop installation in pseudo-distributed mode on a
single machine. That tutorial references an outdated Hadoop installation
tutorial [3], which also mentions setting up a dedicated user for a Hadoop
installation in pseudo-distributed mode on a single machine.

I found several tutorials online, all of which mention setting up a
dedicated user for a Hadoop installation in pseudo-distributed mode on a
single machine, but none of which explains why a dedicated user is needed.

My questions are as follows:

a) Can I execute Hadoop programs as a regular user even if Hadoop is
installed in pseudo-distributed mode under a dedicated 'hadoop' user?
If yes, what Linux filesystem permissions and HDFS permissions do I need
to give the regular user so that it can execute Hadoop programs?

b) Quoting from the outdated Hadoop installation tutorial [3]:

    "We will use a dedicated Hadoop user account for running Hadoop.
     While that's not required it is recommended because it helps to separate
     the Hadoop installation from other software applications and
     user accounts running on the same machine
     (think: security, permissions, backups, etc)."

Can someone elaborate on this? What are the security, permissions, and
backup issues when running Hadoop in pseudo-distributed mode on a single
laptop that will most likely have only one user account (my current user
account)?

c) Can someone please elaborate on the pros and cons of running Hadoop in
pseudo-distributed mode on a single machine as the regular user versus
as a dedicated user?

My thoughts on the cons of a dedicated user so far:

    i) If Hadoop cannot be run from the 'regular user' account and
    only works from the dedicated hadoop user account, then I
    will have to edit my Hadoop Java programs in my
    'regular user' account, where I have my development environment
    and IDE/text editor set up, copy the .jar files to the
    dedicated hadoop user account, and execute them there. If any
    error occurs, I have to go back to the 'regular user' account,
    edit, copy the new .jar files over, and execute again. Moving
    back and forth between accounts like this is a definite pain
    in pseudo-distributed mode, and I have experienced it while
    working with Hadoop 1.x.

    ii) If Hadoop cannot be run from the 'regular user' account and
    only works from the dedicated hadoop user account, then
    the Hadoop operations copyFromLocal and copyToLocal will
    require a folder shared by both user accounts.
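
To make the round trip in (i) and (ii) concrete, here is a rough sketch of
one edit/copy/run cycle as I picture it. It is a dry run that only prints
the commands. The account names, the shared folder, the jar, the class
name, and the HDFS paths are all hypothetical placeholders, and I am
assuming the dedicated account can be reached with sudo -u.

```shell
#!/bin/sh
# Dry run: prints the commands for one cycle instead of executing them.
# Every name below is a hypothetical placeholder, not a real setup.
DEV_USER="me"                   # regular account with the IDE
HADOOP_USER="hduser"            # dedicated Hadoop account
SHARED_DIR="/srv/hadoop-share"  # folder both accounts can read and write
JAR="wordcount.jar"             # job jar built in the regular account

print_cycle() {
    # 1. Copy the freshly built jar from the regular account to the shared folder.
    echo "cp /home/$DEV_USER/build/$JAR $SHARED_DIR/"
    # 2. Load the input into HDFS as the dedicated user.
    echo "sudo -u $HADOOP_USER hdfs dfs -copyFromLocal $SHARED_DIR/input.txt /user/$HADOOP_USER/input.txt"
    # 3. Run the job as the dedicated user.
    echo "sudo -u $HADOOP_USER hadoop jar $SHARED_DIR/$JAR WordCount /user/$HADOOP_USER/input.txt /user/$HADOOP_USER/out"
    # 4. Copy the results back to the shared folder for the regular account.
    echo "sudo -u $HADOOP_USER hdfs dfs -copyToLocal /user/$HADOOP_USER/out $SHARED_DIR/out"
}

print_cycle
```

Every code change repeats steps 1 and 3, which is the back and forth I
would like to avoid.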

P.S. I also consulted [4] and [5] before asking this question.

References:

[1] http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html
[2] http://dogdogfish.com/big-data/installing-hadoop-2-4-on-ubuntu-14-04/
[3] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
[4] http://stackoverflow.com/questions/20192140/hadoop-pseudo-distributed-mode-for-multiple-users
[5] http://stackoverflow.com/questions/23807486/hadoop-development-dedicated-user-in-ubuntu-how-to-access-hadoop-node-running
