hadoop-mapreduce-user mailing list archives

From "ados1984@gmail.com" <ados1...@gmail.com>
Subject Re: Reg: Setting up Hadoop Cluster
Date Thu, 13 Mar 2014 21:22:26 GMT
Thank you, Geoffry,

I have some fundamental questions here.

   1. Once I have installed Hadoop, how can I identify which node is the
   master node and which are the slaves?
   2. My understanding is that the master node is by default the namenode and
   the slave nodes are datanodes. Is that correct?
   3. If I have installed Hadoop but do not know which node is the namenode
   and which is a datanode, how can I go in and run my jar from the right one?
   4. When we do MapReduce programming, where do we write the program: on the
   Hadoop server (where both the master/namenode and the slave/datanodes are
   installed), or in our local system using a standard IDE, packaging it as a
   jar and deploying it to the namenode? But here again, how can I identify
   which is the namenode and which is a datanode?
   5. Assuming I have figured out which node is a datanode and which is the
   namenode, how will my MapReduce program, or my Pig or Hive scripts, know
   that they need to run on node 1, node 2, or node 3?
   6. Also, where do we install Pig, Hive, and Flume: on the Hadoop
   master/slave nodes, or somewhere else? And how do we let Pig/Hive know that
   node 1 is the master/namenode and the other nodes are slaves/datanodes?
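To make question 1 concrete: if I run jps on each machine, I assume the Java
daemon names tell me the role, something like the hypothetical output below
(hostnames and pids are made up), but please correct me if this is the wrong
way to check:

```shell
# Hypothetical jps output on a 1.x-style cluster; pids are invented.
# On the machine I believe is the master:
$ jps
4210 NameNode
4388 SecondaryNameNode
4467 JobTracker

# On a machine I believe is a slave:
$ jps
3101 DataNode
3187 TaskTracker
```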

I would really appreciate inputs on these questions, as setting up Hadoop is
turning out to be quite a complex task from where I currently stand.

Regards, Andy.

On Thu, Mar 13, 2014 at 5:14 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:

> Andy,
> Once you have Hadoop running, you can run your jobs from the CLI of the
> name node. When I write a MapReduce job, I jar it up and place it in,
> say, my home directory and run it from there.  I do the same with Pig
> scripts.  I've used neither Hive nor Cascading, but I imagine they would
> work the same.
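> For example, a typical session on the name node looks roughly like this;
> the jar name, class name, and HDFS paths below are placeholders, not
> anything from a real cluster:

```shell
# All names and paths here are examples only.
# Copy local input data into HDFS.
hadoop fs -put local-input/ /user/geoffry/input

# Run the MapReduce job from a jar sitting in my home directory.
hadoop jar wordcount.jar com.example.WordCount \
    /user/geoffry/input /user/geoffry/output

# Run a Pig script the same way, from the shell on the name node.
pig wordcount.pig

# Inspect the results.
hadoop fs -cat /user/geoffry/output/part-00000
```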
> Another approach I've tried is WebHDFS.  It's for manipulating HDFS
> via a RESTful interface.  It worked well enough for me.  I stopped using it
> when I discovered it didn't support MapFiles, but that's another story.
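> The WebHDFS calls are plain HTTP, so curl is enough to try them.  In the
> sketch below the hostname and paths are placeholders; 50070 was the
> default name node HTTP port on the 1.x line:

```shell
# List a directory through the name node's WebHDFS endpoint.
curl -i "http://namenode.example.com:50070/webhdfs/v1/user/geoffry?op=LISTSTATUS"

# Read a file; the name node answers with a redirect to a datanode,
# which -L follows automatically.
curl -i -L "http://namenode.example.com:50070/webhdfs/v1/user/geoffry/output/part-00000?op=OPEN"
```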
> On Thu, Mar 13, 2014 at 5:00 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
>> Hello Team,
>> I have one question regarding putting data into HDFS and running
>> MapReduce on the data present in HDFS.
>>    1. HDFS is a file system, so what kinds of clients are available to
>>    interact with it? Also, where do we need to install those clients?
>>    2. Regarding Pig, Hive, and MapReduce: where do we install them on the
>>    Hadoop cluster, from where do we run all the scripts, and how does it
>>    internally know that it needs to run on node 1, node 2, or node 3?
>> Any inputs here would be really helpful.
>> Thanks, Andy.
> --
> There are ways and there are ways,
> Geoffry Roberts
