hadoop-mapreduce-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: How to run hadoop jar command in a clustered environment
Date Mon, 15 Apr 2013 17:32:37 GMT

Hello Thoihen,

I'm moving this discussion from common-dev (questions about developing
Hadoop) to user (questions about using Hadoop).

If you haven't already seen it, then I recommend reading the cluster setup
documentation.  It's a bit different depending on the version of the Hadoop
code that you're deploying and running.  You mentioned JobTracker, so I
expect that you're using something from the 1.x line, but here are links to
both 1.x and 2.x docs just in case:

1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html

To address your specific questions:

1. You can run the hadoop jar command and submit MapReduce jobs from any
machine that has the Hadoop software and configuration deployed and has
network connectivity to the machines that make up the Hadoop cluster.

2. Yes, you can use a separate machine that is not a member of the cluster
(meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
NodeManager).  This is your choice.  I've found it valuable to isolate
nodes like this to prevent MR job tasks from taking processing resources
away from interactive user commands, but this does mean that the resources
on that node can't be utilized by MR jobs during user idle times, so it
causes a small hit to overall utilization.
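
To make points 1 and 2 concrete, here is a minimal sketch of what submitting from a client machine could look like. It only assembles the `hadoop [--config confdir] jar ...` command line and prints it. The helper name `build_submit_command`, the configuration directory `/etc/hadoop/conf`, and the input/output paths are hypothetical and chosen purely for illustration. The sketch assumes the client's configuration directory holds copies of the cluster's config files (in 1.x these usually name the NameNode via `fs.default.name` and the JobTracker via `mapred.job.tracker`).

```python
import shlex


def build_submit_command(job_jar, job_args, conf_dir=None):
    """Return the argv list for `hadoop [--config confdir] jar <jar> [args...]`.

    job_jar  -- path to the MapReduce job jar (e.g. wordcount.jar)
    job_args -- arguments passed through to the job (hypothetical paths below)
    conf_dir -- optional directory holding the cluster's client configuration
    """
    argv = ["hadoop"]
    if conf_dir:
        # --config tells the launcher which configuration directory to use,
        # which is how the client learns where the cluster lives.
        argv += ["--config", conf_dir]
    argv += ["jar", job_jar]
    argv += list(job_args)
    return argv


# Hypothetical example values; adjust them to your own installation.
cmd = build_submit_command(
    "wordcount.jar",
    ["/user/thoihen/input", "/user/thoihen/output"],
    conf_dir="/etc/hadoop/conf",
)

# Print a shell-safe rendering of the command instead of executing it.
print(" ".join(shlex.quote(part) for part in cmd))
```

The same command works whether the machine is a cluster member or a separate gateway node, as long as the Hadoop software, the configuration, and network access to the cluster are in place.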

Hope this helps,

On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam <thoihen123@gmail.com> wrote:

> Hi All,
> I am really new to Hadoop and have installed Hadoop on my local Ubuntu
> machine. I also created a wordcount.jar and started Hadoop with
> start-all.sh, which started all the Hadoop daemons; I used jps to confirm
> it. I changed to hadoop/bin, ran hadoop jar x.jar, and the MapReduce
> program ran successfully.
> Now, can someone please tell me how I should run the hadoop jar command
> in a clustered environment, for example a cluster with 50 nodes? I know
> one dedicated machine would be the NameNode, another the JobTracker, and
> the others DataNodes and TaskTrackers.
> 1. From which machine should I run the hadoop jar command, given that I
> have a MapReduce jar in hand? Should I run it from the JobTracker
> machine, or can I run it from any machine in the cluster?
> 2. Can I run the MapReduce job from another machine that is not part of
> the cluster? If yes, how should I do it?
> Please help me.
> Regards
> thoihen
