hadoop-mapreduce-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Preferred way to submit a job?
Date Thu, 12 Aug 2010 00:08:50 GMT
On Wed, Aug 11, 2010 at 3:13 PM, David Rosenstrauch <darose@darose.net> wrote:

> What's the preferred way to submit a job these days?
> org.apache.hadoop.mapreduce.Job.submit() ?  Or
> org.apache.hadoop.mapred.JobClient.runJob()?  Or does it even matter? (i.e.,
> is there any difference between them?)
If you're using the old API (e.g., you're filling out o.a.h.mapred.JobConf
and implementing o.a.h.mapred.Mapper), then you use JobClient.runJob(). If
you're using the new API (o.a.h.mapreduce.Job, o.a.h.mapreduce.Mapper), then
you use Job.waitForCompletion(), which submits the job and blocks until it
finishes (Job.submit() only submits it and returns immediately).

You can't mix'n'match; your job has to be entirely "old style" or entirely
"new style." Some programs use one, some use the other.

> I've been trying to run a job using
> org.apache.hadoop.mapreduce.Job.submit() (since I assumed that the
> org.apache.hadoop.mapred.* classes were deprecated).  However, I'm seeing
> some weirdness (the "mapred.job.tracker" setting that I set on my job's
> Configuration is getting ignored, and making the job get run locally) and I
> was wondering if the way I was submitting my job might have something to do
> with it.
> On a related note, if there's actually no difference between the 2 methods,
> would anybody have any idea what could make the "mapred.job.tracker" setting
> on a job Configuration get ignored?  (I currently have it set to
> "hdfs://<hadoop_job_tracker_host_name>:9001".)
There's a reason that's being ignored :) That is not a jobtracker address:
mapred.job.tracker takes a plain host:port pair, not an hdfs:// URI.
Assuming you've configured your namenode and your jobtracker on the same
machine, your fs.default.name should be hdfs://hdfs.host.name:port, and
mapred.job.tracker should just be jt.host.name:port.

The port numbers in these two cases will be different.
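
To make the difference between the two formats concrete, here is a minimal Java sketch that checks whether a string has the plain host:port shape expected for mapred.job.tracker. The class and method names (JobTrackerAddressCheck, looksLikeJobTrackerAddress) are hypothetical and not part of Hadoop, and the check is only a rough shape test, not the validation Hadoop itself performs. It also accepts "local", the default value that runs jobs in-process.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: a rough shape check for the
// value of mapred.job.tracker.
public class JobTrackerAddressCheck {

    // A bare host name (or IP address) followed by ":" and a port.
    // No URI scheme such as "hdfs://" is allowed. The port is only
    // checked for being 1 to 5 digits, not for being in range.
    private static final Pattern HOST_PORT =
            Pattern.compile("^[A-Za-z0-9.-]+:\\d{1,5}$");

    public static boolean looksLikeJobTrackerAddress(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        // "local" is the default setting and selects the local job runner.
        return v.equals("local") || HOST_PORT.matcher(v).matches();
    }

    public static void main(String[] args) {
        // The value from the question: a URI, so it is rejected.
        System.out.println(looksLikeJobTrackerAddress("hdfs://jt.host.name:9001"));
        // The form suggested in the reply: plain host:port.
        System.out.println(looksLikeJobTrackerAddress("jt.host.name:9001"));
    }
}
```

Running main prints false for the hdfs:// value and true for the plain host:port value. The hdfs:// form belongs only in fs.default.name.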

- Aaron

> TIA,
> DR
