hadoop-user mailing list archives

From Tony Dean <Tony.D...@sas.com>
Subject RE: mr1 and mr2
Date Sun, 11 May 2014 19:07:15 GMT
Here is what I learned from various docs.  Please correct me if I'm wrong.

The old API (mapred) and the new API (mapreduce) are compatible; you can use either one.

The old mapred API can be used to communicate with either MRv1 (JobTracker) or MRv2 (YARN).
In both cases the client uses the old deprecated property: mapred.job.tracker.
Set it to the address of the JobTracker or ResourceManager, or to "local" if you want to run
in local mode.
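For example, a minimal client-side mapred-site.xml for the old API might look like this (the host:port is a placeholder; I haven't verified this against every release):

```xml
<!-- Sketch: client config for the old (mapred) API; jthost:8021 is a placeholder -->
<property>
  <name>mapred.job.tracker</name>
  <value>jthost:8021</value>  <!-- or "local" to run in local mode -->
</property>
```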

The new mapreduce API should also be capable of communicating with either MRv1 (JobTracker)
or MRv2 (YARN).
A new property, mapreduce.jobtracker.address, is introduced in place of the deprecated property
above.  It specifies how to communicate with a JobTracker.
When the new API is communicating with MRv2 (YARN) on the backend, you need to use the YARN
properties instead (mapreduce.framework.name and yarn.resourcemanager.address).
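My understanding of the YARN-side client configuration is something like the following (untested on my end; the host:port is a placeholder):

```xml
<!-- Sketch: client config for targeting MRv2 (YARN); rmhost:8032 is a placeholder -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rmhost:8032</value>
</property>
```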
Is the last statement correct?  I haven't actually tried this out.  What does "classic" mean?

Lastly, when running with the old API (mapred) against an MRv2 (YARN) backend, I'm getting the following:
2014-05-11 14:48:46,571 [tomcat-http--1] ERROR org.apache.hadoop.security.UserGroupInformation
- PriviledgedActionException as:saspad (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException):
Unknown rpc kind RPC_WRITABLE

This leads me to believe there is some incompatibility between the client and the server.  I am
using CDH4.6 jars on both the client and server.

What else am I missing?

Gaining some insight, but still a little confused.


-----Original Message-----
From: Tony Dean 
Sent: Sunday, May 11, 2014 8:20 AM
To: 'Harsh J'; cdh-user@cloudera.org
Subject: RE: mr1 and mr2

Hi Harsh,
Thanks for your reply.

The confusion comes into play between API vs. implementation.  I'm using YARN on the server.

I'm using the mapred JobClient on the client and the MRv2 (YARN) implementation on the server.
Changing the client configuration to use mapred.job.tracker and setting it to the YARN
ResourceManager host:port did make the correct connection this time.  When would I use
mapreduce.jobtracker.address vs. yarn.resourcemanager.address?  Sorry for the confusion.

Also, now that I'm connecting to the ResourceManager, I'm getting the following exception:
2014-05-11 07:43:41,315 [tomcat-http--1] ERROR org.apache.hadoop.security.UserGroupInformation
- PriviledgedActionException as:saspad (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException):
Unknown rpc kind RPC_WRITABLE

I have a simple security setup.  User saspad can write to the HDFS file system with no problem.
 I do not have any service privileges enabled.  I'm sure this is another misconfiguration,
but I'm not sure what.

I appreciate any guidance.


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Sunday, May 11, 2014 2:35 AM
To: cdh-user@cloudera.org; Tony Dean
Subject: Re: mr1 and mr2

The MR1 configuration is 'mapred.job.tracker', not 'mapreduce.jobtracker.address' (this is
a newer name understood only by MR in 2.x). Without the former, if you target an MR1 runtime,
the job will evaluate the default of 'mapred.job.tracker' as 'local' and run a local job.

If your confusion is after following the given page at http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html,
then please see the note at the bottom of that page:

> In Hadoop 2.0.0 and later (MRv2), a number of Hadoop and HDFS properties have been
> deprecated.  (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH4
> were based.)  A list of deprecated properties and their replacements can be found on the
> Apache Deprecated Properties page.
> Note: All of these deprecated properties continue to work in MRv1.
> Conversely, the new mapreduce* properties listed do not work in MRv1.
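To illustrate, the same setting can be expressed under either name, but only the old name is understood by MRv1 (the host:port value below is a placeholder):

```xml
<!-- The deprecated name and its 2.x replacement refer to the same setting -->
<property>
  <name>mapred.job.tracker</name>            <!-- MR1 name; still honored by MRv1 -->
  <value>jthost:8021</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>  <!-- 2.x name; not understood by MRv1 -->
  <value>jthost:8021</value>
</property>
```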

On Sun, May 11, 2014 at 5:22 AM, Tony Dean <Tony.Dean@sas.com> wrote:
> Hi,
> I am trying to write a Java application that works with either MR1 or MR2.
> At the present I have MR2 (YARN) implementation deployed and running.  
> I am using mapred API.  I believe that I read mapred and mapreduce 
> APIs are compatible so either should work.  The only thing that is 
> different is the configuration properties that need to be specified 
> depending on whether the back-end is MR1 or MR2. BTW: I’m using CDH 4.6 (Hadoop 2.0).
> My problem is that I can’t seem to submit a job to the cluster.  It 
> always runs locally.  I setup JobConf with appropriate properties and 
> submit the jobs using JobClient.  The properties that I set on JobConf are as follows:
> mapreduce.jobtracker.address=host:port (I know this is for MR1, but 
> I’m trying everything)
> mapreduce.framework.name=yarn
> yarn.resourcemanager.address=host:port
> yarn.resourcemanager.host=host:port
> The last 2 are the same but I read 2 different ways to set it in 
> different conflicting documentations.
> Anyway, can someone explain how to get this seemingly simple 
> deployment to work?  What am I missing?
> Thanks!!!

Harsh J
