hadoop-mapreduce-user mailing list archives

From Marko Dinic <marko.di...@nissatech.com>
Subject Running MapReduce2 (YARN) jobs remotely
Date Thu, 29 Sep 2016 10:19:40 GMT
Hello,

I'm having a problem running MR jobs remotely. The reason for doing this
is to be able to run integration tests. I have a jar which I usually run
on the YARN cluster using

yarn jar ... (or hadoop jar ...)
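
For context, the full command has the usual shape shown below; the jar
name, driver class, and HDFS paths are only placeholders, not the real
ones:

yarn jar my-mr-jobs.jar com.example.MyJobDriver /input/path /output/path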

The thing is, now I want to write integration tests, and to do that I
created a separate (Maven) project in which I include the jar containing
my MR jobs. The only difference is that I'm now running the job with a
Configuration created in the following way:

Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.x.x:8050");
conf.set("fs.default.name", "192.168.x.x");
conf.set("mapreduce.framework.name", "yarn");

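For reference, my understanding is that the non-deprecated equivalents of
these properties on MR2/YARN would look roughly like the snippet below;
the hosts and ports are placeholders (8020 and 8032 are just the usual
NameNode and ResourceManager defaults, my cluster may differ):

Configuration conf = new Configuration();
// fs.defaultFS replaces fs.default.name and carries the hdfs:// scheme
conf.set("fs.defaultFS", "hdfs://192.168.x.x:8020");
// yarn.resourcemanager.address replaces mapred.job.tracker under YARN
conf.set("yarn.resourcemanager.address", "192.168.x.x:8032");
conf.set("mapreduce.framework.name", "yarn");
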
Additionally, to be able to run the job as a different user I use UGI in 
the following way:

UserGroupInformation ugi =
    UserGroupInformation.createRemoteUser("username");

ugi.doAs((PrivilegedAction<Void>) () -> {
    myJobDriver.runJob(conf);
    return null;
});
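
For completeness, a minimal driver of the kind I'm calling there would
look roughly like the sketch below; this is a simplified illustration
rather than my exact code, and the mapper/reducer classes and paths are
made-up names:

// Illustrative driver body, run inside the doAs block above.
// (uses org.apache.hadoop.mapreduce.Job and the lib.input/lib.output
// FileInputFormat/FileOutputFormat)
Job job = Job.getInstance(conf, "integration-test-job");
job.setJarByClass(MyJobDriver.class);        // placeholder driver class
job.setMapperClass(MyMapper.class);          // placeholder mapper
job.setReducerClass(MyReducer.class);        // placeholder reducer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/test/input"));     // placeholder path
FileOutputFormat.setOutputPath(job, new Path("/test/output"));  // placeholder path
job.waitForCompletion(true);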

At first I did not set the framework name, and the strange thing was that
the output got created on HDFS, but there was no record in the
ResourceManager (I'm looking at the web console) that my job ran on the
cluster. That made me think that my job actually ran locally, but the
data was read from the cluster and the results saved back to it.

Now that I have added the framework name, I get a "java.io.IOException:
Cannot initialize Cluster. Please check your configuration for
mapreduce.framework.name and the correspond server addresses." exception.

What could be the problem, and how can I fix it?

Additionally, could anyone comment on whether this is a good way to
perform integration testing on Hadoop?

Many thanks,
Marko

