hadoop-mapreduce-user mailing list archives

From James Hammerton <james.hammer...@mendeley.com>
Subject Re: JobClient using deprecated JobConf
Date Fri, 24 Sep 2010 10:33:32 GMT
Hi,

That tutorial includes Java source code that submits a job; look at what
main() and run() are doing. Or are you trying to avoid using the "hadoop"
command altogether? Once your Java app is written, all you should need to do
is run it via the "hadoop" command rather than via the "java" command.

James

On Thu, Sep 23, 2010 at 6:22 PM, Martin Becker <_martinbecker@web.de> wrote:

>  Well, the tutorial shows me how to use the command line interface, and
> that works fine, implementing the Tool interface and all. But scanning
> through the tutorial, I cannot find any way of actually submitting a job
> _not_ using the command line interface. I want a Java application to
> submit a job, without having to call any script files. Can you give me a
> pointer?
>
> Martin
>
>
> On 23.09.2010 18:54, Tom White wrote:
>
>> This tutorial should help:
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html
>>
>> Tom
>>
>> On Thu, Sep 23, 2010 at 1:24 AM, Martin Becker <_martinbecker@web.de> wrote:
>>
>>> Hi,
>>>
>>> I would still like to use the new API. So what I am trying to do now is
>>> to submit a job from Java code rather than via the command line
>>> interface. How do I do this? This is what I do at the moment:
>>> * Clean start-up of Hadoop (formatted file system and all)
>>> * Using the standard WordCount Mapper and Reducer, I wrote this main
>>> method:
>>>
>>>     public static void main(String[] args) throws IOException,
>>>             InterruptedException, ClassNotFoundException {
>>>
>>>         Configuration configuration = new Configuration();
>>>         InetSocketAddress socket = new InetSocketAddress("localhost", 9001);
>>>         Cluster cluster = new Cluster(socket, configuration);
>>>
>>>         FileSystem fs = cluster.getFileSystem();
>>>         Path homeDirectory = fs.getHomeDirectory();
>>>
>>>         Path input = new Path(homeDirectory, INPUT);
>>>         Path output = new Path(homeDirectory, OUTPUT);
>>>
>>>         fs.delete(output, true);
>>>         fs.copyFromLocalFile(new Path("resources/test/wordcount/data/ipsum.txt"),
>>>                 new Path(input, "input.txt"));
>>>
>>>         Job job = Job.getInstance(cluster);
>>>
>>> //1     job.addArchiveToClassPath(new Path("release/test.jar"));
>>>
>>> //2     job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
>>> //      job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
>>> //      job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));
>>>
>>>         job.setJarByClass(WordCount.class);
>>>         job.setMapperClass(Map.class);
>>>         job.setCombinerClass(Reduce.class);
>>>         job.setReducerClass(Reduce.class);
>>>         job.setOutputKeyClass(Text.class);
>>>         job.setOutputValueClass(IntWritable.class);
>>>         FileInputFormat.addInputPath(job, input);
>>>         FileOutputFormat.setOutputPath(job, output);
>>>
>>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>     }
>>> * I tried to run this code as-is in Eclipse.
>>> * Obviously, I guess, Hadoop needs the WordCount classes to work, so I
>>> got this error:
>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException:
>>>   de.fstyle.hadoop.tutorial.wordcount.WordCount$Map
>>> * Putting everything into a jar and adding the following line did not
>>> do any good:
>>>   job.addArchiveToClassPath(new Path("release/test.jar"));
>>> * Adding each class separately throws the same exception:
>>>   job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
>>>   job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
>>>   job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));
>>> * Using
>>>   job.setJar("release/test.jar");
>>> will get me
>>>   java.io.FileNotFoundException: File
>>>   /tmp/hadoop-martin/mapred/staging/martin/.staging/job_201009221802_0033/job.jar
>>>   does not exist.
>>>
>>> So how would I set this up/use it correctly? Sorry, I did not find any
>>> tutorial or examples anywhere.
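>>>
>>> For reference, a condensed sketch of the jar-based variant I am
>>> attempting (paths are placeholders, and I am assuming setJar() expects
>>> a local file system path that exists at submission time):
>>>
>>>     Configuration conf = new Configuration();
>>>     Cluster cluster = new Cluster(new InetSocketAddress("localhost", 9001), conf);
>>>     Job job = Job.getInstance(cluster);
>>>     // Point the job at a pre-built jar containing WordCount and its inner
>>>     // Map/Reduce classes. A relative path such as "release/test.jar" is
>>>     // resolved against the JVM's working directory, so an absolute path
>>>     // may behave differently when launching from Eclipse.
>>>     job.setJar(new java.io.File("release/test.jar").getAbsolutePath());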
>>>
>>> Martin
>>>
>>>
>>> On 22.09.2010 18:29, Tom White wrote:
>>>
>>> Note that JobClient, along with the rest of the "old" API in
>>> org.apache.hadoop.mapred, has been undeprecated in Hadoop 0.21.0, so
>>> you can continue to use it without warnings.
>>>
>>> Tom
>>>
>>> On Wed, Sep 22, 2010 at 2:43 AM, Amareshwari Sri Ramadasu <amarsri@yahoo-inc.com> wrote:
>>>
>>> In 0.21, JobClient methods are available in
>>> org.apache.hadoop.mapreduce.Job
>>> and org.apache.hadoop.mapreduce.Cluster classes.
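>>>
>>> For example, a minimal sketch (assuming the single-argument Cluster
>>> constructor that takes a Configuration, alongside the two-argument form):
>>>
>>>     Configuration conf = new Configuration();
>>>     Cluster cluster = new Cluster(conf);  // replaces JobClient as the cluster handle
>>>     Job job = Job.getInstance(cluster);   // replaces JobConf + JobClient.submitJob()
>>>     // Job carries both submission (submit(), waitForCompletion(true))
>>>     // and status queries (isComplete(), isSuccessful(), killJob()).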
>>>
>>> On 9/22/10 3:07 PM, "Martin Becker" <_martinbecker@web.de> wrote:
>>>
>>>  Hello,
>>>
>>> I am using Hadoop MapReduce version 0.20.2, and soon 0.21.
>>> I wanted to use the JobClient class to circumvent the use of the
>>> command line interface.
>>> I noticed that JobClient still uses the deprecated JobConf class for
>>> job submissions.
>>> Are there any alternatives to JobClient that do not use the deprecated
>>> JobConf class?
>>>
>>> Thanks in advance,
>>> Martin


-- 
James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
