hadoop-mapreduce-dev mailing list archives

From HU Wenjing A <Wenjing.a...@alcatel-sbell.com.cn>
Subject hadoop streaming using java as mapper & reducer
Date Mon, 28 May 2012 06:50:27 GMT
Hi all,

I am new to Hadoop, and I would like to use Hadoop Streaming to run a Java program
as the mapper and reducer
(because I want to use streaming to port some existing Java programs that process
XML files).
  To try it out, I first used the Hadoop wordcount example (as follows):

     countm.java:

     import java.io.IOException;
     import java.util.StringTokenizer;
     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.LongWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Mapper;

     public class countm extends Mapper<LongWritable, Text, Text, IntWritable> {
         private final static IntWritable one = new IntWritable(1);
         private Text word = new Text();

         @Override
         public void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             String line = value.toString();
             StringTokenizer tokenizer = new StringTokenizer(line);
             while (tokenizer.hasMoreTokens()) {
                 word.set(tokenizer.nextToken());
                 context.write(word, one);
             }
         }
     }
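
As I understand it, Hadoop Streaming does not call map() through the MapReduce API
at all: it launches the mapper as a separate process that reads raw input lines from
stdin and writes tab-separated key/value pairs to stdout. So my guess is that a
streaming-friendly version of the same word-count logic would be a plain stand-alone
program like the sketch below (the class name StreamMapper is just something I made up):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-alone streaming mapper: reads text lines from stdin
// and emits one "word<TAB>1" record per token on stdout.
public class StreamMapper {

    // Turn one input line into the streaming output records for that line.
    static List<String> mapLine(String line) {
        List<String> records = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            records.add(tokenizer.nextToken() + "\t1");
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            for (String record : mapLine(line)) {
                System.out.println(record);
            }
        }
    }
}
```

Is this the kind of program streaming expects, instead of a class extending Mapper?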


    countr.java:

     import java.io.IOException;
     import org.apache.hadoop.io.*;
     import org.apache.hadoop.mapreduce.*;

     public class countr extends Reducer<Text, IntWritable, Text, IntWritable> {
         @Override
         public void reduce(Text key, Iterable<IntWritable> values, Context context)
                 throws IOException, InterruptedException {
             int sum = 0;
             for (IntWritable val : values) {
                 sum += val.get();
             }
             context.write(key, new IntWritable(sum));
         }
     }
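
Similarly, I believe a streaming reducer would receive the sorted "key<TAB>count"
lines on stdin (grouped by key, since the framework sorts before the reduce phase)
and would have to detect the key boundaries itself. A stand-alone sketch under that
assumption (class name again made up; it buffers all input for simplicity):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone streaming reducer: reads sorted "word<TAB>count"
// lines from stdin and prints "word<TAB>totalCount" once per distinct word.
public class StreamReducer {

    // Sum counts over a sorted list of "key<TAB>count" records.
    static List<String> reduceSorted(List<String> records) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (String record : records) {
            int tab = record.indexOf('\t');
            String key = record.substring(0, tab);
            int count = Integer.parseInt(record.substring(tab + 1).trim());
            if (currentKey != null && !currentKey.equals(key)) {
                out.add(currentKey + "\t" + sum);  // key changed: flush previous total
                sum = 0;
            }
            currentKey = key;
            sum += count;
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + sum);      // flush the last key
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> records = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            records.add(line);
        }
        for (String result : reduceSorted(records)) {
            System.out.println(result);
        }
    }
}
```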

Then I executed the following command:
bin/hadoop jar mapred/contrib/streaming/hadoop-0.21.0-streaming.jar -files test/countr.class
-files test/countm.class -mapper test/countm.class  -reducer test/countr.class  -input input
-output output
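
One thing I am unsure about in that command: I passed .class files directly to
-mapper/-reducer, and I used -files twice (I don't know whether the second
occurrence replaces the first). My understanding from the streaming docs is that
-mapper and -reducer take an executable command, so with stand-alone stdin/stdout
programs (hypothetical StreamMapper/StreamReducer classes like the sketches above)
I would guess the invocation should look more like:

```shell
# Sketch only: StreamMapper / StreamReducer are hypothetical stand-alone
# classes that read stdin and write "key<TAB>value" lines to stdout.
bin/hadoop jar mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
    -files StreamMapper.class,StreamReducer.class \
    -mapper "java StreamMapper" \
    -reducer "java StreamReducer" \
    -input input \
    -output output
```

Is that the right way to ship and invoke Java programs under streaming?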
Then I got the following output:
12/05/27 16:48:29 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=300000
12/05/27 16:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated.
Instead, use mapreduce.client.genericoptionsparser.used
12/05/27 16:48:30 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
packageJobJar: [/root/hadoop-0.21.0/tmp/hadoop-unjar7827631350430711602/] [] /tmp/streamjob2931278447922230194.jar
tmpDir=null
12/05/27 16:48:31 INFO mapred.FileInputFormat: Total input paths to process : 1
12/05/27 16:48:31 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: number of splits:19
12/05/27 16:48:31 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation
tokens:null
12/05/27 16:48:31 INFO streaming.StreamJob: getLocalDirs(): [/root/hadoop-0.21.0/tmp/mapred/local]
12/05/27 16:48:31 INFO streaming.StreamJob: Running job: job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:48:31 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001
-kill job_201205242131_0013
12/05/27 16:48:31 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:48:32 INFO streaming.StreamJob:  map 0%  reduce 0%
12/05/27 16:49:59 INFO streaming.StreamJob:  map 100%  reduce 100%
12/05/27 16:49:59 INFO streaming.StreamJob: To kill this job, run:
12/05/27 16:49:59 INFO streaming.StreamJob: /root/hadoop-0.21.0/bin/../bin/hadoop job  -Dmapreduce.jobtracker.address=192.168.204.130:9001
-kill job_201205242131_0013
12/05/27 16:49:59 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201205242131_0013
12/05/27 16:49:59 ERROR streaming.StreamJob: Job not Successful!
12/05/27 16:49:59 INFO streaming.StreamJob: killJob...
Streaming Command Failed!


I am not sure whether I am using it the wrong way, or whether something special needs
to be done to use Java with Hadoop Streaming.
It would be great if you could help me with this, as I cannot move forward without
resolving it.
Thank you in advance.  : )

Thanks & best regards,
wenjing
