hbase-user mailing list archives

From: Shahab Yunus <shahab.yu...@gmail.com>
Subject: Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Date: Fri, 01 Aug 2014 22:00:33 GMT
Add the @Override annotation on top of the 'reduce' method and then try again
(just like you are doing for the 'map' method):

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                        org.apache.hadoop.mapreduce.Reducer.Context context)
                        throws IOException, InterruptedException {

...
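
If the compiler then complains that reduce() does not override anything, the
raw org.apache.hadoop.mapreduce.Reducer.Context parameter is the likely
culprit: with that signature the method is an overload rather than an override
of the framework's reduce, so Hadoop falls back to the default identity reduce
(which would explain the 18 unsummed records in your output below). A minimal
sketch of the full reducer using the nested Context type of the parameterized
class, assuming everything else stays as you have it:

package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Declaring the third parameter as the nested, parameterized Context
        // (instead of the raw Reducer.Context) makes this a true override, so
        // the framework calls it rather than the default identity reduce.
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                        throws IOException, InterruptedException {

                int sum = 0;
                for (IntWritable val : values) {
                        sum += val.get();                  // add up the 1s emitted by the mapper
                }
                context.write(key, new IntWritable(sum));  // one summed total per word
        }
}

With that change each word should appear only once in part-r-00000, with its
summed count.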


Regards,
Shahab


On Fri, Aug 1, 2014 at 5:05 PM, Parkirat <parkiratbigdata@gmail.com> wrote:

> Thanks all for replying to my thread.
>
> I have investigated the issue further and found that Hadoop is not
> running/respecting any reduce for my jobs, irrespective of whether it is a
> normal MapReduce job or the HBase MapReduce API.
>
> I am pasting the word count example that I ran, along with the input and
> output files, below for reference. Please let me know if anybody can find
> an issue in my code:
>
> *Job Config Class:*
> ================================================
> package com.test.hadoop;
>
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class WordCountJob {
>
>         public static void main(String[] args) throws IOException,
> InterruptedException, ClassNotFoundException {
>
>                 if (args.length != 2) {
>                         System.out.println("usage: [inputdir] [outputdir]");
>                         System.exit(-1);
>                 }
>
>                 String inputdir = args[0].trim();
>                 String outputdir = args[1].trim();
>
>                 Configuration config = new Configuration();
>
>                 Job job = new Job(config, "Word Count");
>                 job.setJarByClass(WordCountMapper.class);
>
>                 FileInputFormat.setInputPaths(job, new Path(inputdir));
>                 FileOutputFormat.setOutputPath(job, new Path(outputdir));
>
>                 job.setMapperClass(WordCountMapper.class);
>                 job.setMapOutputKeyClass(Text.class);
>                 job.setMapOutputValueClass(IntWritable.class);
>
>                 job.setReducerClass(WordCountReducer.class);
>                 job.setOutputKeyClass(Text.class);
>                 job.setOutputValueClass(IntWritable.class);
>
>                 boolean b2 = job.waitForCompletion(true);
>                 if (!b2) {
>                         throw new IOException("error with job!");
>                 }
>         }
>
> }
> ================================================
>
> *Mapper Class:*
> ================================================
> package com.test.hadoop;
>
> import java.io.IOException;
> import java.util.StringTokenizer;
>
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class WordCountMapper extends Mapper<Object, Text, Text,
> IntWritable>
> {
>
>         private final static IntWritable one = new IntWritable(1);
>         private Text word = new Text();
>
>         @Override
>         protected void map(Object key, Text value,
>                         org.apache.hadoop.mapreduce.Mapper.Context context)
>                         throws IOException, InterruptedException {
>
>                 String line = value.toString();
>                 StringTokenizer tokenizer = new StringTokenizer(line);
>
>                 while (tokenizer.hasMoreTokens()) {
>                         word.set(tokenizer.nextToken());
>                         context.write(word, one);
>                 }
>         }
> }
> ================================================
>
> *Reducer Class:*
> ================================================
> package com.test.hadoop;
>
> import java.io.IOException;
>
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Reducer;
>
> public class WordCountReducer extends Reducer<Text, IntWritable, Text,
> IntWritable> {
>
>         protected void reduce(Text key, Iterable<IntWritable> values,
>                         org.apache.hadoop.mapreduce.Reducer.Context context)
>                         throws IOException, InterruptedException {
>
>                 int sum = 0;
>                 for (IntWritable val : values) {
>                         sum += val.get();
>                 }
>                 context.write(key, new IntWritable(sum));
>         }
> }
> ================================================
>
> *Input File:*
> ================================================
> -bash-4.1$ cat /tmp/testfile.txt
> This is an example to test Hadoop so as to test if this example works fine
> or not.
> ================================================
>
> *Mapreduce Console Output:*
> ================================================
> -bash-4.1$ hadoop jar /tmp/WordCount.jar com.test.hadoop.WordCountJob
> /tmp/wc/input /tmp/wc/output
> 14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process : 1
> 14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
> 14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available
> 14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
> 14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
> 14/08/01 20:52:42 INFO mapred.JobClient:  map 0% reduce 0%
> 14/08/01 20:52:54 INFO mapred.JobClient:  map 100% reduce 0%
> 14/08/01 20:53:02 INFO mapred.JobClient:  map 100% reduce 33%
> 14/08/01 20:53:04 INFO mapred.JobClient:  map 100% reduce 100%
> 14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090
> 14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
> 14/08/01 20:53:05 INFO mapred.JobClient:   Job Counters
> 14/08/01 20:53:05 INFO mapred.JobClient:     Launched reduce tasks=1
> 14/08/01 20:53:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9171
> 14/08/01 20:53:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/08/01 20:53:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 14/08/01 20:53:05 INFO mapred.JobClient:     Launched map tasks=1
> 14/08/01 20:53:05 INFO mapred.JobClient:     Data-local map tasks=1
> 14/08/01 20:53:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9719
> 14/08/01 20:53:05 INFO mapred.JobClient:   File Output Format Counters
> 14/08/01 20:53:05 INFO mapred.JobClient:     Bytes Written=119
> 14/08/01 20:53:05 INFO mapred.JobClient:   FileSystemCounters
> 14/08/01 20:53:05 INFO mapred.JobClient:     FILE_BYTES_READ=197
> 14/08/01 20:53:05 INFO mapred.JobClient:     HDFS_BYTES_READ=214
> 14/08/01 20:53:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=112948
> 14/08/01 20:53:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=119
> 14/08/01 20:53:05 INFO mapred.JobClient:   File Input Format Counters
> 14/08/01 20:53:05 INFO mapred.JobClient:     Bytes Read=83
> 14/08/01 20:53:05 INFO mapred.JobClient:   Map-Reduce Framework
> 14/08/01 20:53:05 INFO mapred.JobClient:     Map output materialized bytes=197
> 14/08/01 20:53:05 INFO mapred.JobClient:     Map input records=1
> 14/08/01 20:53:05 INFO mapred.JobClient:     Reduce shuffle bytes=197
> 14/08/01 20:53:05 INFO mapred.JobClient:     Spilled Records=36
> 14/08/01 20:53:05 INFO mapred.JobClient:     Map output bytes=155
> 14/08/01 20:53:05 INFO mapred.JobClient:     CPU time spent (ms)=2770
> 14/08/01 20:53:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=398393344
> 14/08/01 20:53:05 INFO mapred.JobClient:     Combine input records=0
> 14/08/01 20:53:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=131
> 14/08/01 20:53:05 INFO mapred.JobClient:     Reduce input records=18
> 14/08/01 20:53:05 INFO mapred.JobClient:     Reduce input groups=15
> 14/08/01 20:53:05 INFO mapred.JobClient:     Combine output records=0
> 14/08/01 20:53:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=385605632
> 14/08/01 20:53:05 INFO mapred.JobClient:     Reduce output records=18
> 14/08/01 20:53:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2707595264
> 14/08/01 20:53:05 INFO mapred.JobClient:     Map output records=18
> ================================================
>
> *Generated Output File:*
> ================================================
> -bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-00000
> Hadoop  1
> This    1
> an      1
> as      1
> example 1
> example 1
> fine    1
> if      1
> is      1
> not.    1
> or      1
> so      1
> test    1
> test    1
> this    1
> to      1
> to      1
> works   1
> ================================================
>
> Regards,
> Parkirat Bagga
>
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Hbase-Mapreduce-API-Reduce-to-a-file-is-not-working-properly-tp4062141p4062222.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
