hbase-user mailing list archives

From Parkirat <parkiratbigd...@gmail.com>
Subject Re: Hbase Mapreduce API - Reduce to a file is not working properly.
Date Fri, 01 Aug 2014 21:05:00 GMT
Thanks, all, for replying to my thread.

I have investigated the issue further and found that Hadoop is not respecting my reducer for any of my jobs, irrespective of whether it is a plain MapReduce job or one using the HBase MapReduce API.

I am pasting the word-count example that I ran, along with its input and output files, for reference. Please let me know if anybody can spot an issue in my code:

*Job Config Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {
	
	public static void main(String[] args)
			throws IOException, InterruptedException, ClassNotFoundException {
		
		if (args.length != 2) {
			System.out.println("usage: [inputdir] [outputdir]");
			System.exit(-1);
		}
		
		String inputdir = args[0].trim();
		String outputdir = args[1].trim();
		
		Configuration config = new Configuration();
		
		Job job = new Job(config, "Word Count");
		job.setJarByClass(WordCountMapper.class);
		
		FileInputFormat.setInputPaths(job, new Path(inputdir));
		FileOutputFormat.setOutputPath(job, new Path(outputdir));
		
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		
		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
		boolean b2 = job.waitForCompletion(true);
		if (!b2) {
			throw new IOException("error with job!");
		}
	}

}
================================================
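
As a side note, the console output below warns that the driver should implement Tool rather than rely on GenericOptionsParser. In case it matters, this is roughly how I understand the Tool version of the same driver would look (an untested sketch; the class and package names are mine from above, only the Configured/Tool/ToolRunner wiring is new):

*Job Config Class (Tool variant, sketch):*
================================================
package com.test.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		if (args.length != 2) {
			System.out.println("usage: [inputdir] [outputdir]");
			return -1;
		}

		// getConf() already carries any -D options parsed by ToolRunner.
		Job job = new Job(getConf(), "Word Count");
		job.setJarByClass(WordCountMapper.class);

		FileInputFormat.setInputPaths(job, new Path(args[0].trim()));
		FileOutputFormat.setOutputPath(job, new Path(args[1].trim()));

		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		// ToolRunner strips the generic options (-D, -files, ...) before run().
		System.exit(ToolRunner.run(new Configuration(), new WordCountJob(), args));
	}
}
================================================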

*Mapper Class:*
================================================
package com.test.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

	private final static IntWritable one = new IntWritable(1);
	private Text word = new Text();

	@Override
	protected void map(Object key, Text value,
			org.apache.hadoop.mapreduce.Mapper.Context context)
			throws IOException, InterruptedException {
		
		String line = value.toString();
		StringTokenizer tokenizer = new StringTokenizer(line);
		
		while (tokenizer.hasMoreTokens()) {
			word.set(tokenizer.nextToken());
			context.write(word, one);
		}
	}
}
================================================

*Reducer Class:*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	protected void reduce(Text key, Iterable<IntWritable> values,
			org.apache.hadoop.mapreduce.Reducer.Context context)
			throws IOException, InterruptedException {
		
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		context.write(key, new IntWritable(sum));
	}
}
================================================
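
One thing I notice on re-reading the reducer: unlike my mapper's map(), the reduce() method above carries no @Override annotation, and its third parameter is the raw org.apache.hadoop.mapreduce.Reducer.Context rather than the parameterized inner Context type. If that signature does not exactly match the one the framework calls, Java compiles it as a mere overload, and Hadoop silently falls back to the default identity reduce, which would pass every (word, 1) pair straight through. Could that be what is happening here? For reference, a signature that is guaranteed to override (the compiler rejects it otherwise) would be:

*Reducer Class (overriding signature, sketch):*
================================================
package com.test.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

	// Context here is the inherited parameterized inner type, not the raw
	// Reducer.Context; @Override makes the compiler verify the override.
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {

		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		context.write(key, new IntWritable(sum));
	}
}
================================================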

*Input File:*
================================================
-bash-4.1$ cat /tmp/testfile.txt
This is an example to test Hadoop so as to test if this example works fine or not.
================================================

*Mapreduce Console Output:*
================================================
-bash-4.1$ hadoop jar /tmp/WordCount.jar com.test.hadoop.WordCountJob /tmp/wc/input /tmp/wc/output
14/08/01 20:52:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 20:52:19 INFO input.FileInputFormat: Total input paths to process : 1
14/08/01 20:52:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/08/01 20:52:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
14/08/01 20:52:19 WARN snappy.LoadSnappy: Snappy native library is available
14/08/01 20:52:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/01 20:52:19 INFO snappy.LoadSnappy: Snappy native library loaded
14/08/01 20:52:41 INFO mapred.JobClient: Running job: job_201404021234_0090
14/08/01 20:52:42 INFO mapred.JobClient:  map 0% reduce 0%
14/08/01 20:52:54 INFO mapred.JobClient:  map 100% reduce 0%
14/08/01 20:53:02 INFO mapred.JobClient:  map 100% reduce 33%
14/08/01 20:53:04 INFO mapred.JobClient:  map 100% reduce 100%
14/08/01 20:53:05 INFO mapred.JobClient: Job complete: job_201404021234_0090
14/08/01 20:53:05 INFO mapred.JobClient: Counters: 29
14/08/01 20:53:05 INFO mapred.JobClient:   Job Counters
14/08/01 20:53:05 INFO mapred.JobClient:     Launched reduce tasks=1
14/08/01 20:53:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9171
14/08/01 20:53:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 20:53:05 INFO mapred.JobClient:     Launched map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient:     Data-local map tasks=1
14/08/01 20:53:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9719
14/08/01 20:53:05 INFO mapred.JobClient:   File Output Format Counters
14/08/01 20:53:05 INFO mapred.JobClient:     Bytes Written=119
14/08/01 20:53:05 INFO mapred.JobClient:   FileSystemCounters
14/08/01 20:53:05 INFO mapred.JobClient:     FILE_BYTES_READ=197
14/08/01 20:53:05 INFO mapred.JobClient:     HDFS_BYTES_READ=214
14/08/01 20:53:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=112948
14/08/01 20:53:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=119
14/08/01 20:53:05 INFO mapred.JobClient:   File Input Format Counters
14/08/01 20:53:05 INFO mapred.JobClient:     Bytes Read=83
14/08/01 20:53:05 INFO mapred.JobClient:   Map-Reduce Framework
14/08/01 20:53:05 INFO mapred.JobClient:     Map output materialized bytes=197
14/08/01 20:53:05 INFO mapred.JobClient:     Map input records=1
14/08/01 20:53:05 INFO mapred.JobClient:     Reduce shuffle bytes=197
14/08/01 20:53:05 INFO mapred.JobClient:     Spilled Records=36
14/08/01 20:53:05 INFO mapred.JobClient:     Map output bytes=155
14/08/01 20:53:05 INFO mapred.JobClient:     CPU time spent (ms)=2770
14/08/01 20:53:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=398393344
14/08/01 20:53:05 INFO mapred.JobClient:     Combine input records=0
14/08/01 20:53:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=131
14/08/01 20:53:05 INFO mapred.JobClient:     Reduce input records=18
14/08/01 20:53:05 INFO mapred.JobClient:     Reduce input groups=15
14/08/01 20:53:05 INFO mapred.JobClient:     Combine output records=0
14/08/01 20:53:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=385605632
14/08/01 20:53:05 INFO mapred.JobClient:     Reduce output records=18
14/08/01 20:53:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2707595264
14/08/01 20:53:05 INFO mapred.JobClient:     Map output records=18

*Generated Output File:*
================================================
-bash-4.1$ hadoop fs -tail /tmp/wc/output/part-r-00000
Hadoop	1
This	1
an	1
as	1
example	1
example	1
fine	1
if	1
is	1
not.	1
or	1
so	1
test	1
test	1
this	1
to	1
to	1
works	1
================================================
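
For comparison, what I expected from a working reduce is one aggregated row per distinct word, 15 rows in total (matching Reduce input groups=15 in the counters above), instead of the 18 unaggregated rows I actually get (matching Reduce input records=18 and Reduce output records=18):

*Expected Output:*
================================================
Hadoop	1
This	1
an	1
as	1
example	2
fine	1
if	1
is	1
not.	1
or	1
so	1
test	2
this	1
to	2
works	1
================================================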

Regards,
Parkirat Bagga



