hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Automatic line number in reducer output
Date Tue, 07 Jun 2011 16:21:37 GMT

I am wondering whether there is a built-in function that automatically 
adds a self-incrementing line number to reducer output (like in a 
relational DB)?

I ask because with the 0.19.2 API I kept a counter variable, linecount, 
that is incremented in the reducer, like this:

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    private long linecount = 0;

    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      // ... some code here
      linecount++;
      output.collect(new Text(Long.toString(linecount)), var);
    }
  }

However, I found that this does not work with the 0.20.2 API when I write 
the code like this:

  public static class Reduce extends
      org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {
    private long linecount = 0;

    public void reduce(Text key, Iterator<IntWritable> values,
        org.apache.hadoop.mapreduce.Reducer.Context context)
        throws IOException, InterruptedException {
      // ... some code here
      linecount++;
      context.write(new Text(Long.toString(linecount)), var);
    }
  }

but it no longer seems to work.
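
One possible explanation, offered as an assumption and not confirmed 
anywhere in this message: in the 0.20 org.apache.hadoop.mapreduce.Reducer, 
reduce takes an Iterable<VALUEIN> rather than an Iterator. A method declared 
with Iterator, as in the second snippet above, would then be an overload 
that the framework never calls, and the inherited default reduce would run 
instead. The sketch below reproduces that effect in plain Java so it runs 
without Hadoop. BaseReducer, WrongSignature, RightSignature and feed are 
hypothetical stand-ins for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class OverloadDemo {
    // Hypothetical stand-in for a framework base class whose reduce() takes an Iterable.
    static class BaseReducer {
        final List<String> out = new ArrayList<>();

        // Default behaviour: pass every value through under its original key.
        void reduce(String key, Iterable<Integer> values) {
            for (Integer v : values) {
                out.add(key + "\t" + v);
            }
        }

        // The "framework" only ever calls the Iterable version.
        final void run(String key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Declares reduce(String, Iterator): an overload, so run() never reaches it.
    static class WrongSignature extends BaseReducer {
        long linecount = 0;

        void reduce(String key, Iterator<Integer> values) {
            linecount++;
            out.add(linecount + "\t" + sum(values));
        }
    }

    // Declares reduce(String, Iterable) with @Override, so run() calls it.
    static class RightSignature extends BaseReducer {
        long linecount = 0;

        @Override
        void reduce(String key, Iterable<Integer> values) {
            linecount++;
            out.add(linecount + "\t" + sum(values.iterator()));
        }
    }

    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    // Feeds two keys through a reducer and returns what it emitted.
    static List<String> feed(BaseReducer r) {
        r.run("apple", Arrays.asList(1, 2));
        r.run("pear", Arrays.asList(5));
        return r.out;
    }

    public static void main(String[] args) {
        System.out.println(feed(new WrongSignature()));
        System.out.println(feed(new RightSignature()));
    }
}
```

With the wrong signature the values pass through unnumbered under their 
original keys; with the right one each key yields one numbered line. In 
real code, putting @Override on reduce makes the compiler reject a method 
that does not actually override anything, which is a quick way to check.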

I would also like to know, when both a combiner and a reducer are 
implemented, how to avoid the line number being written twice (because I 
only want it in the reducer, not in the combiner). Thanks!
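
On the combiner question, one approach to consider: in the 0.20 API the 
combiner is set with Job.setCombinerClass, separately from 
Job.setReducerClass, so the two steps can use different classes. The 
combiner would only aggregate and keep the original key, and the reducer 
alone would assign the number. The plain-Java sketch below simulates that 
split without Hadoop. NumberingDemo, combine, reduce and demo are 
hypothetical names for illustration only. It also assumes a single reduce 
task, since a per-instance counter is only unique within one task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NumberingDemo {
    // Combine step: sums counts per key, keeps the key unchanged, adds no numbering.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    // Reduce step: merges the partial sums, then numbers each output line.
    static List<String> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>(); // keys in sorted order
        for (Map<String, Integer> p : partials) {
            for (Map.Entry<String, Integer> e : p.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> lines = new ArrayList<>();
        long linecount = 0; // the counter exists only in the reduce step
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            linecount++;
            lines.add(linecount + "\t" + e.getValue());
        }
        return lines;
    }

    // Two simulated input splits, each combined locally before the reduce.
    static List<String> demo() {
        Map<String, Integer> split1 = combine(Arrays.asList("a", "b", "a"));
        Map<String, Integer> split2 = combine(Arrays.asList("b", "c"));
        return reduce(Arrays.asList(split1, split2));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the combine step never replaces the key with a number, the reduce 
step still sees the original keys and each output line is numbered exactly 
once.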

