Hi

If you don't want either the key or the value in the output, just set the corresponding data type to NullWritable.

Since you just need to filter out a few records/items from your logs, the reduce phase is not mandatory; a mapper alone would suffice for your needs. From your mapper, just output the records that match your criteria. Also set the number of reduce tasks to zero in your driver class to completely skip the reduce phase.

A code sample would look like this:

public static class Map extends
        Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit only the records that match the filter; drop the rest
        if (-1 != meetConditions(value)) {
            context.write(value, NullWritable.get());
        }
    }
}
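meetConditions is whatever test your records need. As a hypothetical, standalone sketch (plain String instead of Text, and a made-up "ERROR" keyword as the filter, just for illustration), it could be:

```java
// Hypothetical filter helper: returns the index of the keyword in the
// line, or -1 when the line should be dropped. Substitute whatever
// predicate your logs actually need.
public class LogFilter {

    public static int meetConditions(String line) {
        // indexOf returns -1 when "ERROR" does not occur in the line
        return line.indexOf("ERROR");
    }

    public static void main(String[] args) {
        System.out.println(meetConditions("2012-09-25 ERROR disk full"));
        System.out.println(meetConditions("2012-09-25 INFO all good"));
    }
}
```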


In your driver class, set:
job.setNumReduceTasks(0);

Alternatively, you can specify this at runtime:
hadoop jar xyz.jar com.*.*.* -D mapred.reduce.tasks=0 input/ output/

On Tue, Sep 25, 2012 at 11:38 PM, Matthieu Labour <matthieu@actionx.com> wrote:
Hi

I am completely new to Hadoop and I am trying to address the following simple application. I apologize if this sounds trivial.

I have multiple log files. I need to read them, collect the entries that meet some conditions, and write those back to files for further processing. (In other words, I need to filter out some events.)

I am using the WordCount example to get going.

public static class Map extends
        Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (-1 != meetConditions(value)) {
            context.write(value, one);
        }
    }
}

public static class Reduce extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        context.write(key, new IntWritable(1));
    }
}

The problem is that it prints the value 1 after each entry.

Hence my question: what is the best trivial implementation of the map and reduce functions to address the use case above?

Thank you greatly for your help