hadoop-mapreduce-user mailing list archives

From Matthieu Labour <matth...@actionx.com>
Subject Help on a Simple program
Date Tue, 25 Sep 2012 18:08:06 GMT

I am completely new to Hadoop and I am trying to implement the following
simple application. I apologize if this sounds trivial.

I have multiple log files. I need to read them, collect the entries that
meet some conditions, and write those entries back to files for further
processing. (In other words, I need to filter out some events.)
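
To make the intended behavior concrete, here is a minimal plain-Java sketch of the filtering step, with no Hadoop dependency. The class name LogFilterSketch, the sample log lines, and the "ERROR" condition are all assumptions made for illustration; meetConditions here is only a hypothetical stand-in that follows the same convention as the code further down (-1 means the entry does not match).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LogFilterSketch {

    // Hypothetical stand-in for meetConditions(): returns the index of the
    // marker "ERROR" in the line, or -1 when the line does not match.
    static int meetConditions(String line) {
        return line.indexOf("ERROR");
    }

    // Keeps only the matching lines, unchanged and with no count attached.
    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (-1 != meetConditions(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample entries, for illustration only.
        List<String> log = Arrays.asList(
                "INFO service started",
                "ERROR disk full",
                "INFO heartbeat",
                "ERROR connection timeout");
        for (String line : filter(log)) {
            System.out.println(line);
        }
    }
}
```

The desired output is just the matching entries themselves, one per line, with nothing appended to them.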

I am using the WordCount example to get going.

public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the entry only when meetConditions() does not return -1
            // (helper not shown).
            if(-1 != meetConditions(value)) {
                context.write(value, one);
            }
        }
}

public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Writes each distinct entry once, paired with the constant 1.
            context.write(key, new IntWritable(1));
        }
}

The problem is that the output contains the value 1 after each entry.

Hence my question: what is the simplest implementation of the map and
reduce functions for the use case above?

Thank you greatly for your help
