hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Help on a Simple program
Date Tue, 25 Sep 2012 18:48:42 GMT
Hi

If you don't want either the key or the value in the output, just declare the
corresponding output type as NullWritable.

Since you just need to filter out a few records/items from your logs, the
reduce phase is not mandatory; a mapper alone would suffice for your needs.
From your mapper, output only the records that match your criteria. Also set
the number of reduce tasks to zero in your driver class to skip the reduce
phase entirely.

A sample mapper would look like:

public static class Map extends
            Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit only the records that meet your criteria;
            // NullWritable suppresses the value in the output.
            if (-1 != meetConditions(value)) {
                context.write(value, NullWritable.get());
            }
        }
    }
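The thread leaves meetConditions unimplemented, so here is a standalone sketch of what such a helper might look like. The keyword "ERROR", the class name LogFilter, and the use of String in place of Hadoop's Text are all assumptions for the sake of a self-contained example; it simply mirrors the `-1 != meetConditions(value)` check in the mapper.

```java
public class LogFilter {
    // Hypothetical filter: returns the index of the match, or -1 if the
    // line should be dropped, matching the -1 check used in the mapper.
    public static int meetConditions(String line) {
        return line.indexOf("ERROR");
    }

    public static void main(String[] args) {
        String[] lines = {
            "2012-09-25 18:48 ERROR disk full",
            "2012-09-25 18:49 INFO heartbeat"
        };
        for (String line : lines) {
            if (-1 != meetConditions(line)) {
                System.out.println(line); // only the ERROR line is printed
            }
        }
    }
}
```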


In your driver class:

job.setNumReduceTasks(0);

Alternatively, you can specify this at runtime as:

hadoop jar xyz.jar com.*.*.* -D mapred.reduce.tasks=0 input/ output/
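For completeness, a minimal driver class might be wired up as below. The class name LogFilterDriver and the input/output arguments are assumptions; the key point is that setNumReduceTasks(0) makes this a map-only job, so the mapper's output is written straight to the output directory. (This sketch compiles only against the Hadoop jars, so it is not runnable standalone.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogFilterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "log filter");
        job.setJarByClass(LogFilterDriver.class);
        job.setMapperClass(Map.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setNumReduceTasks(0); // map-only: no shuffle, no reduce phase
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```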

On Tue, Sep 25, 2012 at 11:38 PM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi
>
> I am completely new to Hadoop and I am trying to address the following
> simple application. I apologize if this sounds trivial.
>
> I have multiple log files. I need to read the log files, collect the
> entries that meet some conditions, and write them back to files for further
> processing. (In other words, I need to filter out some events.)
>
> I am using the WordCount example to get going.
>
> public static class Map extends
>             Mapper<LongWritable, Text, Text, IntWritable> {
>         private final static IntWritable one = new IntWritable(1);
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             if(-1 != meetConditions(value)) {
>                 context.write(value, one);
>             }
>         }
>     }
>
> public static class Reduce extends
>             Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterable<IntWritable> values,
>                 Context context) throws IOException, InterruptedException {
>             context.write(key, new IntWritable(1));
>         }
>     }
>
> The problem is that it prints the value 1 after each entry.
>
> Hence my question. What is the best trivial implementation of the map and
> reduce function to address the use case above ?
>
> Thank you greatly for your help
>
