Subject: Re: Help on a Simple program
From: Bejoy Ks
To: user@hadoop.apache.org
Date: Wed, 26 Sep 2012 00:18:42 +0530

Hi

If you don't want either the key or the value in the output, just make the corresponding data type NullWritable.

Since you only need to filter out a few records/items from your logs, the reduce phase is not mandatory; a mapper alone would suffice. From your mapper, just output the records that match your criteria. Also set the number of reduce tasks to zero in your driver class to avoid the reduce phase completely.

A sample mapper would look like:

    public static class Map extends
            Mapper<LongWritable, Text, Text, NullWritable> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (-1 != meetConditions(value)) {
                context.write(value, NullWritable.get());
            }
        }
    }

In your driver class:

    job.setNumReduceTasks(0);

Alternatively, you can specify this at runtime:

    hadoop jar xyz.jar com.*.*.* -D mapred.reduce.tasks=0 input/ output/
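Putting the pieces together, a minimal driver could look something like the sketch below (LogFilterDriver and the job name are placeholder names, not from this thread; the Map class is the mapper shown above, assumed to be declared in the same file):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogFilterDriver {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "log filter");   // Job.getInstance(conf) on newer releases
            job.setJarByClass(LogFilterDriver.class);

            // the Map class shown above, assumed nested in this class
            job.setMapperClass(Map.class);

            // map-only job: mapper output is written straight to the output files
            job.setNumReduceTasks(0);

            // with no reducer, the mapper's output types are the job's output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Note that -D is a generic option, so it is only picked up when the driver runs through ToolRunner/GenericOptionsParser; with a plain main() like the sketch above you would set the reduce count in code (or via conf.set("mapred.reduce.tasks", "0")).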
On Tue, Sep 25, 2012 at 11:38 PM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi
>
> I am completely new to Hadoop and I am trying to address the following
> simple application. I apologize if this sounds trivial.
>
> I have multiple log files. I need to read the log files, collect the
> entries that meet some conditions, and write them back to files for
> further processing. (In other words, I need to filter out some events.)
>
> I am using the WordCount example to get going.
>
>     public static class Map extends
>             Mapper<LongWritable, Text, Text, IntWritable> {
>         private final static IntWritable one = new IntWritable(1);
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             if (-1 != meetConditions(value)) {
>                 context.write(value, one);
>             }
>         }
>     }
>
>     public static class Reduce extends
>             Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterable<IntWritable> values,
>                 Context context) throws IOException, InterruptedException {
>             context.write(key, new IntWritable(1));
>         }
>     }
>
> The problem is that it prints the value 1 after each entry.
>
> Hence my question: what is the best trivial implementation of the map and
> reduce functions to address the use case above?
>
> Thank you greatly for your help