Subject: Re: Help on a Simple program
From: Bejoy Ks
To: user@hadoop.apache.org
Date: Wed, 26 Sep 2012 00:18:42 +0530

Hi

If you don't want either the key or the value in the output, just make the corresponding data type NullWritable.

Since you only need to filter out a few records/items from your logs, the reduce phase is not mandatory; a mapper alone would suffice. From your mapper, just output the records that match your criteria. Also set the number of reduce tasks to zero in your driver class to avoid the reduce phase completely.

A sample mapper would look like:

    public static class Map extends
            Mapper<LongWritable, Text, Text, NullWritable> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (-1 != meetConditions(value)) {
                context.write(value, NullWritable.get());
            }
        }
    }

In your driver class:

    job.setNumReduceTasks(0);

Alternatively, you can specify this at runtime:

    hadoop jar xyz.jar com.*.*.* -D mapred.reduce.tasks=0 input/ output/
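Putting the pieces together, a minimal driver could look something like the sketch below (LogFilterDriver and the job name are placeholder names, not from this thread; the Map class is the mapper shown above, assumed to be declared in the same file):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogFilterDriver {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "log filter");   // Job.getInstance(conf) on newer releases
            job.setJarByClass(LogFilterDriver.class);

            // the Map class shown above, assumed nested in this class
            job.setMapperClass(Map.class);

            // map-only job: mapper output is written straight to the output files
            job.setNumReduceTasks(0);

            // with no reducer, the mapper's output types are the job's output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Note that -D is a generic option, so it is only picked up when the driver runs through ToolRunner/GenericOptionsParser; with a plain main() like the sketch above you would set the reduce count in code (or via conf.set("mapred.reduce.tasks", "0")).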
On Tue, Sep 25, 2012 at 11:38 PM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi
>
> I am completely new to Hadoop and I am trying to address the following
> simple application. I apologize if this sounds trivial.
>
> I have multiple log files. I need to read the log files, collect the
> entries that meet some conditions, and write them back to files for
> further processing. (In other words, I need to filter out some events.)
>
> I am using the WordCount example to get going.
>
>     public static class Map extends
>             Mapper<LongWritable, Text, Text, IntWritable> {
>         private final static IntWritable one = new IntWritable(1);
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             if (-1 != meetConditions(value)) {
>                 context.write(value, one);
>             }
>         }
>     }
>
>     public static class Reduce extends
>             Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterable<IntWritable> values,
>                 Context context) throws IOException, InterruptedException {
>             context.write(key, new IntWritable(1));
>         }
>     }
>
> The problem is that it prints the value 1 after each entry.
>
> Hence my question: what is the best trivial implementation of the map and
> reduce functions to address the use case above?
>
> Thank you greatly for your help