From: Ulul <hadoop@ulul.org>
Date: Sun, 01 Mar 2015 13:50:53 +0100
To: user@hadoop.apache.org
Subject: Re: cleanup() in hadoop results in aggregation of whole file/not

Edit: instead of buffering in a HashSet and then emitting at cleanup, you can use a combiner. Likely slower, but easier to code if speed is not your main concern.
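The combiner idea works because min/max are associative and commutative, so the same aggregation function can safely run map-side (as a combiner) and again reduce-side. A plain-Java sketch of that property, outside Hadoop (class and method names are illustrative, not from the thread):

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model of the combiner idea (no Hadoop dependency):
// because min is associative and commutative, pre-aggregating each
// map task's output and then reducing gives the same answer as
// reducing everything at once.
public class CombinerModel {
    // The "reduce" function: fold a list of values down to its minimum.
    static long reduceMin(List<Long> values) {
        return Collections.min(values);
    }

    // Combiner pass per map task, then the final reducer pass.
    static long combineThenReduce(List<List<Long>> perMapperValues) {
        List<Long> combined = perMapperValues.stream()
                .map(CombinerModel::reduceMin)   // map-side pre-aggregation
                .collect(Collectors.toList());
        return reduceMin(combined);              // reduce-side final fold
    }
}
```

In real Hadoop code the same Reducer class would typically be registered for both roles via job.setCombinerClass(...) and job.setReducerClass(...).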

On 01/03/2015 13:41, Ulul wrote:
Hi

I probably misunderstood your question, because my impression is that this is typically a job for a reducer. Emit "local" min and max under two keys from each mapper and you will easily get the global min and max in the reducer.
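The reduce-side approach described above boils down to: each map task emits its local min and max, and the reducer takes the min of the local mins and the max of the local maxes. A plain-Java model of that logic, outside Hadoop (names are illustrative):

```java
import java.util.List;

// Plain-Java model of the reducer approach (no Hadoop dependency):
// each map task computes a local (min, max) for its input slice;
// a single reducer then folds those pairs into the global values.
public class MinMaxModel {
    // What one map task would emit for its slice of the data.
    static long[] localMinMax(long[] slice) {
        long min = slice[0], max = slice[0];
        for (long v : slice) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[]{min, max};
    }

    // Reducer side: global min is the min of local mins,
    // global max is the max of local maxes.
    static long[] globalMinMax(List<long[]> perMapper) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long[] mm : perMapper) {
            min = Math.min(min, mm[0]);
            max = Math.max(max, mm[1]);
        }
        return new long[]{min, max};
    }
}
```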

Ulul
On 28/02/2015 14:10, Shahab Yunus wrote:
As far as I understand, cleanup() is called once per task, i.e. per map task in your case. To get an overall count or measure, you need to aggregate the per-task results yourself after the job is done.

One way to do that is to use counters and then merge them programmatically at the end of the job.
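One caveat worth modelling: Hadoop aggregates counters of the same name across tasks by summing them, which is why per-task counter values have to be merged deliberately at job end. A plain-Java sketch of that merge, outside Hadoop (names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of merging per-task counters after a job
// finishes (no Hadoop dependency). Here the merge is a sum,
// which matches how Hadoop itself aggregates counters across
// tasks; note that a sum is NOT a valid merge for min/max
// values written with setValue() in each task.
public class CounterMerge {
    static Map<String, Long> merge(List<Map<String, Long>> perTask) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }
}
```

In real Hadoop code the merged values would be read after completion via job.getCounters().findCounter(...).getValue().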

Regards,
Shahab

On Saturday, February 28, 2015, unmesha sreeveni <unmeshabiju@gmail.com> wrote:

I have an input file whose last column is the class label:
7.4 0.29 0.5 1.8 0.042 35 127 0.9937 3.45 0.5 10.2 7 1
10 0.41 0.45 6.2 0.071 6 14 0.99702 3.21 0.49 11.8 7 -1
7.8 0.26 0.27 1.9 0.051 52 195 0.9928 3.23 0.5 10.9 6 1
6.9 0.32 0.3 1.8 0.036 28 117 0.99269 3.24 0.48 11 6 1
...................
I am trying to get the unique class labels of the whole file. In order to do that, I am using the code below.

public class MyMapper extends Mapper<LongWritable, Text, IntWritable, FourvalueWritable> {
    Set<String> uniqueLabel = new HashSet<>();

    public void map(LongWritable key, Text value, Context context) {
        // Last column of the input is the class label.
        String line = value.toString();
        Vector<String> cls = CustomParam.customLabel(line, delimiter, classindex);
        uniqueLabel.add(cls.get(0));
    }

    public void cleanup(Context context) throws IOException {
        // find min and max label
        context.getCounter(UpdateCost.MINLABEL).setValue(Long.valueOf(minLabel));
        context.getCounter(UpdateCost.MAXLABEL).setValue(Long.valueOf(maxLabel));
    }
}
cleanup() is only executed once.

And with "Set<String> uniqueLabel = new HashSet<>();" declared as a field, does the set get updated after each map() call? I hope the set is updated for each map() call,
and that I am able to get the unique labels of the whole file in cleanup().
Please suggest if I am wrong.

Thanks in advance.



