hadoop-common-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject cleanup() in hadoop results in aggregation of whole file/not
Date Sat, 28 Feb 2015 05:24:27 GMT
I have an input file whose last column is the class label:
7.4 0.29 0.5 1.8 0.042 35 127 0.9937 3.45 0.5 10.2 7 1
10 0.41 0.45 6.2 0.071 6 14 0.99702 3.21 0.49 11.8 7 -1
7.8 0.26 0.27 1.9 0.051 52 195 0.9928 3.23 0.5 10.9 6 1
6.9 0.32 0.3 1.8 0.036 28 117 0.99269 3.24 0.48 11 6 1
...................
I am trying to get the unique class labels of the whole file. To do that I
am using the code below.

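import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, IntWritable, FourvalueWritable> {
    // Instance field: shared by every map() call made on this mapper object.
    private final Set<String> uniqueLabel = new HashSet<>();

    @Override
    public void map(LongWritable key, Text value, Context context) {
        // Last column of the input is the class label.
        // delimiter and classindex are defined elsewhere (not shown).
        String line = value.toString();
        Vector<String> cls = CustomParam.customLabel(line, delimiter, classindex);
        uniqueLabel.add(cls.get(0));
    }

    @Override
    public void cleanup(Context context) throws IOException {
        // Find min and max label (computation of minLabel and maxLabel not shown).
        context.getCounter(UpdateCost.MINLABEL).setValue(Long.valueOf(minLabel));
        context.getCounter(UpdateCost.MAXLABEL).setValue(Long.valueOf(maxLabel));
    }
}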
cleanup() is executed only once.

Is the set created by "Set uniqueLabel = new HashSet();" kept between map()
calls, so that each call adds to the same set? I am hoping that it is, and
that in cleanup() I therefore have the unique labels of the whole file.
Please correct me if I am wrong.
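
To make the question concrete, here is a minimal plain-Java sketch of the lifecycle as I understand it. It assumes the framework creates one mapper object per map task (one per input split), calls map() once per record, and calls cleanup() once at the end of that task. It uses no Hadoop classes. SketchMapper, runSplits and merge are hypothetical names introduced only for this illustration, and the rows are shortened versions of the sample input above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java simulation of the mapper lifecycle; no Hadoop classes are used. */
public class MapperLifecycleSketch {

    /** Hypothetical stand-in for a Mapper: one instance per input split. */
    static class SketchMapper {
        // Instance field: created once per mapper object, not once per map() call.
        private final Set<String> uniqueLabel = new HashSet<>();

        // Called once per record; the last whitespace-separated column is the label.
        void map(String line) {
            String[] cols = line.trim().split("\\s+");
            uniqueLabel.add(cols[cols.length - 1]);
        }

        // Called once, after the last record of this split; returns a sorted copy.
        Set<String> cleanup() {
            return new TreeSet<>(uniqueLabel);
        }
    }

    /** Runs one mapper object per split and collects each split's label set. */
    static List<Set<String>> runSplits(List<List<String>> splits) {
        List<Set<String>> perSplit = new ArrayList<>();
        for (List<String> split : splits) {
            SketchMapper mapper = new SketchMapper(); // new object, so a new empty set
            for (String line : split) {
                mapper.map(line);
            }
            perSplit.add(mapper.cleanup());
        }
        return perSplit;
    }

    /** Merges the per-split sets, as a single downstream step would have to do. */
    static Set<String> merge(List<Set<String>> perSplit) {
        Set<String> all = new TreeSet<>();
        for (Set<String> labels : perSplit) {
            all.addAll(labels);
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("7.4 0.29 0.5 7 1", "10 0.41 0.45 7 -1"),
            Arrays.asList("7.8 0.26 0.27 6 1", "6.9 0.32 0.3 6 1"));
        List<Set<String>> perSplit = runSplits(splits);
        System.out.println(perSplit);        // prints [[-1, 1], [1]]
        System.out.println(merge(perSplit)); // prints [-1, 1]
    }
}
```

If that assumption holds, the set does accumulate across map() calls, but each cleanup() only sees the labels of its own split. The result covers the whole file only when the file is read by a single map task, or when the per-split sets are merged afterwards (for example in a single reducer). As far as I know, counter values reported by different tasks are added together, so calling setValue() from several mappers may not give a true min or max either.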

Thanks in advance.
