hadoop-mapreduce-user mailing list archives

From Peter Ruch <rutschifen...@gmail.com>
Subject Re: Re: Filtering by value in Reducer
Date Mon, 11 May 2015 20:18:06 GMT
Hello,

sum and threshold are both integers (Java int).
For the threshold variable, I first add a new resource to the
configuration with conf.addResource( ... );

later, I read the threshold value from the configuration.

Code
#####################################

private int threshold;

@Override
public void setup( Context context ) {
    Configuration conf = context.getConfiguration();
    // Falls back to -1 if "threshold" is not set in the configuration.
    threshold = conf.getInt( "threshold", -1 );
}

#####################################


Best,
Peter


On 11.05.2015 19:26, Shahab Yunus wrote:
> What is the type of the threshold variable? sum I believe is a Java int.
>
> Regards,
> Shahab
>
> On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <rutschifengga@gmail.com> wrote:
>
>     Hi,
>
>     I am currently playing around with Hadoop and have some problems
>     when trying to filter in the Reducer.
>
>     I extended the WordCount v1.0 example from the 2.7 MapReduce
>     Tutorial with some additional functionality
>     and added the possibility to filter by the specific value of each
>     key - e.g. only output the key-value pairs where [[ value >
>     threshold ]].
>
>     Filtering Code in Reducer
>     #####################################
>
>     for (IntWritable val : values) {
>          sum += val.get();
>     }
>     if ( sum > threshold ) {
>          result.set(sum);
>          context.write(key, result);
>     }
>
>     #####################################
>
>     For a threshold smaller than any value, the above code works as
>     expected and the output contains all key-value pairs.
>     If I increase the threshold to 1, some pairs are missing from the
>     output although their values are larger than the threshold.
>
>     I tried to work out the error myself, but I could not get it to
>     work as intended. I use the exact Tutorial setup with Oracle JDK 8
>     on a CentOS 7 machine.
>
>     As far as I understand the respective Iterable<...>  in the
>     Reducer already contains all the observed values for a specific key.
>     Why is it possible that I am missing some of these key-value pairs
>     then? It only fails in very few cases. The input file is pretty
>     large - 250 MB -
>     so I also tried to increase the memory for the mapping and
>     reduction steps but it did not help ( tried a lot of different
>     stuff without success )
>
>     Maybe someone already experienced similar problems / is more
>     experienced than I am.
>
>
>     Thank you,
>
>     Peter
>
>
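
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the tutorial's WordCount v1.0 driver registers the reducer class as a combiner as well (job.setCombinerClass(IntSumReducer.class)). If the modified job still does that, the threshold test would also run on partial sums on the map side, and a partial sum that is not above the threshold would be dropped before the reducer sees it. The plain-Java sketch below imitates that effect without Hadoop. The class name, method names, and sample data are all hypothetical, and it assumes the combiner runs exactly once per input split, which Hadoop does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the job; no Hadoop classes are used.
final class ThresholdFilterSketch {

    // Same logic as the reduce() body in the thread: sum the values and
    // keep the result only if it exceeds the threshold (null = dropped).
    static Integer sumIfAbove(List<Integer> values, int threshold) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum > threshold ? sum : null;
    }

    // Filter applied once per key, on the complete list of values.
    static Map<String, Integer> reduceOnly(
            Map<String, List<List<Integer>>> splits, int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<Integer>>> e : splits.entrySet()) {
            List<Integer> all = new ArrayList<>();
            for (List<Integer> perSplit : e.getValue()) {
                all.addAll(perSplit);
            }
            Integer result = sumIfAbove(all, threshold);
            if (result != null) {
                out.put(e.getKey(), result);
            }
        }
        return out;
    }

    // Same filter applied first to each split (as a combiner would),
    // then again to whatever partial sums survived.
    static Map<String, Integer> combineThenReduce(
            Map<String, List<List<Integer>>> splits, int threshold) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<Integer>>> e : splits.entrySet()) {
            List<Integer> partials = new ArrayList<>();
            for (List<Integer> perSplit : e.getValue()) {
                Integer partial = sumIfAbove(perSplit, threshold);
                if (partial != null) {
                    partials.add(partial);
                }
            }
            if (partials.isEmpty()) {
                continue; // nothing reaches the reducer for this key
            }
            Integer result = sumIfAbove(partials, threshold);
            if (result != null) {
                out.put(e.getKey(), result);
            }
        }
        return out;
    }

    // Made-up data: for each key, the 1s emitted by the mapper per split.
    static Map<String, List<List<Integer>>> sampleInput() {
        Map<String, List<List<Integer>>> splits = new LinkedHashMap<>();
        splits.put("hadoop", Arrays.asList(Arrays.asList(1, 1), Arrays.asList(1)));
        splits.put("rare", Arrays.asList(Arrays.asList(1), Arrays.asList(1)));
        splits.put("once", Arrays.asList(Arrays.asList(1)));
        return splits;
    }

    public static void main(String[] args) {
        Map<String, List<List<Integer>>> input = sampleInput();
        System.out.println(reduceOnly(input, 1));        // {hadoop=3, rare=2}
        System.out.println(combineThenReduce(input, 1)); // {hadoop=2}
    }
}
```

In this sketch, "rare" has a total of 2 but disappears, and "hadoop" is reported as 2 instead of 3, once the filter also runs per split. With a threshold of -1 both variants give the same output, which would match the behaviour described above. If this turns out to be the cause, possible fixes would be to drop the setCombinerClass call or to use a separate combiner class that only sums and does not filter.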

