hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Filtering by value in Reducer
Date Mon, 11 May 2015 17:26:49 GMT
What is the type of the threshold variable? I believe sum is a Java int.

Regards,
Shahab

On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <rutschifengga@gmail.com> wrote:

> Hi,
>
> I am currently playing around with Hadoop and have some problems when
> trying to filter in the Reducer.
>
> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial with
> some additional functionality
> and added the possibility to filter by the specific value of each key -
> e.g. only output the key-value pairs where [[ value > threshold ]].
>
> Filtering Code in Reducer
> #####################################
>
> for (IntWritable val : values) {
>      sum += val.get();
> }
> if ( sum > threshold ) {
>      result.set(sum);
>      context.write(key, result);
> }
>
> #####################################
>
> For a threshold smaller than any value, the above code works as expected and
> the output contains all key-value pairs.
> If I increase the threshold to 1, some pairs are missing from the output
> although their values are larger than the threshold.
>
> I tried to work out the error myself, but I could not get it to work as
> intended. I use the exact Tutorial setup with Oracle JDK 8
> on a CentOS 7 machine.
>
> As far as I understand, the respective Iterable<...> in the Reducer
> already contains all the observed values for a specific key.
> Why is it possible that I am missing some of these key-value pairs then?
> It only fails in very few cases. The input file is pretty large (250 MB),
> so I also tried to increase the memory for the map and reduce steps,
> but it did not help (I tried a lot of different things without success).
>
> Maybe someone already experienced similar problems / is more experienced
> than I am.
>
>
> Thank you,
>
> Peter
>
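
One possible explanation, offered only as a guess because the driver code is not shown in the message: the WordCount v1.0 tutorial driver registers the reducer class as a combiner as well (job.setCombinerClass(IntSumReducer.class)). If that line is still present, the threshold test would also run on each map task's partial sums, and a key whose partial sums are all at or below the threshold would be dropped before it reaches the reducer. The sketch below simulates that effect in plain Java without any Hadoop dependency. The class and method names (ThresholdFilterSketch, sumAndFilter, reducerOnly, withFilteringCombiner) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Plain-Java simulation of the filtering logic from the question.
// Assumption: the same class is used as combiner and reducer, as in the
// WordCount v1.0 tutorial driver. All names here are hypothetical.
class ThresholdFilterSketch {

    // Mirrors the reducer body: sum the values, emit only if sum > threshold.
    static OptionalInt sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? OptionalInt.of(sum) : OptionalInt.empty();
    }

    // Filter applied once, over all values of a key (reducer only).
    static OptionalInt reducerOnly(List<List<Integer>> perMapperValues, int threshold) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> part : perMapperValues) {
            all.addAll(part);
        }
        return sumAndFilter(all, threshold);
    }

    // Same logic run first on each map task's values (as a combiner would),
    // then again on whichever partial sums survived the first pass.
    static OptionalInt withFilteringCombiner(List<List<Integer>> perMapperValues, int threshold) {
        List<Integer> survivors = new ArrayList<>();
        for (List<Integer> part : perMapperValues) {
            sumAndFilter(part, threshold).ifPresent(survivors::add);
        }
        // With no survivors a real reducer would never be called for this key;
        // here that case simply yields an empty result.
        return sumAndFilter(survivors, threshold);
    }

    public static void main(String[] args) {
        // One key seen once by each of two map tasks; the true total is 2.
        List<List<Integer>> parts = Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println(reducerOnly(parts, 1));
        System.out.println(withFilteringCombiner(parts, 1));
    }
}
```

With a threshold of 1, the reducer-only path keeps the key with a sum of 2, while the combiner path loses it because each partial sum of 1 fails the test. If this is what is happening in the job, removing the setCombinerClass call, or using a combiner class that only sums and never filters, should bring the missing pairs back.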
