hadoop-hdfs-user mailing list archives

From Peter Ruch <rutschifen...@gmail.com>
Subject Re: Re: Re: Filtering by value in Reducer
Date Tue, 12 May 2015 12:15:54 GMT
Hi,

I already skimmed through the logs but I could not find anything special.

I am just really confused why I am having this problem.

If the Iterable<...> for a specific key contains all of the observed 
values (and it seems to, since otherwise the program would not work 
correctly in the standard case with [[ threshold = -1 ]]), 
it should also work when I write only the key-value pairs that satisfy 
the condition [[ sum > threshold ]] to the output file.

Did I miss something? Maybe I have to handle these cases in a specific 
way, but I did not find anything about that online.
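
The reasoning above can be sketched in plain Java, without any Hadoop classes. All class and method names below are hypothetical. The sketch also tries one possible explanation that nothing in this thread confirms: it assumes the job still registers the reducer class as a combiner, as the tutorial's WordCount v1.0 does with job.setCombinerClass(IntSumReducer.class). Under that assumption the filter would also be applied to partial sums on the map side, so a key whose partial sums are each at or below the threshold could disappear even though its total is above it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; all names are hypothetical and no Hadoop classes are used.
public class ThresholdFilterSketch {

    // Mirrors the reducer logic quoted in this thread: sum the values and
    // emit the sum only if sum > threshold (null means "nothing emitted").
    public static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    // Applies sumAndFilter to every key of an already grouped input.
    public static Map<String, Integer> reduce(Map<String, List<Integer>> grouped, int threshold) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            Integer result = sumAndFilter(e.getValue(), threshold);
            if (result != null) {
                out.put(e.getKey(), result);
            }
        }
        return out;
    }

    // ASSUMPTION (not confirmed in the thread): the same filtering logic first
    // runs as a combiner on each map-side split, then again as the reducer.
    public static Map<String, Integer> combineThenReduce(
            List<Map<String, List<Integer>>> splits, int threshold) {
        Map<String, List<Integer>> partials = new TreeMap<>();
        for (Map<String, List<Integer>> oneSplit : splits) {
            for (Map.Entry<String, Integer> e : reduce(oneSplit, threshold).entrySet()) {
                partials.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return reduce(partials, threshold);
    }

    // Builds the (word, 1) pairs of one split, grouped by word.
    public static Map<String, List<Integer>> groupWords(String... words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // "hadoop" occurs once in each of two splits, so its total is 2.
        Map<String, List<Integer>> all =
                groupWords("hadoop", "the", "the", "the", "hadoop", "the", "the");
        List<Map<String, List<Integer>>> splits = Arrays.asList(
                groupWords("hadoop", "the", "the", "the"),
                groupWords("hadoop", "the", "the"));

        System.out.println(reduce(all, 1));                 // {hadoop=2, the=5}
        System.out.println(combineThenReduce(splits, 1));   // {the=5}
        System.out.println(combineThenReduce(splits, -1));  // {hadoop=2, the=5}
    }
}
```

With threshold = -1 no partial sum is ever filtered, which would match the observation that the standard case works. If the assumption applies to the actual job, options would be to not set a combiner at all, or to use a separate summing class without the filter as the combiner.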


Thank you for your help,

Peter



On 12.05.2015 12:35, Drake민영근 wrote:
> Hi, Peter
>
> Are the missing records just gone, without anything in the logs? How 
> about your reduce task logs?
>
> Thanks
>
> Drake 민영근 Ph.D
> kt NexR
>
> On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <rutschifengga@gmail.com> 
> wrote:
>
>     Hello,
>
>     sum and threshold are both Integers.
>     For the threshold variable, I first add a new resource to the
>     configuration - conf.addResource( ... );
>
>     later I get the threshold value from the configuration.
>
>     Code
>     #####################################
>
>     private int threshold;
>
>     public void setup( Context context ) {
>
>               Configuration conf = context.getConfiguration();
>               threshold = conf.getInt( "threshold", -1 );
>
>     }
>
>     #####################################
>
>
>     Best,
>     Peter
>
>
>
>     On 11.05.2015 19:26, Shahab Yunus wrote:
>>     What is the type of the threshold variable? sum I believe is a
>>     Java int.
>>
>>     Regards,
>>     Shahab
>>
>>     On Mon, May 11, 2015 at 1:08 PM, Peter Ruch
>>     <rutschifengga@gmail.com> wrote:
>>
>>         Hi,
>>
>>         I am currently playing around with Hadoop and have some
>>         problems when trying to filter in the Reducer.
>>
>>         I extended the WordCount v1.0 example from the 2.7 MapReduce
>>         Tutorial with some additional functionality
>>         and added the possibility to filter by the specific value of
>>         each key - e.g. only output the key-value pairs where [[
>>         value > threshold ]].
>>
>>         Filtering Code in Reducer
>>         #####################################
>>
>>         for (IntWritable val : values) {
>>              sum += val.get();
>>         }
>>         if ( sum > threshold ) {
>>              result.set(sum);
>>              context.write(key, result);
>>         }
>>
>>         #####################################
>>
>>         For a threshold smaller than any value, the above code works
>>         as expected and the output contains all key-value pairs.
>>         If I increase the threshold to 1, some pairs are missing from
>>         the output although their values are larger than the
>>         threshold.
>>
>>         I tried to work out the error myself, but I could not get it
>>         to work as intended. I use the exact Tutorial setup with
>>         Oracle JDK 8
>>         on a CentOS 7 machine.
>>
>>         As far as I understand the respective Iterable<...>  in the
>>         Reducer already contains all the observed values for a
>>         specific key.
>>         Why is it possible that I am missing some of these key-value
>>         pairs then? It only fails in very few cases. The input file
>>         is pretty large - 250 MB -
>>         so I also tried to increase the memory for the mapping and
>>         reduction steps but it did not help ( tried a lot of
>>         different stuff without success )
>>
>>         Maybe someone already experienced similar problems / is more
>>         experienced than I am.
>>
>>
>>         Thank you,
>>
>>         Peter
>>
>>
>
>

