hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Re: Re: Filtering by value in Reducer
Date Tue, 12 May 2015 12:57:51 GMT
Have you tried explicitly printing or logging in your reducer around the
code that compares and then outputs the values? That might give you a
clue about what is happening. Also check the threshold value that the
reducer actually receives: is it the value you set (in the cases where you
set it to something greater than -1)?
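
A rough, Hadoop-free sketch of the kind of logging meant here. It logs the
threshold, the sum, and the result of the comparison on every call. It also
assumes something not confirmed in this thread: that the job still registers
the reducer class as the combiner, as WordCount v1.0 does with
job.setCombinerClass(IntSumReducer.class). If that is the case, the filter
may also be applied to per-map partial sums, and the log lines would show it.
The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hadoop-free stand-in for the filtering reducer discussed in this thread.
// Class and method names are hypothetical, chosen for illustration only.
final class FilterTrace {

    // Same logic as the reduce() body: sum the values, emit only if sum > threshold.
    // Logs the inputs and the outcome of the comparison on every call.
    static OptionalInt sumAndFilter(String stage, String key, List<Integer> values, int threshold) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        boolean emit = sum > threshold;
        System.err.printf("%s key=%s threshold=%d sum=%d emit=%b%n",
                stage, key, threshold, sum, emit);
        return emit ? OptionalInt.of(sum) : OptionalInt.empty();
    }

    // Assumption: the filtering class also runs as a combiner on each map output.
    static OptionalInt withCombiner(String key, List<List<Integer>> perMapValues, int threshold) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> mapOutput : perMapValues) {
            OptionalInt partial = sumAndFilter("combine", key, mapOutput, threshold);
            if (partial.isPresent()) {
                partials.add(partial.getAsInt());
            }
        }
        if (partials.isEmpty()) {
            return OptionalInt.empty(); // the reducer never sees this key
        }
        return sumAndFilter("reduce", key, partials, threshold);
    }

    // Reference case: all values for the key reach a single reduce call.
    static OptionalInt withoutCombiner(String key, List<List<Integer>> perMapValues, int threshold) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> mapOutput : perMapValues) {
            all.addAll(mapOutput);
        }
        return sumAndFilter("reduce", key, all, threshold);
    }

    public static void main(String[] args) {
        // A word seen once in each of two map tasks; its true count is 2.
        List<List<Integer>> perMap = Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println(withCombiner("word", perMap, 1));
        System.out.println(withoutCombiner("word", perMap, 1));
    }
}
```

If the real task logs show reduce() being called with sums smaller than the
final count for a key, removing the setCombinerClass call, or using a
combiner that only sums and does not filter, would be worth trying.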

You can also try using the compareTo method to compare IntWritables, though I
doubt that would make any difference.
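
To illustrate why it probably makes no difference: as far as I know,
IntWritable.compareTo orders by the wrapped int, so it should agree with the
plain > operator. The sketch below uses Integer.compare as a stand-in so that
it runs without Hadoop on the classpath. The class and method names are
hypothetical.

```java
// Plain-Java stand-in: Integer.compare plays the role that
// IntWritable.compareTo would play in the real reducer (an assumption
// made so the sketch runs without Hadoop on the classpath).
final class CompareSketch {

    // The check as written in the reducer from this thread.
    static boolean passesWithOperator(int sum, int threshold) {
        return sum > threshold;
    }

    // The same check expressed through a compare call.
    static boolean passesWithCompare(int sum, int threshold) {
        return Integer.compare(sum, threshold) > 0;
    }

    public static void main(String[] args) {
        int[][] cases = { {0, -1}, {1, 1}, {2, 1}, {Integer.MAX_VALUE, Integer.MIN_VALUE} };
        for (int[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + ": "
                    + passesWithOperator(c[0], c[1]) + " / " + passesWithCompare(c[0], c[1]));
        }
    }
}
```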

Shahab
On May 12, 2015 8:17 AM, "Peter Ruch" <rutschifengga@gmail.com> wrote:

>  Hi,
>
> I already skimmed through the logs but I could not find anything special.
>
> I am just really confused why I am having this problem.
>
> If the Iterable<...> for a specific key contains all of the observed
> values (and it seems to, since otherwise the program would not work
> correctly in the standard case with [[ threshold = -1 ]]), it should also
> work when I only write the key-value pairs that satisfy the condition
> [[ sum > threshold ]] to the output file.
>
> Did I miss something? Maybe I have to handle these cases in a specific
> way, but I did not find anything about that online.
>
>
> Thank you for your help,
>
> Peter
>
>
>
> On 12.05.2015 12:35, Drake민영근 wrote:
>
> Hi, Peter
>
>  Are the missing records simply gone, with nothing in the logs? What about
> your reduce task logs?
>
>  Thanks
>
>   Drake 민영근 Ph.D
> kt NexR
>
> On Tue, May 12, 2015 at 5:18 AM, Peter Ruch <rutschifengga@gmail.com>
> wrote:
>
>>  Hello,
>>
>> sum and threshold are both integers (int).
>> For the threshold variable, I first add a new resource to the
>> configuration - conf.addResource( ... );
>>
>> Later I read the threshold value from the configuration.
>>
>> Code
>> #####################################
>>
>> private int threshold;
>>
>> // Read the threshold once per task; -1 is the default if the key is unset.
>> @Override
>> public void setup( Context context ) {
>>     Configuration conf = context.getConfiguration();
>>     threshold = conf.getInt( "threshold", -1 );
>> }
>>
>> #####################################
>>
>>
>> Best,
>> Peter
>>
>>
>>
>> On 11.05.2015 19:26, Shahab Yunus wrote:
>>
>> What is the type of the threshold variable? sum I believe is a Java int.
>>
>>  Regards,
>> Shahab
>>
>> On Mon, May 11, 2015 at 1:08 PM, Peter Ruch <rutschifengga@gmail.com>
>> wrote:
>>
>>>   Hi,
>>>
>>>  I am currently playing around with Hadoop and have some problems when
>>> trying to filter in the Reducer.
>>>
>>> I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial
>>> with some additional functionality
>>> and added the possibility to filter by the specific value of each key -
>>> e.g. only output the key-value pairs where [[ value > threshold ]].
>>>
>>>  Filtering Code in Reducer
>>>  #####################################
>>>
>>>  int sum = 0;
>>>  for (IntWritable val : values) {
>>>      sum += val.get();
>>>  }
>>>  // Emit the pair only when the total exceeds the threshold.
>>>  if ( sum > threshold ) {
>>>      result.set(sum);
>>>      context.write(key, result);
>>>  }
>>>
>>> #####################################
>>>
>>>  For a threshold smaller than every value, the above code works as
>>> expected and the output contains all key-value pairs.
>>>  If I increase the threshold to 1, some pairs are missing from the output
>>> although their values are larger than the threshold.
>>>
>>>  I tried to work out the error myself, but I could not get it to work
>>> as intended. I use the exact Tutorial setup with Oracle JDK 8
>>>  on a CentOS 7 machine.
>>>
>>>  As far as I understand the respective Iterable<...>  in the Reducer
>>> already contains all the observed values for a specific key.
>>>  Why is it possible that I am missing some of these key-value pairs
>>> then? It only fails in very few cases. The input file is pretty large
>>> (250 MB), so I also tried increasing the memory for the map and reduce
>>> steps, but it did not help (I tried a lot of different things without
>>> success).
>>>
>>>  Maybe someone already experienced similar problems / is more
>>> experienced than I am.
>>>
>>>
>>>  Thank you,
>>>
>>>  Peter
>>>
>>
>>
>>
>
>
