hadoop-mapreduce-user mailing list archives

From Peter Ruch <rutschifen...@gmail.com>
Subject Re: Re: Re: Re: Filtering by value in Reducer
Date Tue, 12 May 2015 14:23:41 GMT
Hi,

No, I did not create any custom logs, I was only looking through the 
"standard" logs.
I have only just started out with Hadoop and did not think of explicitly logging 
that part of the code,
as I thought I was simply missing a small detail that one of you 
might spot.

But I will definitely look into the custom logging and post my findings.

@ Shahab and Drake: Thank you very much for your help.


Best,
Peter
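
The custom logging suggested in the quoted reply could look roughly like the sketch below. It is plain Java so that it runs without a cluster; `ThresholdDebug` and `describe` are hypothetical names, not Hadoop APIs. The idea would be to call it from `reduce()` right before the `if ( sum > threshold )` check and print the result with `System.err.println`, which Hadoop collects in the stderr log of each task attempt.

```java
class ThresholdDebug {

    // Builds one diagnostic line per key: the inputs of the comparison and its outcome.
    static String describe(String key, int valueCount, int sum, int threshold) {
        boolean emitted = sum > threshold; // same condition as in the reducer
        return String.format("key=%s values=%d sum=%d threshold=%d emitted=%b",
                key, valueCount, sum, threshold, emitted);
    }

    public static void main(String[] args) {
        // prints: key=hadoop values=1 sum=1 threshold=1 emitted=false
        System.err.println(describe("hadoop", 1, 1, 1));
        // prints: key=hadoop values=2 sum=2 threshold=1 emitted=true
        System.err.println(describe("hadoop", 2, 2, 1));
    }
}
```

A `sum` that is smaller than the known total count of a key, or the same key showing up in more than one task log, would suggest that values are being summed and filtered somewhere before the final reduce.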


On 12.05.2015 14:57, Shahab Yunus wrote:
>
> Have you tried explicitly printing or logging in your reducer around 
> the code that compares and then outputs the values? Maybe that will 
> give you a clue as to what is happening. Also debug the threshold value that 
> you get in the reducer and check whether it is what you have set 
> (in the case where you set it to greater than -1).
>
> You can also try to use the compare method for comparing IntWritables, 
> though I doubt that would make any difference.
>
> Shahab
>
> On May 12, 2015 8:17 AM, "Peter Ruch" <rutschifengga@gmail.com 
> <mailto:rutschifengga@gmail.com>> wrote:
>
>     Hi,
>
>     I already skimmed through the logs but I could not find anything
>     special.
>
>     I am just really confused why I am having this problem.
>
>     If the Iterable<...> for a specific key contains all of the
>     observed values - and it seems to do so
>     otherwise the program wouldn't work correctly in the standard case
>     with [[ threshold = -1 ]] -
>     it should also work when I only write the key-value pairs to the
>     output file that satisfy the condition [[ sum > threshold ]].
>
>     Did I miss something? Maybe I have to handle these cases in a
>     specific way, but I did not find anything about that online.
>
>
>     Thank you for your help,
>
>     Peter
>
>
>
>     On 12.05.2015 12:35, Drake민영근 wrote:
>>     Hi, Peter
>>
>>     The missing records, are they just gone without any logs? How
>>     about your reduce task logs?
>>
>>     Thanks
>>
>>     Drake 민영근 Ph.D
>>     kt NexR
>>
>>     On Tue, May 12, 2015 at 5:18 AM, Peter Ruch
>>     <rutschifengga@gmail.com <mailto:rutschifengga@gmail.com>> wrote:
>>
>>         Hello,
>>
>>         sum and threshold are both Integers.
>>         For the threshold variable I first add a new resource to the
>>         configuration - conf.addResource( ... );
>>
>>         Later I get the threshold value from the configuration.
>>
>>         Code
>>         #####################################
>>
>>         private int threshold;
>>
>>         public void setup( Context context ) {
>>
>>                   Configuration conf = context.getConfiguration();
>>                   threshold = conf.getInt( "threshold", -1 );
>>
>>         }
>>
>>         #####################################
>>
>>
>>         Best,
>>         Peter
>>
>>
>>
>>         On 11.05.2015 19:26, Shahab Yunus wrote:
>>>         What is the type of the threshold variable? sum I believe is
>>>         a Java int.
>>>
>>>         Regards,
>>>         Shahab
>>>
>>>         On Mon, May 11, 2015 at 1:08 PM, Peter Ruch
>>>         <rutschifengga@gmail.com <mailto:rutschifengga@gmail.com>>
>>>         wrote:
>>>
>>>             Hi,
>>>
>>>             I am currently playing around with Hadoop and have some
>>>             problems when trying to filter in the Reducer.
>>>
>>>             I extended the WordCount v1.0 example from the 2.7
>>>             MapReduce Tutorial with some additional functionality
>>>             and added the possibility to filter by the specific
>>>             value of each key - e.g. only output the key-value pairs
>>>             where [[ value > threshold ]].
>>>
>>>             Filtering Code in Reducer
>>>             #####################################
>>>
>>>             for (IntWritable val : values) {
>>>                  sum += val.get();
>>>             }
>>>             if ( sum > threshold ) {
>>>                  result.set(sum);
>>>                  context.write(key, result);
>>>             }
>>>
>>>             #####################################
>>>
>>>             For a threshold smaller than any value the above code works as
>>>             expected and the output contains all key-value pairs.
>>>             If I increase the threshold to 1 some pairs are missing
>>>             in the output although their respective values are
>>>             larger than the threshold.
>>>
>>>             I tried to work out the error myself, but I could not
>>>             get it to work as intended. I use the exact Tutorial
>>>             setup with Oracle JDK 8
>>>             on a CentOS 7 machine.
>>>
>>>             As far as I understand the respective Iterable<...>  in
>>>             the Reducer already contains all the observed values for
>>>             a specific key.
>>>             Why is it possible that I am missing some of these
>>>             key-value pairs then? It only fails in very few cases.
>>>             The input file is pretty large - 250 MB -
>>>             so I also tried to increase the memory for the map
>>>             and reduce steps, but it did not help (I tried a lot of
>>>             different things without success).
>>>
>>>             Maybe someone already experienced similar problems / is
>>>             more experienced than I am.
>>>
>>>
>>>             Thank you,
>>>
>>>             Peter
>>>
>>>
>>
>>
>
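
One possible explanation, offered as an assumption since the job setup is not shown in this thread: the tutorial's WordCount v1.0 registers the reducer class as the combiner as well (`job.setCombinerClass(IntSumReducer.class)`). If that line is still in place, the filter also runs on the partial sums of each map task, so a word whose count stays at or below the threshold within every single input split is dropped before the reduce phase, even when its total count is above the threshold. The plain-Java sketch below simulates this without Hadoop; `CombinerFilterDemo`, `mapSplit`, `sumAndFilter`, and `run` are hypothetical names, and the two strings stand in for two input splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerFilterDemo {

    // Same logic as the filtering reducer in the thread:
    // sum all values of a key, emit the pair only if sum > threshold.
    static Map<String, Integer> sumAndFilter(Map<String, List<Integer>> grouped, int threshold) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            if (sum > threshold) {
                out.put(e.getKey(), sum);
            }
        }
        return out;
    }

    // Stand-in for one map task: emits (word, 1) for every word of its split, grouped by word.
    static Map<String, List<Integer>> mapSplit(String split) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : split.split("\\s+")) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Simulates map -> combine -> shuffle -> reduce.
    // filterInCombiner = true models reusing the filtering reducer as the combiner;
    // false models a combiner that only sums (no pair is dropped on the map side).
    static Map<String, Integer> run(List<String> splits, int threshold, boolean filterInCombiner) {
        int combinerThreshold = filterInCombiner ? threshold : Integer.MIN_VALUE;
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String split : splits) {
            Map<String, Integer> combined = sumAndFilter(mapSplit(split), combinerThreshold);
            for (Map.Entry<String, Integer> e : combined.entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return sumAndFilter(shuffled, threshold);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("apple banana apple", "banana cherry");
        System.out.println(run(splits, 1, true));   // prints {apple=2}
        System.out.println(run(splits, 1, false));  // prints {apple=2, banana=2}
        System.out.println(run(splits, -1, true));  // prints {apple=2, banana=2, cherry=1}
    }
}
```

With the filter applied in the combiner, `banana` (once per split, twice in total) is lost at threshold 1, while a sum-only combiner keeps it. At threshold -1 nothing is dropped in either variant, which would match the behaviour described above. If this is indeed the cause, removing the `setCombinerClass` call or registering a separate sum-only combiner class should restore the missing pairs.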

