hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: Re: Re: Re: Re: map output not euqal to reduce input
Date Tue, 15 Dec 2009 06:59:14 GMT

>> how do you define 'consumed by reducer'
Trivially, as long as your values iterator runs to the end, you should be fine.
Sorry, I haven't worked with decision support per se; someone else can probably shed some
light on its quirks :)


On 12/11/09 7:38 PM, "Gang Luo" <lgpublic@yahoo.com.cn> wrote:

Thanks, Amogh.
I am not sure whether all the records the mapper generates are consumed by the reducer. But how do
you define 'consumed by reducer'? I can set a counter to see how many lines go to my map function,
but that count is likely the same as the reduce input count, which is less than the map output count.

I didn't use the SkipBadRecords class. I think the feature is disabled by default, so it should
have nothing to do with this.

I ran my test on TPC-DS tables. If I run my job on some 'toy tables' I made, the statistics
are correct.


----- Original Message ----
From: Amogh Vasekar <amogh@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2009/12/11 (Fri) 2:55:12 AM
Subject: Re: Re: Re: Re:  map output not euqal to reduce input

The counters are updated as the records are *consumed*, for both mapper and reducer. Can you
confirm that all the values returned by your iterators are consumed on the reduce side? Also, do
you have the skip-bad-records feature switched on?
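
A minimal sketch of what "counted as consumed" means here. This is a toy model in plain Java, not Hadoop's implementation: `CountingIterator`, `drainAll`, and `stopAfterFirst` are hypothetical names invented for illustration. The counter is bumped inside `next()`, so values that the caller never pulls from the iterator are never counted.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConsumedCounterSketch {

    /** Wraps an iterator and counts only the values actually pulled from it. */
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private long consumed = 0;

        CountingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            T value = delegate.next();
            consumed++; // counted at consumption time, not when the value was produced
            return value;
        }

        long consumed() {
            return consumed;
        }
    }

    /** Reads every value, so the counter ends up equal to the number of values. */
    static long drainAll(List<String> values) {
        CountingIterator<String> it = new CountingIterator<>(values.iterator());
        while (it.hasNext()) {
            it.next();
        }
        return it.consumed();
    }

    /** Stops after the first value, so the remaining values are never counted. */
    static long stopAfterFirst(List<String> values) {
        CountingIterator<String> it = new CountingIterator<>(values.iterator());
        if (it.hasNext()) {
            it.next();
        }
        return it.consumed();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c", "d");
        System.out.println("drained all:     " + drainAll(values));
        System.out.println("stopped after 1: " + stopAfterFirst(values));
    }
}
```

With four values, `drainAll` reports 4 and `stopAfterFirst` reports 1. A reduce function that returns early, for example after seeing the first matching value for a key, would show the same kind of gap between map output and reduce input under this model.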


On 12/11/09 4:32 AM, "Gang Luo" <lgpublic@yahoo.com.cn> wrote:

In the mapper of this job, I extract what I am interested in from each
line and then output all of it. So the number of map input records is
equal to the number of map output records. Actually, I am doing a semi-join in this
job. There were no failures during execution.


----- Original Message ----
From: Todd Lipcon <todd@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2009/12/10 (Thu) 4:43:52
Subject: Re: Re: Re: map output not euqal to reduce input

On Thu, Dec 10, 2009 at 1:15 PM, Gang Luo <lgpublic@yahoo.com.cn> wrote:
> Hi Todd,
> I didn't change the partitioner; I just use the default one. Will the default partitioner
> cause the loss of records?
> -Gang

Do the maps output data nondeterministically? Did you experience any
task failures in the run of the job?
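
For reference on the partitioner question above: Hadoop's default `HashPartitioner` picks a partition as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces only that arithmetic in plain Java, without any Hadoop classes; `partitionFor` and `histogram` are hypothetical helper names, and the sample keys are made up. It shows that every key lands in exactly one partition in the valid range, so this step by itself does not drop records, provided `hashCode()` is stable for a given key.

```java
import java.util.Arrays;
import java.util.List;

public class DefaultPartitionSketch {

    /** Same arithmetic as the default hash partitioner: non-negative hash modulo reducer count. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many keys land in each partition. */
    static int[] histogram(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative keys only; real keys would come from the map output.
        List<String> keys = Arrays.asList("item_1", "item_2", "store_7", "cust_42", "item_1");
        int[] counts = histogram(keys, 3);

        int total = 0;
        for (int c : counts) {
            total += c;
        }
        // Every key was assigned to some partition, so the totals match.
        System.out.println("keys in: " + keys.size() + ", keys partitioned: " + total);
    }
}
```

Records can still appear to go missing if a key's `hashCode()` differs between runs or JVMs, which is one reason to ask whether the map output is deterministic.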



