Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of sigurd.spieckermann@gmail.com
 designates 209.85.220.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CABVuHijqYX1xU0xLCEJiA+j3dJvVTiDg8GoW-3U0ptbbJRsCGw@mail.gmail.com>
References: 
 <CABVuHiix67S=i8T=uMjcNWhPbjXXdUZkwVghu0GPYjaJgUQFWQ@mail.gmail.com>
	<CABVuHijqYX1xU0xLCEJiA+j3dJvVTiDg8GoW-3U0ptbbJRsCGw@mail.gmail.com>
Date: Tue, 25 Sep 2012 11:57:55 +0200
Message-ID: 
 <CABVuHijz-54VtoSkFehqNOmAXAXMUV26p5XEqzLq-3cVoCbmZA@mail.gmail.com>
Subject: Re: Join-package combiner number of input and output records the same
From: Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=e89a8ff1c03e82598404ca83bde4

--e89a8ff1c03e82598404ca83bde4
Content-Type: text/plain; charset=ISO-8859-1

I think I have narrowed the problem down: each split contains only one
large key-value pair, and a combiner is tied to a single map task. Please
correct me if I'm wrong, but I assume each map task processes one split and
the combiner operates only on the key-value pairs within that split. That
would explain why the combiner has no effect in my case. Is there a way to
combine the mapper outputs of multiple splits before they are sent to the
reducer?
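
To make that assumption concrete, here is a minimal sketch in plain Java.
It uses no Hadoop classes; the class and method names are made up for
illustration, and it only models my assumption that a combiner sees the
output of a single map task. It is not how Hadoop itself is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerScopeSketch {

    // Counts occurrences per key, the way a sum-style combiner would
    // collapse repeated keys into one record per key.
    static Map<String, Integer> combine(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Assumption: the combiner runs separately for each map task and never
    // sees records from another task. Returns the total number of records
    // the combiner emits across all tasks.
    static int combinerOutputRecords(List<List<String>> outputPerMapTask) {
        int total = 0;
        for (List<String> oneTask : outputPerMapTask) {
            total += combine(oneTask).size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Four map tasks, each emitting a single record with the same key.
        List<List<String>> onePerTask = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("a"),
                Arrays.asList("a"), Arrays.asList("a"));
        // The same four records emitted by a single map task.
        List<List<String>> allInOneTask = Arrays.asList(
                Arrays.asList("a", "a", "a", "a"));

        System.out.println(combinerOutputRecords(onePerTask));   // prints 4
        System.out.println(combinerOutputRecords(allInOneTask)); // prints 1
    }
}
```

In the first case the combiner's input and output record counts are equal,
which matches what I see in the job statistics.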

2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>

> One more note: the combiner and the reducer are the same class, and in
> the reduce phase the values are aggregated correctly. Why is this not
> happening in the combiner phase?
>
>
> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>
>> Hi guys,
>>
>> I'm experiencing strange behavior when I use the Hadoop join package.
>> After running a job, the job statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears
>> multiple times and the values should be combinable before being passed
>> to the reducer. I'm running my tests in pseudo-distributed mode with one
>> or two map tasks. Using the debugger, I noticed that each key-value pair
>> is processed by the combiner individually, so no list of values is ever
>> passed into the combiner for it to aggregate. Can anyone think of a
>> reason for this undesired behavior?
>>
>> Thanks
>> Sigurd
>>
>
>

--e89a8ff1c03e82598404ca83bde4--