From: Ravi Prakash
Date: Mon, 31 Jul 2017 13:21:36 -0700
Subject: Re: Shuffle buffer size in presence of small partitions
To: Robert Schmidtke
Cc: user@hadoop.apache.org

Hi Robert!

I'm sorry, I do not have a Windows box and probably don't understand the shuffle process well enough. Could you please create a JIRA in the MapReduce project if you would like this fixed upstream?
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116&projectKey=MAPREDUCE

Thanks
Ravi

On Mon, Jul 31, 2017 at 6:36 AM, Robert Schmidtke <ro.schmidtke@gmail.com> wrote:

Hi all,

I just ran into an issue, which likely resulted from my not very intelligent configuration, but nonetheless I'd like to share it with the community. This is all on Hadoop 2.7.3.

In my setup, each reducer fetched roughly 65K from each mapper's spill file. I disabled transferTo during the shuffle because I wanted to look at the file system statistics, which miss mmap calls (and transferTo sometimes falls back to mmap). I left the shuffle buffer size at its 128K default (not knowing about the parameter at the time). As a result I observed roughly 100% more data being read during the shuffle, since a full 128K was read for each 65K actually needed.
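
For concreteness, these are the two settings involved, as they would appear in the NodeManagers' mapred-site.xml (the property names are from mapred-default.xml; they are read server-side by the ShuffleHandler, not per job, and the buffer value below is just the 2.7.3 default I had left in place):

<!-- Disable NIO transferTo in the shuffle so reads go through an
     explicit copy buffer (and show up in file system statistics). -->
<property>
  <name>mapreduce.shuffle.transferTo.allowed</name>
  <value>false</value>
</property>
<!-- Size of that copy buffer; only used when transferTo is disabled.
     131072 bytes (128K) is the 2.7.3 default, which is what caused the
     ~100% read overhead for my ~65K partitions. -->
<property>
  <name>mapreduce.shuffle.transfer.buffer.size</name>
  <value>131072</value>
</property>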

I added a quick fix to Hadoop which chooses the minimum of the partition size and the shuffle buffer size: https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer
Benchmarking this version against transferTo.allowed=true yields the same runtime and roughly 10% more reads in YARN during the shuffle phase (compared to the previous 100%).
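
In case the link is unwieldy, here is the gist of the change as a self-contained sketch (not the literal patch; the real change sits in the ShuffleHandler's buffered-copy path, and the names below are illustrative only):

import java.nio.ByteBuffer;

public class AdaptiveShuffleBuffer {

  // Default of mapreduce.shuffle.transfer.buffer.size in 2.7.3: 128K.
  static final int DEFAULT_SHUFFLE_BUFFER_SIZE = 128 * 1024;

  // Clamp the shuffle copy buffer to the partition size, so a small
  // partition no longer triggers a full buffer-sized read.
  static ByteBuffer allocateCopyBuffer(long partitionSize, int shuffleBufferSize) {
    int size = (int) Math.min((long) shuffleBufferSize, partitionSize);
    return ByteBuffer.allocate(size);
  }

  public static void main(String[] args) {
    // A ~65K partition with the 128K default now gets a 65K buffer.
    System.out.println(
        allocateCopyBuffer(65 * 1024, DEFAULT_SHUFFLE_BUFFER_SIZE).capacity());
  }
}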
Maybe this is something that should be added to Hadoop? Or do users have to be more clever about their job configurations? I'd be happy to open a PR if this is deemed useful.

Anyway, thanks for the attention!

Cheers
Robert

--
My GPG Key ID: 336E2680