From: Mohammad Tariq
Date: Wed, 1 May 2013 00:26:13 +0530
Subject: Re: partition as block?
To: user@hadoop.apache.org

Hello Jay,

What are you going to do in your custom InputFormat and partitioner? Is your InputFormat going to create larger splits that overlap with larger blocks? If that is the case, IMHO, you are going to reduce the number of mappers and thus reduce the parallelism. Also, a much larger block size will add extra overhead when it comes to disk I/O.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Wed, May 1, 2013 at 12:16 AM, Jay Vyas wrote:
> Hi guys:
>
> I'm wondering - if I'm running MapReduce jobs on a cluster with large block
> sizes - can I increase performance with either:
>
> 1) A custom FileInputFormat
>
> 2) A custom partitioner
>
> 3) -DnumReducers
>
> Clearly, (3) will be an issue due to the fact that it might overload tasks
> and network traffic... but maybe (1) or (2) will be a precise way to "use"
> partitions as a "poor man's" block.
>
> Just a thought - not sure if anyone has tried (1) or (2) before in order
> to simulate blocks and increase locality by utilizing the partition API.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
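
For reference, a minimal sketch of option (1), assuming the new mapreduce API
(org.apache.hadoop.mapreduce): a TextInputFormat subclass that asks for splits
spanning several HDFS blocks by overriding computeSplitSize(). The class name
and the BLOCKS_PER_SPLIT value below are illustrative assumptions, not from
this thread.

    // Hypothetical sketch: an InputFormat whose splits cover several HDFS blocks.
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiBlockTextInputFormat extends TextInputFormat {

        // Stretch each split across this many consecutive blocks (illustrative value).
        private static final int BLOCKS_PER_SPLIT = 4;

        @Override
        protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
            // FileInputFormat's default is max(minSize, min(maxSize, blockSize)),
            // i.e. roughly one split per block; here a split covers several blocks.
            return blockSize * BLOCKS_PER_SPLIT;
        }
    }

Wiring it into a job would look like
job.setInputFormatClass(MultiBlockTextInputFormat.class); a similar effect can
be had without a subclass by calling
FileInputFormat.setMinInputSplitSize(job, desiredBytes). Either way, fewer and
larger splits mean fewer map tasks, which is the parallelism trade-off Tariq
describes above.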