Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Received-SPF: pass (athena.apache.org: domain of edlinuxguru@gmail.com
 designates 74.125.82.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAD7rktfLRyhXqoQV9mJgvREu1hNtx=-0W6RPxFRSQXPwdbtVZw@mail.gmail.com>
References: 
 <CAD7rkteo+fQSoabbgk8zjt6B3yJvYvVeYQD-EYe=v_OXrUGV8w@mail.gmail.com>
	<CAD7rkte-iTYey9WXE1wepe_fuW9tCzwvQ9hUvgAezByomc9rQg@mail.gmail.com>
	<CACQ46vEx5uMPpYq_KmyH7DT=6Z=qZut0MK8m3rY--5N5UO9DzA@mail.gmail.com>
	<CAENxBwwZf=JYWei2oHxaA1vbYEdY1_gFcD0JPfZ-vsEMe5cKvg@mail.gmail.com>
	<CAD7rktcyqJF-3pVbvaRK-uyVF+6+hmku8YR1K32hmx2y1JVDMg@mail.gmail.com>
	<CAENxBwyZkcTbKqtzTZesmhEa3_d1X_LcrD+ZaC65dbq0U4Z+TA@mail.gmail.com>
	<CAD7rktfLRyhXqoQV9mJgvREu1hNtx=-0W6RPxFRSQXPwdbtVZw@mail.gmail.com>
Date: Thu, 10 Jul 2014 22:28:05 -0400
Message-ID: 
 <CAENxBwzWEUZtqMbF_n44nYyuwBr2npinWxKXK06QseKw0osixA@mail.gmail.com>
Subject: Re: Hive UDF performance issue
From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Content-Type: multipart/alternative; boundary=089e013d1d9eff058f04fde1b0d1

--089e013d1d9eff058f04fde1b0d1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The "small" table can be any size. You want the small table to be
/path/to/table/b here, because the mappers are split across table A: the
larger A is, the more parallelism you get. There is a ticket on Hive theta
join that you might want to look at.
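
To make the parallelism point concrete, here is a minimal single-JVM sketch.
It is not Hadoop code: the class name CrossJoinSketch, the crossJoin helper
and the sample rows are all made up for illustration. The parallel stream
over the outer table stands in for mappers that each get a split of table A,
and every outer row re-reads the whole inner table (table B).

```java
import java.util.List;
import java.util.stream.Collectors;

public class CrossJoinSketch {

    // Pair every row of the outer table with every row of the inner table.
    // Outer rows are processed in parallel (a stand-in for mappers over
    // splits of table A); the inner table is scanned once per outer row.
    static List<String> crossJoin(List<String> outer, List<String> inner) {
        return outer.parallelStream()
                .flatMap(a -> inner.stream().map(b -> a + "\t" + b))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data: the larger table is the outer one.
        List<String> tableA = List.of("a1", "a2", "a3", "a4");
        List<String> tableB = List.of("b1", "b2");

        List<String> pairs = crossJoin(tableA, tableB);
        System.out.println(pairs.size() + " pairs");
        pairs.forEach(System.out::println);
    }
}
```

The work that can be spread out is bounded by the number of outer rows, which
is why the bigger table should be the one the mappers run over.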


On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S <malligarjunan@gmail.com>
wrote:

> Hello Edward,
>
> Thank you very much for the update.
> What size do you mean by a small table? In our case the small table will have
> a minimum of 1 million records.
> Can we use this UDTF? How much time improvement can we expect?
>
> Appreciate your help!
> Thanks and Regards
> SankarS
>
>
> On Thu, Jul 10, 2014 at 11:26 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
>> There is no magic. Hopefully one table is smaller than the other. You
>> could make a UDTF to do something like what this MR job is doing.
>>
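>> Make a mapper that runs over table A.
>> FileInputFormat.addInputPath(job, new Path("/path/to/table/a"));
>>
>> Then inside the mapper
>>
>> private Configuration conf;
>>
>> // Keep the job configuration so map() can open table B.
>> protected void setup(Context context) {
>>   this.conf = context.getConfiguration();
>> }
>>
>> // For each row of table A, stream table B and emit every pairing.
>> // Assumes /path/to/table/b is a single readable text file.
>> public void map(Text key, Text value, Context context)
>>     throws IOException, InterruptedException {
>>   FileSystem fs = FileSystem.get(conf);
>>   try (BufferedReader in = new BufferedReader(
>>       new InputStreamReader(fs.open(new Path("/path/to/table/b"))))) {
>>     String line;
>>     while ((line = in.readLine()) != null) {
>>       context.write(key, new Text(value + line));
>>     }
>>   }
>> }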
>>
>>
>>
>> On Thu, Jul 10, 2014 at 12:56 PM, Malligarjunan S <
>> malligarjunan@gmail.com> wrote:
>>
>>> Hello Edward,
>>>
>>> Thank you very much for helping me.
>>> I am new to Hive. Could you please provide the sample MapReduce job?
>>>
>>> Regards,
>>> Sankar S
>>>
>>>
>>>
>>>
>>> On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo <edlinuxguru@gmail.com>
>>> wrote:
>>>
>>>> Hive cross product stinks. I have a MapReduce job that will do it.
>>>>
>>>>
>>>> On Wednesday, July 9, 2014, Navis=EB=A5=98=EC=8A=B9=EC=9A=B0 <navis.ry=
u@nexr.com> wrote:
>>>>
>>>>> Yes, 2M x 1M makes 2T pairings in a single reducer.
>>>>>
>>>>> Thanks,
>>>>> Navis
>>>>>
>>>>>
>>>>> 2014-07-10 1:50 GMT+09:00 Malligarjunan S <malligarjunan@gmail.com>:
>>>>>
>>>>>> Hello All,
>>>>>> Is it expected behavior for Hive to take so much time?
>>>>>>
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Sankar S
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S <
>>>>>> malligarjunan@gmail.com> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> Can anyone help me answer my question posted on Stack Overflow?
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
>>>>>>> It is pretty urgent. Please help me.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Sankar S.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>
>>>
>>>
>>
>

--089e013d1d9eff058f04fde1b0d1--