Date: Thu, 10 Jul 2014 22:28:05 -0400
Subject: Re: Hive UDF performance issue
From: Edward Capriolo
To: "user@hive.apache.org"
Content-Type: multipart/alternative; boundary=089e013d1d9eff058f04fde1b0d1

--089e013d1d9eff058f04fde1b0d1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

The "small" table can be any size. You want the small table to be
/path/to/table/b here, with the mappers running over the larger table,
because that will result in more parallelism. There is a ticket on Hive
theta join that you might want to look at.


On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S wrote:

> Hello Edward,
>
> Thank you very much for the update.
> What size do you mean by a small table? In our case the small table will
> have a minimum of 1 million records.
> Can we use this UDTF? How much time improvement will there be?
>
> Appreciate your help!
> Thanks and Regards
> SankarS
>
>
> On Thu, Jul 10, 2014 at 11:26 PM, Edward Capriolo wrote:
>
>> There is no magic. Hopefully one table is smaller than the other. You
>> could make a UDTF to do something like what this MR job is doing.
>>
>> Make a mapper that runs over table A:
>> FileInputFormat.setInputPaths(job, new Path("/path/to/table/a"));
>>
>> Then inside the mapper:
>>
>> // Needs java.io.*, org.apache.hadoop.fs.*, org.apache.hadoop.io.Text,
>> // and org.apache.hadoop.mapred.* (old MapReduce API).
>> private JobConf conf;
>>
>> // Keep the job configuration so map() can open table B.
>> public void configure(JobConf conf) {
>>   this.conf = conf;
>> }
>>
>> // Pair each row of table A with every line of table B.
>> public void map(Text key, Text value, OutputCollector<Text, Text> out,
>>     Reporter reporter) throws IOException {
>>   FileSystem fs = FileSystem.get(conf);
>>   try (BufferedReader b = new BufferedReader(new InputStreamReader(
>>       fs.open(new Path("/path/to/table/b"))))) {
>>     String line;
>>     while ((line = b.readLine()) != null) {
>>       out.collect(key, new Text(value + line));
>>     }
>>   }
>> }
>>
>>
>>
>> On Thu, Jul 10, 2014 at 12:56 PM, Malligarjunan S
>> <malligarjunan@gmail.com> wrote:
>>
>>> Hello Edward,
>>>
>>> Thank you very much for helping me.
>>> I am new to Hive. Could you please provide the sample MapReduce job?
>>> >>> Regards, >>> Sankar S >>> >>> >>> >>> >>> On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo >>> wrote: >>> >>>> Hive cross product stinks . I have a map reduce job that will do it >>>> >>>> >>>> On Wednesday, July 9, 2014, Navis=EB=A5=98=EC=8A=B9=EC=9A=B0 wrote: >>>> >>>>> Yes, 2M x 1M makes 2T pairing in single reducer. >>>>> >>>>> Thanks, >>>>> Navis >>>>> >>>>> >>>>> 2014-07-10 1:50 GMT+09:00 Malligarjunan S : >>>>> >>>>>> Hello All, >>>>>> Is that the expected behavior from hive to take so much of time? >>>>>> >>>>>> >>>>>> Thanks and Regards, >>>>>> Sankar S >>>>>> >>>>>> >>>>>> On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S < >>>>>> malligarjunan@gmail.com> wrote: >>>>>> >>>>>>> Hello All, >>>>>>> >>>>>>> Can any one help me to answer to my question posted on Stackoverflo= w? >>>>>>> >>>>>>> http://stackoverflow.com/questions/24416373/hive-udf-performance-to= o-slow >>>>>>> It is pretty urgent. Please help me. >>>>>>> >>>>>>> Thanks and Regards, >>>>>>> Sankar S. >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> Sorry this was sent from mobile. Will do less grammar and spell check >>>> than usual. >>>> >>> >>> >> > --089e013d1d9eff058f04fde1b0d1 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--089e013d1d9eff058f04fde1b0d1--