Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1478D9B1E for ; Tue, 1 May 2012 13:39:17 +0000 (UTC) Received: (qmail 36022 invoked by uid 500); 1 May 2012 13:39:16 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 35977 invoked by uid 500); 1 May 2012 13:39:15 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 35969 invoked by uid 99); 1 May 2012 13:39:15 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 May 2012 13:39:15 +0000 X-ASF-Spam-Status: No, hits=4.7 required=5.0 tests=FREEMAIL_FORGED_REPLYTO,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [98.139.91.230] (HELO nm26-vm0.bullet.mail.sp2.yahoo.com) (98.139.91.230) by apache.org (qpsmtpd/0.29) with SMTP; Tue, 01 May 2012 13:39:09 +0000 Received: from [98.139.91.67] by nm26.bullet.mail.sp2.yahoo.com with NNFMP; 01 May 2012 13:38:43 -0000 Received: from [98.139.91.57] by tm7.bullet.mail.sp2.yahoo.com with NNFMP; 01 May 2012 13:38:43 -0000 Received: from [127.0.0.1] by omp1057.mail.sp2.yahoo.com with NNFMP; 01 May 2012 13:38:43 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 499709.35430.bm@omp1057.mail.sp2.yahoo.com Received: (qmail 62092 invoked by uid 60001); 1 May 2012 13:38:43 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1335879522; bh=2/OLWZXUVidKk4e6DIkUmBTTv54cssJaMjfMiAnowj4=; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=EtEMzJ4HNWSySknPMQjH3pnIswr7ANMLJgatt85n4crZp2eYXKWc6t3U4Sua5ldPzeyX3Ld3eMzFq4jBbRvBbY14pp+XfgJAU7Y1fmnBhLz7WpI6AFBTVY2AK5PK3EApRLI5RuZqhusRgjMlElhSeoPjGsQx2rkgIESN4YXOnIo= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=Y019G+zLu9LX+AUJNbVbxeDbxVFgmLCxkmpiIPmD7lCwk7RKVRPrQLq+BISlQ9Idc+oTJg8sAGZcOlc8QMo0bUPzfgtzYr1O/8JVBaT2QKGvrEk1CSvS2ETbbRV0KqzSPeXnl83a91+/DZ1ZwrR1wV/BfNAMsO9DCd+uc7Ld4nc=; X-YMail-OSG: 2bGbhlcVM1lHTRxyoMT6Amws8rf9Yo.D2kh6GHMh69JvOLH 1JAU8EyLbWMfnJkJ9CExyg6Co5SZkK9QRBD5jTnfiD3J7nGM.PueVSjB2XHk ApkwaYzIwf0wPI0af9.pz8.gOetvdMi1fGw0ZLgmb4I6OM1TA9irS7Yats9b RW71OZRzGz3MDw1vlVQlbAo0YMsstSh9dXEMKR.unsc07Hvk4jImdKZcnGIa 3EkNVpGyIMBlfBeMoGspI2bmAWd7hqGGbzDqjGux_9Xk9j2HxJ1y32M9LhNh ZU0mK0D1fkLx2OYbdiKIVOfvOGoxJkpRvXt_85g2a.b0cYKpZYPrXX.mJIgT pd_yrDQ3XtoMzmNvzdvh_s.tXLJ6bVOLRbcvXAYrLjn64a4Fg1qUtzHTZSAK 1.wnO996HzQESJc7dOqEqedHNhEW3nUu68yY.FONFB2__0wKmZZDm1_en3wp y6ApVS036hN2IKosv5TnQbBiRXhSiIXmsuSrlA2Dw61hT6_0SdOHK8OU8PH7 XPEgsa32sV3WvIBBf1eLRlxn7Ke3FykkoBDkx_xNuMgjK1bcmfDUYiknuRqN FzSw.E6SnxVzoCUndSmy4xQlpeesIAdNXNzC0iu_n3RH2hniy02MJp2syO4R AZMw82fcIZzD4.PwnrLaIrY4yP7.BDg-- Received: from [91.232.36.10] by web31813.mail.mud.yahoo.com via HTTP; Tue, 01 May 2012 06:38:42 PDT X-Mailer: YahooMailWebService/0.8.118.349524 References: <1334829423.17716.YahooMailNeo@web31807.mail.mud.yahoo.com> Message-ID: <1335879522.55573.YahooMailNeo@web31813.mail.mud.yahoo.com> Date: Tue, 1 May 2012 06:38:42 -0700 (PDT) From: Ramkumar Reply-To: Ramkumar Subject: Re: nested UDFs on Partition column To: "user@hive.apache.org" , Philip Tromans MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="767760015-706654262-1335879522=:55573" X-Virus-Checked: Checked by ClamAV on apache.org --767760015-706654262-1335879522=:55573 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Thanks. Yes, thats what the UDF writers seemed to have intended. The way th= is has been done is by using the 'deterministic' attribute in=A0 UDFType an= notation.=0ASee =0Ahttp://javasourcecode.org/html/open-source/hive/hive-0.7= .1/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java.html=0A=0A=0A=0A____= ____________________________=0A From: Philip Tromans =0ATo: user@hive.apache.org =0ASent: Thursday, April 19, 2012 4:38 PM= =0ASubject: Re: nested UDFs on Partition column=0A =0AIn fact, it's just no= t a reasonable thing to do partition pruning on.=0AImagine a situation wher= e you had:=0A=0AWHERE partition_column =3D f(unix_timestamp()) AND ordinary= _column =3D=0Af(unix_timestamp).=0A=0AThe right hand side of the predicate = has to be evaluated at map-time,=0Awhereas you're assuming that left hand s= ide should be evaluated at=0Acompile time, which means you have two differe= nt values of=0Aunix_timestamp() floating around, which can only end badly.= =0A=0ACheers,=0A=0APhil.=0A=0AOn 19 April 2012 16:35, Philip Tromans wrote:=0A> I don't know what the state of Hive's par= tition pruning is, but I=0A> would imagine that the problem is that the two= example you're giving=0A> are fundamentally different.=0A>=0A> 1) WHERE lo= cal_date =3D =3Ddate_add('2011-12-07',3) ,=0A>=0A> the udf is a=0A function= of some constants, so the constant gets=0A> evaluated at compile time.=0A>= =0A> 2) WHERE local_date =3D date_sub(to_date(from_unixtime(unix_timestamp(= ))),3),=0A>=0A> unix_timestamp() is not explicitly a constant, and so the e= xpression=0A> won't be simplified.=0A>=0A> I would imagine that the constan= t simplification code probably doesn't=0A> know the difference between a pa= rtition column and a real column, and=0A> so treats everything as a real co= lumn. If local_date wasn't a=0A> partition column, then there's no reasonab= le way of simplifying that=0A> predicate at compile time.=0A>=0A> Cheers,= =0A>=0A> Phil.=0A>=0A> On 19 April 2012 11:50, Nitin Pawar wrote:=0A>> as per my understanding,=0A>>=0A>> In this case hive = needs to look for all the=0A partitions because it does not=0A>> have the v= alue before hand on the partition check and note the udfs are=0A>> executed= on the mapred and not on hive client side.=0A>>=0A>> I would suggest you w= rite a hive query in a file and replace the partition=0A>> value with a var= iable=0A>> something like=0A>>=0A>> for partitionValue in values=0A>>=0A>> = =A0 =A0 =A0 =A0 =A0 hive=A0$HIVEPARAMS -hiveconf =A0partition=3D$partition = -e hivequery.hql=0A>>=0A>> and then in hivequery.sql you can refer the vari= able with=0A>>=0A>> where column_name =3D '${hiveconf:partition}'=0A>>=0A>>= I may be wrong in interpreting the execution pattern of hivequery but this= =0A>> approach solved my problem=0A>> Thanks,=0A>> nitin=0A>>=0A>>=0A>> On = Thu, Apr 19, 2012 at 3:27 PM, Ramkumar =0A>> = wrote:=0A>>>=0A>>> Hi,=0A>>>=0A>>> I have a table partitioned by local_date= .=A0 When I write a query with=0A>>>=0A>>> WHERE local_date =3D =3Ddate_add= ('2011-12-07',3) ,=0A>>>=0A>>> hive executes the UDF ahead and looks only i= nto the specific partitions.=0A>>> But when the udf becomes more complex li= ke=0A>>>=0A>>> WHERE local_date =3D date_sub(to_date(from_unixtime(unix_tim= estamp())),3),=0A>>>=0A>>> hive looks through all the partitions even thoug= h the above function=A0 can=0A>>> very well be computed ahead of time and o= ptimize the query.=A0 Is this=0A>>> behaviour intentional ? And is there a = workaround other than hardcoding the=0A>>> date or using a=0A param?=0A>>>= =0A>>> Thanks,=0A>>> Ramkumar=0A>>=0A>>=0A>>=0A>>=0A>> --=0A>> Nitin Pawar= =0A>> --767760015-706654262-1335879522=:55573 Content-Type: text/html; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable
Thanks. Yes, thats what the UDF writers seemed to have intended. The= way this has been done is by using the 'deterministic' attribute in  = UDFType annotation.
See
http://javasourcecode.org/html/open-source/h= ive/hive-0.7.1/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java.html
=
=
From: Phi= lip Tromans <philip.j.tromans@gmail.com>
To: user@hive.apache.org
Sent: Thursday, April 19, 2012 4:38 PM
Subject: Re: nested UDFs on Part= ition column

=0AIn fact, it's just not a reasonable = thing to do partition pruning on.
Imagine a situation where you had:
=
WHERE partition_column =3D f(unix_timestamp()) AND ordinary_column =3D<= br>f(unix_timestamp).

The right hand side of the predicate has to be= evaluated at map-time,
whereas you're assuming that left hand side shou= ld be evaluated at
compile time, which means you have two different valu= es of
unix_timestamp() floating around, which can only end badly.
Cheers,

Phil.

On 19 April 2012 16:35, Philip Tromans <philip.j.tromans@gmail.com= > wrote:
> I don't know what the state of Hive's partition pru= ning is, but I
> would imagine that the problem is that the two examp= le you're giving
> are fundamentally different.
>
> 1) WH= ERE local_date =3D =3Ddate_add('2011-12-07',3) ,
>
> the udf is a=0A function of some constants, so the const= ant gets
> evaluated at compile time.
>
> 2) WHERE local_= date =3D date_sub(to_date(from_unixtime(unix_timestamp())),3),
>
&= gt; unix_timestamp() is not explicitly a constant, and so the expression> won't be simplified.
>
> I would imagine that the constan= t simplification code probably doesn't
> know the difference between = a partition column and a real column, and
> so treats everything as a= real column. If local_date wasn't a
> partition column, then there's= no reasonable way of simplifying that
> predicate at compile time.>
> Cheers,
>
> Phil.
>
> On 19 April 20= 12 11:50, Nitin Pawar <nit= inpawar432@gmail.com> wrote:
>> as per my understanding,>>
>> In this case hive needs to look for all the=0A partitions because it does not
>&g= t; have the value before hand on the partition check and note the udfs are<= br>>> executed on the mapred and not on hive client side.
>>=
>> I would suggest you write a hive query in a file and replace t= he partition
>> value with a variable
>> something like>>
>> for partitionValue in values
>>
>>=           hive $HIVEPARAMS -hiveconf  p= artition=3D$partition -e hivequery.hql
>>
>> and then in = hivequery.sql you can refer the variable with
>>
>> where= column_name =3D '${hiveconf:partition}'
>>
>> I may be w= rong in interpreting the execution pattern of hivequery but this
>>= ; approach solved my problem
>> Thanks,
>> nitin
>&= gt;
>>
>> On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <<= a rel=3D"nofollow" ymailto=3D"mailto:ramkumar.sugavanam@yahoo.com" target=3D"_blank" href=3D"= mailto:ramkumar.sugavanam@yahoo.com">ramkumar.sugavanam@yahoo.com>>> wrote:
>>>
>>> Hi,
>>>
&g= t;>> I have a table partitioned by local_date.  When I write a q= uery with
>>>
>>> WHERE local_date =3D =3Ddate_add(= '2011-12-07',3) ,
>>>
>>> hive executes the UDF ahe= ad and looks only into the specific partitions.
>>> But when th= e udf becomes more complex like
>>>
>>> WHERE local= _date =3D date_sub(to_date(from_unixtime(unix_timestamp())),3),
>>= >
>>> hive looks through all the partitions even though the = above function  can
>>> very well be computed ahead of tim= e and optimize the query.  Is this
>>> behaviour intention= al ? And is there a workaround other than hardcoding the
>>> da= te or using a=0A param?
>>>
>>> Thanks,
>>> = Ramkumar
>>
>>
>>
>>
>> --
= >> Nitin Pawar
>>


--767760015-706654262-1335879522=:55573--