Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 96CADD41B for ; Thu, 20 Sep 2012 14:53:18 +0000 (UTC) Received: (qmail 37263 invoked by uid 500); 20 Sep 2012 14:53:17 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 37220 invoked by uid 500); 20 Sep 2012 14:53:17 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 37210 invoked by uid 99); 20 Sep 2012 14:53:17 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Sep 2012 14:53:17 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jamie.f.olson@gmail.com designates 209.85.223.176 as permitted sender) Received: from [209.85.223.176] (HELO mail-ie0-f176.google.com) (209.85.223.176) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Sep 2012 14:53:12 +0000 Received: by ieak12 with SMTP id k12so1914111iea.35 for ; Thu, 20 Sep 2012 07:52:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=6ilUxlzO9T+qeZFh0wvzuCxrZ2QEs6Qs9amwtnv0Vko=; b=YfGjYmxAoi5xbkmAAmE0sComaDK4ngmhfH4J7sJGxj5bfaGLGrZgBaqoGBs7XLa8xm 5N96al93IucmVsMerZnoQt+Tc8NEvNdrz8Gpwk+8SDeKG3Ab3xfWJujtcaJ3QA17mEer g9bv+5hvw0uLpsqrCbU9YKsjShoA7cE9x2Gj1NvfSDQXwsMI8qCcylX4ZRhFbNsnz6FY 4Y7RBN9g4SYRS33KKp0NxeRQs7Q+rH0cNGG8cpPudpdycgAAvylQPBTKAUGJY0/2cT0P iBUfsIfjnPTWXHxnAwY0VMy9gnj5ym97wOlXbY0sZz5MSMC3JlC5/VVR4Clk1maxzZQR wG4g== MIME-Version: 1.0 Received: by 10.42.62.143 with SMTP id y15mr1597508ich.38.1348152771937; Thu, 20 Sep 2012 07:52:51 -0700 (PDT) Received: by 10.64.168.73 with HTTP; Thu, 20 Sep 2012 07:52:51 -0700 (PDT) In-Reply-To: <1348151270.11841.YahooMailNeo@web121202.mail.ne1.yahoo.com> References: <1348148635.37561.YahooMailNeo@web121202.mail.ne1.yahoo.com> <1348151270.11841.YahooMailNeo@web121202.mail.ne1.yahoo.com> Date: Thu, 20 Sep 2012 10:52:51 -0400 Message-ID: Subject: Re: Hive ignoring buckets when using dynamic where From: Jamie Olson To: user@hive.apache.org, Bejoy KS Content-Type: multipart/alternative; boundary=20cf30223c9113066d04ca234730 X-Virus-Checked: Checked by ClamAV on apache.org --20cf30223c9113066d04ca234730 Content-Type: text/plain; charset=ISO-8859-1 This might be somewhat off-topic, but is the query optimizer smart enough to conditions like 'date2 = date_sub(date1,1)' execute efficiently? I would certainly hope so when date1 is a constant, but what if this is a join condition? If date is a partition or clustered column, will hive do the right thing? Jamie Olson On Thu, Sep 20, 2012 at 10:27 AM, Bejoy KS wrote: > Hi Robin > > AFAIK, one of the possible solutions is is to take the computation of > current date out of hive, may be to a shell script (you can have the > computation current - x days aslo from within a shell script i guess ). > Once you have identified the required date, store it in a variable and > substitute the same in your hive query. > > Some links for your reference > http://hive.apache.org/docs/r0.9.0/language_manual/var_substitution.html > > http://kickstarthadoop.blogspot.in/2011/10/include-values-during-execution-time-in.html > > > Regards, > Bejoy KS > > ------------------------------ > *From:* Robin Verlangen > *To:* user@hive.apache.org; Bejoy KS > *Sent:* Thursday, September 20, 2012 7:30 PM > *Subject:* Re: Hive ignoring buckets when using dynamic where > > Hi Bejoy, > > Thank you for your reply. Is there any way to fix my problem? I want to > have a query that has a dynamic range, from now (and in some cases now - x > days until now). > > Best regards, > > Robin Verlangen > *Software engineer* > * > * > W http://www.robinverlangen.nl > E robin@us2.nl > > Disclaimer: The information contained in this message and attachments is > intended solely for the attention and use of the named addressee and may be > confidential. If you are not the intended recipient, you are reminded that > the information remains the property of the sender. You must not use, > disclose, distribute, copy, print or rely on this e-mail. If you have > received this message in error, please contact the sender immediately and > irrevocably delete this message and any copies. > > > > 2012/9/20 Bejoy KS > > Hi Robin > > The result of 'bdate=to_date(unix_timestamp())' is evaluated during the > runtime of the query. But the data that a query should process is > determined initially before executing the map reduce jobs. That is the > reason the query is running over whole data set. > > When you provide 'bdate='2012-09-01'' the hive parser knows initially > itself what data which all partitions should be taken into account. So this > query runs on only the required partitions and not on whole data. > > To add on , it is not the buckets considered here on where clause but the > partitions. > > Regards, > Bejoy KS > > ------------------------------ > *From:* Robin Verlangen > *To:* user@hive.apache.org > *Sent:* Thursday, September 20, 2012 5:06 PM > *Subject:* Hive ignoring buckets when using dynamic where > > Hi there, > > We're working on some queries that use buckets to improve performance with > like 1000x. However we ran into a problem. When we use a fixed hardcoded > date it works fine: > > SELECT * FROM standard_feed WHERE bdate='2012-09-01' > *Starts a job with 6 mappers, 2 reducers* > > When we use it dynamically: > SELECT * FROM standard_feed WHERE bdate=to_date(unix_timestamp()) > *Starts a job with 1000 mappers, 2 reducers* > * > * > What's the problem here? The result of the to_date of the current > timestamp should be equal to a normal fixed date? Does anyone have a > solution? > > Best regards, > > Robin Verlangen > *Software engineer* > * > * > W http://www.robinverlangen.nl > E robin@us2.nl > > Disclaimer: The information contained in this message and attachments is > intended solely for the attention and use of the named addressee and may be > confidential. If you are not the intended recipient, you are reminded that > the information remains the property of the sender. You must not use, > disclose, distribute, copy, print or rely on this e-mail. If you have > received this message in error, please contact the sender immediately and > irrevocably delete this message and any copies. > > > > > > > --20cf30223c9113066d04ca234730 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
This might be somewhat off-topic, but is the query optimizer smart eno= ugh to conditions like 'date2 =3D date_sub(date1,1)' execute effici= ently? =A0I would certainly hope so when date1 is a constant, but what if t= his is a join condition? =A0If date is a partition or clustered column, wil= l hive do the right thing?

Jamie Olson


On Thu, Sep 20, 2012 at 10:27 AM, Bejoy = KS <bejoy_ks@yahoo.com> wrote:
Hi Robin

AFAIK, one of the possible solutions is is to take the computation of curre= nt date out of hive, may be to a shell script (you can have the computation= current - x days aslo from within a shell script i guess ). Once you have = identified the required date, store it in a variable and substitute the sam= e in your hive query. =A0

Some links for your reference

=A0
<= /div>
Regards,
= Bejoy KS


From: Robi= n Verlangen <robin@us2= .nl>
To: user@hive.apache.or= g; Bejoy KS <bejoy_ks@yahoo.com>
Sent: Thursday, September 2= 0, 2012 7:30 PM
Subject:= Re: Hive ignoring buckets when using dynamic where

Hi Bejoy,

Thank you for your reply. Is there any wa= y to fix my problem? I want to have a query that has a dynamic range, from = now (and in some cases now - x days until now).

Best regards,=A0

Robin Verlangen
Software engineer


Disclaimer: The information con= tained in this message and attachments is intended solely for the attention= and use of the named addressee and may be confidential. If you are not the= intended recipient, you are reminded that the information remains the prop= erty of the sender. You must not use, disclose, distribute, copy, print or = rely on this e-mail. If you have received this message in error, please con= tact the sender immediately and irrevocably delete this message and any cop= ies.



2012/9/20 Bejoy KS <bejoy_ks@yahoo.com&= gt;
Hi Robin

The result of 'bdate=3Dto_date(unix_timestamp())' is=A0evaluated=A0during the runtime of the que= ry. But the data that a query should process is determined=A0initially=A0be= fore executing the map reduce jobs. That is the reason the query is running over whole data set.<= /font>

When you provide '= ;bdate=3D'2012-09-01'<= span style=3D"font-size:13px;background-color:transparent">' the hive p= arser knows initially itself what data which all partitions should be taken= into account. So this query runs on only the required partitions and not o= n whole data.

To add on , it is not the=A0buckets=A0considered here on where clause but = the partitions. =A0
=A0
Regards,
Bejoy KS


Hi there,

We're working on some queries that us= e buckets to improve performance with like 1000x. However we ran into a pro= blem. When we use a fixed hardcoded date it works fine:

SELECT * FROM standard_feed WHERE bdate=3D'2012-09-01'
Starts a job with 6 mappers, 2 reducers

When = we use it dynamically:
SELECT * FROM standard_feed WHERE bdate=3D= to_date(unix_timestamp())
Starts a job with 1000 mappers, 2 reducers

What's the problem here? The result of the to_date of the c= urrent timestamp should be equal to a normal fixed date? Does anyone have a= solution?

Best regards,=A0

Robin Verlan= gen
Software engineer


Disclaimer= : The information contained in this message and attachments is intended sol= ely for the attention and use of the named addressee and may be confidentia= l. If you are not the intended recipient, you are reminded that the informa= tion remains the property of the sender. You must not use, disclose, distri= bute, copy, print or rely on this e-mail. If you have received this message= in error, please contact the sender immediately and irrevocably delete thi= s message and any copies.





--20cf30223c9113066d04ca234730--