hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jamie Olson <jamie.f.ol...@gmail.com>
Subject Re: Hive ignoring buckets when using dynamic where
Date Thu, 20 Sep 2012 14:52:51 GMT
This might be somewhat off-topic, but is the query optimizer smart enough
to conditions like 'date2 = date_sub(date1,1)' execute efficiently?  I
would certainly hope so when date1 is a constant, but what if this is a
join condition?  If date is a partition or clustered column, will hive do
the right thing?

Jamie Olson


On Thu, Sep 20, 2012 at 10:27 AM, Bejoy KS <bejoy_ks@yahoo.com> wrote:

> Hi Robin
>
> AFAIK, one of the possible solutions is is to take the computation of
> current date out of hive, may be to a shell script (you can have the
> computation current - x days aslo from within a shell script i guess ).
> Once you have identified the required date, store it in a variable and
> substitute the same in your hive query.
>
> Some links for your reference
> http://hive.apache.org/docs/r0.9.0/language_manual/var_substitution.html
>
> http://kickstarthadoop.blogspot.in/2011/10/include-values-during-execution-time-in.html
>
>
> Regards,
> Bejoy KS
>
>   ------------------------------
> *From:* Robin Verlangen <robin@us2.nl>
> *To:* user@hive.apache.org; Bejoy KS <bejoy_ks@yahoo.com>
> *Sent:* Thursday, September 20, 2012 7:30 PM
> *Subject:* Re: Hive ignoring buckets when using dynamic where
>
> Hi Bejoy,
>
> Thank you for your reply. Is there any way to fix my problem? I want to
> have a query that has a dynamic range, from now (and in some cases now - x
> days until now).
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E robin@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/9/20 Bejoy KS <bejoy_ks@yahoo.com>
>
> Hi Robin
>
> The result of 'bdate=to_date(unix_timestamp())' is evaluated during the
> runtime of the query. But the data that a query should process is
> determined initially before executing the map reduce jobs. That is the
> reason the query is running over whole data set.
>
> When you provide 'bdate='2012-09-01'' the hive parser knows initially
> itself what data which all partitions should be taken into account. So this
> query runs on only the required partitions and not on whole data.
>
> To add on , it is not the buckets considered here on where clause but the
> partitions.
>
> Regards,
> Bejoy KS
>
>   ------------------------------
> *From:* Robin Verlangen <robin@us2.nl>
> *To:* user@hive.apache.org
> *Sent:* Thursday, September 20, 2012 5:06 PM
> *Subject:* Hive ignoring buckets when using dynamic where
>
> Hi there,
>
> We're working on some queries that use buckets to improve performance with
> like 1000x. However we ran into a problem. When we use a fixed hardcoded
> date it works fine:
>
>  SELECT * FROM standard_feed WHERE bdate='2012-09-01'
> *Starts a job with 6 mappers, 2 reducers*
>
> When we use it dynamically:
> SELECT * FROM standard_feed WHERE bdate=to_date(unix_timestamp())
> *Starts a job with 1000 mappers, 2 reducers*
> *
> *
> What's the problem here? The result of the to_date of the current
> timestamp should be equal to a normal fixed date? Does anyone have a
> solution?
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E robin@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
>
>
>
>

Mime
View raw message