hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vijay <>
Subject Re: Questions on date arithmetic/calculations
Date Thu, 15 Oct 2009 21:24:42 GMT
Thanks Ashish! This is the same approach I'm using as well and it seems to
be working very good

One thing I wasn't sure at first but was later surprised was how Hive is
able to figure out which partitions to work on using WHERE clauses. What I
mean by that is if I do something like WHERE month(ds)=9, it is able to
figure out that it needs to just use the partitions 2009-09-01 to
2009-09-30. How does the query engine know this? Does it evaluate partition
column related expressions locally?


On Tue, Oct 13, 2009 at 2:25 PM, Ashish Thusoo <> wrote:

>  We store the the partitioning as
> in that format the string representation of the date has the same
> lexicographical ordering as the date itself. So if you have that as the
> format of the string in the ds column (hive does not have date functions
> yet), then the expressions of the kind
> ds >= '2009-08-15' and ds <= '2009-09-15'
> will pick up the right partitions.
> For doing counts over the month you can either extract the month from the
> date string using the substring(ds, 5, 2) udf in hive or you can use
> month(ds) and then put
> that in the group by clause of the query.
> Ashish
>  ------------------------------
> *From:* Vijay []
> *Sent:* Monday, October 12, 2009 5:05 PM
> *To:*
> *Subject:* Questions on date arithmetic/calculations
> Hi,
> I have some basic questions on how hive handles dates and date arithmetic.
> I apologize if this has already been addressed. Per most samples on this
> site and elsewhere, I can have an access log table defined with a partition
> scheme that looks like this: ds='09-08-09'. This is obviously pretty good to
> partition the data. However, how can this information be used later in
> queries? For example, if I want to select data for all dates between
> 08/15/09 and 09/15/09, how would I do that? The partition column ds cannot
> be used with >= and similar operators right? Additionally, when is
> partitioned this way, how can I do counts on month, etc? Obviously all of
> these queries need to be expressed in a way hive can still take advantage of
> the partitioning scheme. I hope that makes sense.
> Thanks,
> Vijay

View raw message