Return-Path: Delivered-To: apmail-hadoop-hive-user-archive@minotaur.apache.org Received: (qmail 69820 invoked from network); 15 Oct 2009 21:25:12 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 15 Oct 2009 21:25:12 -0000 Received: (qmail 87930 invoked by uid 500); 15 Oct 2009 21:25:12 -0000 Delivered-To: apmail-hadoop-hive-user-archive@hadoop.apache.org Received: (qmail 87891 invoked by uid 500); 15 Oct 2009 21:25:12 -0000 Mailing-List: contact hive-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hive-user@hadoop.apache.org Delivered-To: mailing list hive-user@hadoop.apache.org Received: (qmail 87882 invoked by uid 99); 15 Oct 2009 21:25:12 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Oct 2009 21:25:12 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of techvd@gmail.com designates 209.85.221.201 as permitted sender) Received: from [209.85.221.201] (HELO mail-qy0-f201.google.com) (209.85.221.201) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Oct 2009 21:25:03 +0000 Received: by qyk39 with SMTP id 39so1078949qyk.31 for ; Thu, 15 Oct 2009 14:24:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=7AAL+uFPXtFPuj2iJrs2Pl+ifcZ2jmZpuWxipBE0Jck=; b=KsK8z1J7BJVkkHvvrqgrM2X52tjkhe9Hby6MENm2jIBrEFlU5d31ESlK/7fEaz2vst UAHfVgPaANDNZd0i5j+S3S/fWoaM6HuMItEP2C0KgOUrPi4nuXhEKLvYM4wpYRqbh0Sp xF80CQ4OXnTAhhyLnau83wstdQXg1nHvwHtN4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=pBpFVSx3mRTUWR+xWzf4EB5cIurE2tR2HU4MlK+tKDrVsFvt++7ov2iY7qJd5d39oz KxhQFO5PmvYs9qLYwO2Sy8nw/pBQYbJsn3SEJwxZ06F9Eb/itOEJsgiLraDzZ16JTCMm mDPyUdogcW9KrmcWsGfincTMqipdXACD1wDuE= MIME-Version: 1.0 Received: by 10.229.111.77 with SMTP id r13mr92401qcp.85.1255641882423; Thu, 15 Oct 2009 14:24:42 -0700 (PDT) In-Reply-To: <68B7689C98024D43B4C2709456F0B5200A21547152@SC-MBXC1.TheFacebook.com> References: <5617ccb50910121704r25f19f4er68bb986f32254520@mail.gmail.com> <68B7689C98024D43B4C2709456F0B5200A21547152@SC-MBXC1.TheFacebook.com> Date: Thu, 15 Oct 2009 14:24:42 -0700 Message-ID: <5617ccb50910151424u7ee29e1fp766302862d223bcf@mail.gmail.com> Subject: Re: Questions on date arithmetic/calculations From: Vijay To: hive-user@hadoop.apache.org Content-Type: multipart/alternative; boundary=0023543336425df3e10475ffe87a X-Virus-Checked: Checked by ClamAV on apache.org --0023543336425df3e10475ffe87a Content-Type: text/plain; charset=ISO-8859-1 Thanks Ashish! This is the same approach I'm using as well and it seems to be working very good One thing I wasn't sure at first but was later surprised was how Hive is able to figure out which partitions to work on using WHERE clauses. What I mean by that is if I do something like WHERE month(ds)=9, it is able to figure out that it needs to just use the partitions 2009-09-01 to 2009-09-30. How does the query engine know this? Does it evaluate partition column related expressions locally? Thanks, Vijay On Tue, Oct 13, 2009 at 2:25 PM, Ashish Thusoo wrote: > We store the the partitioning as > > YYYY-MM-DD > > in that format the string representation of the date has the same > lexicographical ordering as the date itself. So if you have that as the > format of the string in the ds column (hive does not have date functions > yet), then the expressions of the kind > > ds >= '2009-08-15' and ds <= '2009-09-15' > > will pick up the right partitions. > > For doing counts over the month you can either extract the month from the > date string using the substring(ds, 5, 2) udf in hive or you can use > month(ds) and then put > that in the group by clause of the query. > > Ashish > > ------------------------------ > *From:* Vijay [mailto:techvd@gmail.com] > *Sent:* Monday, October 12, 2009 5:05 PM > *To:* hive-user@hadoop.apache.org > *Subject:* Questions on date arithmetic/calculations > > Hi, > > I have some basic questions on how hive handles dates and date arithmetic. > I apologize if this has already been addressed. Per most samples on this > site and elsewhere, I can have an access log table defined with a partition > scheme that looks like this: ds='09-08-09'. This is obviously pretty good to > partition the data. However, how can this information be used later in > queries? For example, if I want to select data for all dates between > 08/15/09 and 09/15/09, how would I do that? The partition column ds cannot > be used with >= and similar operators right? Additionally, when is > partitioned this way, how can I do counts on month, etc? Obviously all of > these queries need to be expressed in a way hive can still take advantage of > the partitioning scheme. I hope that makes sense. > > Thanks, > Vijay > --0023543336425df3e10475ffe87a Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Thanks Ashish! This is the same approach I'm using as well and it seems= to be working very good

One thing I wasn't sure at first but wa= s later surprised was how Hive is able to figure out which partitions to wo= rk on using WHERE clauses. What I mean by that is if I do something like WH= ERE month(ds)=3D9, it is able to figure out that it needs to just use the p= artitions 2009-09-01 to 2009-09-30. How does the query engine know this? Do= es it evaluate partition column related expressions locally?

Thanks,
Vijay

On Tue, Oct 13, 2009= at 2:25 PM, Ashish Thusoo <athusoo@facebook.com> wrote:
We store the the partitioning as
=A0
YYYY-MM-DD
=A0
in that format the string representation of the date has=20 the same lexicographical ordering as the date itself. So if you have that a= s the=20 format of the string in the ds column (hive does not have date functions ye= t),=20 then the expressions of the kind
=A0
ds >=3D '2009-08-15' and ds <=3D=20 '2009-09-15'
=A0
will pick up the right partitions.
=A0
For doing counts over the month you can either extract the=20 month from the date string using the substring(ds, 5, 2)=A0udf in hive or y= ou=20 can use month(ds) and then put
that in the group by clause of the=20 query.
=A0
Ashish


From: Vijay [mailto:techvd@gmail.com]=20
Sent: Monday, October 12, 2009 5:05 PM
To:=20 hive-user@= hadoop.apache.org
Subject: Questions on date=20 arithmetic/calculations

Hi,

I have some basic questions on how hive handles dates= and=20 date arithmetic. I apologize if this has already been addressed. Per most= =20 samples on this site and elsewhere, I can have an access log table defined = with=20 a partition scheme that looks like this: ds=3D'09-08-09'. This is o= bviously pretty=20 good to partition the data. However, how can this information be used later= in=20 queries? For example, if I want to select data for all dates between 08/15/= 09=20 and 09/15/09, how would I do that? The partition column ds cannot be used w= ith=20 >=3D and similar operators right? Additionally, when is partitioned this= way,=20 how can I do counts on month, etc? Obviously all of these queries need to b= e=20 expressed in a way hive can still take advantage of the partitioning scheme= . I=20 hope that makes sense.

Thanks,
Vijay

--0023543336425df3e10475ffe87a--