Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 536DA18668 for ; Fri, 15 May 2015 00:18:54 +0000 (UTC) Received: (qmail 62017 invoked by uid 500); 15 May 2015 00:18:52 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 61946 invoked by uid 500); 15 May 2015 00:18:52 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 61936 invoked by uid 99); 15 May 2015 00:18:52 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 15 May 2015 00:18:52 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 65B20183980 for ; Fri, 15 May 2015 00:18:52 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.879 X-Spam-Level: ** X-Spam-Status: No, score=2.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-eu-west.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id uO6dK4Bijwew for ; Fri, 15 May 2015 00:18:51 +0000 (UTC) Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id 711D32030F for ; Fri, 15 May 2015 00:18:50 +0000 (UTC) Received: by wicnf17 with SMTP id nf17so115326748wic.1 for ; Thu, 14 May 2015 17:18:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=xWzY/x6sjRM5Nan+PApyll2ylYiLAX5j9UcSWQr5pug=; b=NAo+xZRXx0yrEiLyPctKVdduU4HshUHVGjUd6oodnZ3osPQIFvdeX9tFVHligZCIhE 7Yt59XcGqCiYGIH0JM3bLJt8FHl4iQl+alszF15MXAZIOoK6Z5QcfE1SU1gEb4PMxkGa k+bgYn+Q8Php9YuT/rub25XatUBCMFOeWTHKu46inB/bbtk9uzKA7PuOfGt4L8bbkdjz eHzK6CfLFyzRwj3RakvHuv2DbN7112FFuszBc+isuRltAAlX7AD9T/IaZaJESv5ZPRGY Mvj4Ql5EdnvLm80v5XMxw+8QJ6owA4v5o4Ul5MgzfQLd4OuhHbvtqBXjE0TE52UXK011 +6pw== MIME-Version: 1.0 X-Received: by 10.194.161.138 with SMTP id xs10mr13201596wjb.37.1431649084956; Thu, 14 May 2015 17:18:04 -0700 (PDT) Received: by 10.28.152.131 with HTTP; Thu, 14 May 2015 17:18:04 -0700 (PDT) In-Reply-To: References: Date: Thu, 14 May 2015 17:18:04 -0700 Message-ID: Subject: Re: Partition Columns From: Appan Thirumaligai To: user@hive.apache.org Content-Type: multipart/alternative; boundary=089e013d0f3826e670051613c752 --089e013d0f3826e670051613c752 Content-Type: text/plain; charset=UTF-8 Mungeol, I did check the # of mappers and that did not change between the two queries but when I ran a count(*) query the total execution time reduced significantly for Query1 vs Query2. Also, the amount data the query reads does change when the where clause changes. I still can't explain why one is faster over the other. Thanks, Appan On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo wrote: > Hi, Appan. > > you can just simply check the amount of data your query reads from the > table. or the number of the mapper for running that query. > then, you can know whether it filtering or scanning all table. > Of course, it is a lazy approach. but, you can give a try. > I think query 1 should work fine. because I am using a lot of that > kind of queries and it works fine for me. > > Thanks, > mungeol > > On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai > wrote: > > I agree with you Viral. I see the same behavior as well. We are on Hive > 0.13 > > for the cluster where I'm testing this. > > > > On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria > > wrote: > >> > >> Hi Appan, > >> > >> In my experience I have seen that Query 2 does not use partition pruning > >> because it's not a straight up filtering and involves using functions > (aka > >> UDFs). > >> > >> What version of Hive are you using ? > >> > >> Thanks, > >> Viral > >> > >> > >> > >> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai > >> wrote: > >>> > >>> Hi, > >>> > >>> I have a question on Hive Optimizer. I have a table with partition > >>> columns eg.,Sales partitioned by year, month, day. Assume that I have > two > >>> years worth of data on this table. I'm running two queries on this > table. > >>> > >>> Query 1: Select * from Sales where year=2015 and month = 5 and day > >>> between 1 and 7 > >>> > >>> Query 2: Select * from Sales where concat_ws('-',cast(year as > >>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as > string),2,'0')) > >>> between '2015-01-01' and '2015-01-07' > >>> > >>> When I ran Explain command on the above two queries I get a Filter > >>> operation for the 2nd Query and there is no Filter Operation for the > first > >>> query. > >>> > >>> My question is: Do both queries use the partitions or is it used only > in > >>> Query 1 and for Query 2 it will be a scan of all the data? > >>> > >>> Thanks for your help. > >>> > >>> Thanks, > >>> Appan > >> > >> > > > --089e013d0f3826e670051613c752 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Mungeol,

I did check the # of mappers a= nd that did not change between the two queries but when I ran a count(*) qu= ery the total execution time reduced significantly for Query1 vs Query2. Al= so, the amount data the query reads does change when the where clause chang= es. I still can't explain why one is faster over the other.
<= br>
Thanks,
Appan

<= div class=3D"gmail_quote">On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
Hi, Appan.

you can just simply check the amount of data your query reads from the
table. or the number of the mapper for running that query.
then, you can know whether it filtering or scanning all table.
Of course, it is a lazy approach. but, you can give a try.
I think query 1 should work fine. because I am using a lot of that
kind of queries and it works fine for me.

Thanks,
mungeol

On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai
<appanhiveug@gmail.com> = wrote:
> I agree with you Viral. I see the same behavior as well. We are on Hiv= e 0.13
> for the cluster where I'm testing this.
>
> On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria <viral.bajaria@gmail.com>
> wrote:
>>
>> Hi Appan,
>>
>> In my experience I have seen that Query 2 does not use partition p= runing
>> because it's not a straight up filtering and involves using fu= nctions (aka
>> UDFs).
>>
>> What version of Hive are you using ?
>>
>> Thanks,
>> Viral
>>
>>
>>
>> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai
>> <appanhiveug@gmail.com= > wrote:
>>>
>>> Hi,
>>>
>>> I have a question on Hive Optimizer. I have a table with parti= tion
>>> columns=C2=A0 eg.,Sales partitioned by year, month, day. Assum= e that I have two
>>> years worth of data on this table. I'm running two queries= on this table.
>>>
>>> Query 1: Select * from Sales where year=3D2015 and month =3D 5= and day
>>> between 1 and 7
>>>
>>> Query 2: Select * from Sales where concat_ws('-',cast(= year as
>>> string),lpad(cast(month as string),2,'0'),lpad(cast(da= y as string),2,'0'))
>>> between '2015-01-01' and '2015-01-07'
>>>
>>> When I ran Explain command on the above two queries I get a Fi= lter
>>> operation for the 2nd Query and there is no Filter Operation f= or the first
>>> query.
>>>
>>> My question is: Do both queries use the partitions or is it us= ed only in
>>> Query 1 and for Query 2 it will be a scan of all the data?
>>>
>>> Thanks for your help.
>>>
>>> Thanks,
>>> Appan
>>
>>
>

--089e013d0f3826e670051613c752--