hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Appan Thirumaligai <appanhiv...@gmail.com>
Subject Re: Partition Columns
Date Fri, 15 May 2015 00:18:04 GMT
Mungeol,

I did check the # of mappers and that did not change between the two
queries but when I ran a count(*) query the total execution time reduced
significantly for Query1 vs Query2. Also, the amount data the query reads
does change when the where clause changes. I still can't explain why one is
faster over the other.

Thanks,
Appan

On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:

> Hi, Appan.
>
> you can just simply check the amount of data your query reads from the
> table. or the number of the mapper for running that query.
> then, you can know whether it filtering or scanning all table.
> Of course, it is a lazy approach. but, you can give a try.
> I think query 1 should work fine. because I am using a lot of that
> kind of queries and it works fine for me.
>
> Thanks,
> mungeol
>
> On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai
> <appanhiveug@gmail.com> wrote:
> > I agree with you Viral. I see the same behavior as well. We are on Hive
> 0.13
> > for the cluster where I'm testing this.
> >
> > On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria <viral.bajaria@gmail.com>
> > wrote:
> >>
> >> Hi Appan,
> >>
> >> In my experience I have seen that Query 2 does not use partition pruning
> >> because it's not a straight up filtering and involves using functions
> (aka
> >> UDFs).
> >>
> >> What version of Hive are you using ?
> >>
> >> Thanks,
> >> Viral
> >>
> >>
> >>
> >> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai
> >> <appanhiveug@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a question on Hive Optimizer. I have a table with partition
> >>> columns  eg.,Sales partitioned by year, month, day. Assume that I have
> two
> >>> years worth of data on this table. I'm running two queries on this
> table.
> >>>
> >>> Query 1: Select * from Sales where year=2015 and month = 5 and day
> >>> between 1 and 7
> >>>
> >>> Query 2: Select * from Sales where concat_ws('-',cast(year as
> >>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as
> string),2,'0'))
> >>> between '2015-01-01' and '2015-01-07'
> >>>
> >>> When I ran Explain command on the above two queries I get a Filter
> >>> operation for the 2nd Query and there is no Filter Operation for the
> first
> >>> query.
> >>>
> >>> My question is: Do both queries use the partitions or is it used only
> in
> >>> Query 1 and for Query 2 it will be a scan of all the data?
> >>>
> >>> Thanks for your help.
> >>>
> >>> Thanks,
> >>> Appan
> >>
> >>
> >
>

Mime
View raw message