hive-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Impact of partitioning on certain queries
Date Fri, 08 Jan 2016 09:54:26 GMT
Try EXPLAIN DEPENDENCY.
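
A rough sketch of what that could look like. The table `sales`, its partition column `sale_date`, and the column `amount` are made up for illustration and do not come from this thread. EXPLAIN DEPENDENCY reports the tables and partitions a query would read as JSON (`input_tables`, `input_partitions`), so it should show whether a predicate actually prunes partitions, which the table scan in a plain EXPLAIN does not.

```sql
-- Hypothetical table partitioned by sale_date (all names are assumptions).
CREATE TABLE IF NOT EXISTS sales (id BIGINT, amount DOUBLE)
PARTITIONED BY (sale_date STRING);

-- Predicate on the partition column: input_partitions should list only
-- the matching partition(s), provided they exist in the table.
EXPLAIN DEPENDENCY
SELECT SUM(amount) FROM sales WHERE sale_date = '2016-01-08';

-- No partition predicate: expect every partition of the table to be listed.
EXPLAIN DEPENDENCY
SELECT SUM(amount) FROM sales;
```

Comparing the two outputs side by side is probably the quickest way to confirm that partition elimination is happening for a given query.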

> On 08 Jan 2016, at 10:47, Mich Talebzadeh <mich@peridale.co.uk> wrote:
> 
> Thanks Gopal.
>  
> Basically the following is true:
>  
> 1.    The storage layer is HDFS
> 2.    The execution engine is MR, Tez, Spark etc
> 3.    The access layer is Hive
>  
> When we say the access layer is Hive, is the assumption correct that we are
> referring to the optimiser (loosely related to the optimiser in an RDBMS)?
> For example, is the Hive optimiser aware of the number of underlying
> partitions? The reason I am asking is that with EXPLAIN I only see a table
> scan, and it does not refer to any partition or partition elimination.
>  
>  
> Cheers
>  
>  
>  
>  
> -----Original Message-----
> From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
> Sent: 08 January 2016 09:34
> To: user@hive.apache.org
> Subject: Re: Impact of partitioning on certain queries
>  
>  
> > OK, we hope that partitioning improves performance where the predicate
> > is on partitioned columns
>  
> Nope.
>  
> Partitioning *only* improves performance if your queries run with
>  
> set hive.mapred.mode=strict;
>  
> That's the "use strict" easy way to make sure you're writing good queries.
>  
> Even then, schema design in Hive is something you need to learn with the
> assumption that neither the storage layer nor the compute layer is part of
> "hive".
>  
> It floats itself in an "access" layer above both. Not sure there's any
> legacy tech to draw parallels with that.
>  
> If you haven't seen this before, here's an example of the problem
>  
> http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24
>  
>  
> Cheers,
> Gopal
