hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prasanth Jayachandran <>
Subject Re: ORC queries inefficient for sorted field
Date Mon, 24 Feb 2014 20:06:18 GMT
Hi Bryan

ORC indexes are used only for the selection of stripes and row groups and not for answering

You can enable hive.compute.query.using.stats flag to answer queries using metadata. When
this flag is enabled, hive metastore is checked to see if column statistics exists for the
required columns. If column statistics exists, then certain queries like min, max, count etc.
will be answered without ever scanning the table. 

This is the JIRA that added the above feature (its available in hive version 0.13.0)

Prasanth Jayachandran

On Feb 22, 2014, at 7:03 PM, Bryan Jeffrey <> wrote:

> Hello.
> I'm running Hadoop 2.2.0 and Hive 0.12.0.
> I have an ORC table partitioned by 'range', and sorted by 'time'.  I want to select the
max(time) value from a table for a given set of partitions.  I begin with a query that looks
like the following:
> select max(time) from my_table where range > 1234;
> The information about the minimum/maximum values for a given integer column is in the
ORC metadata.  If I run '--orcfiledump' on the files in the specified partition(s), I can
see the following output:  
> Column 1: count: 233056 min: 1393123416 max: 1393123499 sum: 324675782247877
> However, my queries do not seem to be using the information.  The query I'm running ends
up with several hundred mappers, and takes  a very long time to run on the data.  Running
an orcfiledump on the files themselves (and simply pulling the values for column 1) is faster
by several orders of magnitude.  
> I have verified that 'hive.optimize.index.filter' and 'hive.optimize.ppd' are set to
'true'.  What can I do to avoid processing actual records for this particular query, instead
using the ORC file metadata or metastore metadata?
> Regards,
> Bryan Jeffrey

NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

View raw message