hive-dev mailing list archives

From Ådne Brunborg (JIRA) <j...@apache.org>
Subject [jira] [Created] (HIVE-9988) Evaluating UDF before query is run
Date Tue, 17 Mar 2015 08:15:38 GMT
Ådne Brunborg created HIVE-9988:
-----------------------------------

             Summary: Evaluating UDF before query is run
                 Key: HIVE-9988
                 URL: https://issues.apache.org/jira/browse/HIVE-9988
             Project: Hive
          Issue Type: Improvement
            Reporter: Ådne Brunborg


When a UDF is used in a filter on a partition column in Hive, all partitions are scanned
before the UDF is resolved.

If the UDF could be evaluated before the query is run, only the matching partitions would
need to be read, which would greatly improve performance in cases like this.

Example: the table is partitioned by datestamp (bigint).

The following WHERE clause scans all 82 partitions:
{{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)}}
{{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process : 82}}

whereas the following touches only one partition:
{{WHERE datestamp=20150316}}
{{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process : 1}}
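Until something like this exists in Hive itself, the same effect can be approximated on the client side by computing the datestamp before the query is submitted and passing it as a literal. The following is only a minimal sketch of that workaround, not the improvement requested here. The function names and the table name {{events}} are hypothetical, and it assumes the client's local date matches what {{unix_timestamp()}} would yield on the cluster.

```python
from datetime import date
from typing import Optional


def datestamp_literal(day: Optional[date] = None) -> int:
    """Return the given day (default: today, local time) as a yyyyMMdd integer."""
    day = day or date.today()
    return int(day.strftime("%Y%m%d"))


def build_query(table: str, day: Optional[date] = None) -> str:
    """Build a query whose partition filter is a plain literal.

    The table name is a hypothetical placeholder; the point is only that
    the WHERE clause contains a constant rather than a UDF call.
    """
    return f"SELECT * FROM {table} WHERE datestamp={datestamp_literal(day)}"


if __name__ == "__main__":
    # Example for the date used in the log lines above.
    print(build_query("events", date(2015, 3, 16)))
```

The resulting string has the same form as the second WHERE clause above, so the filter reaches Hive as a constant comparison on the partition column.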




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
