hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Hanson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-4701) Optimize filter Column IN ( list-of-constants ) for vectorized execution
Date Fri, 01 Nov 2013 23:46:17 GMT

     [ https://issues.apache.org/jira/browse/HIVE-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Eric Hanson resolved HIVE-4701.
-------------------------------

    Resolution: Duplicate

Duplicate of HIVE-5583

> Optimize filter Column IN ( list-of-constants ) for vectorized execution
> ------------------------------------------------------------------------
>
>                 Key: HIVE-4701
>                 URL: https://issues.apache.org/jira/browse/HIVE-4701
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> OR filters have been optimized to run with vectorized query execution. IN filters of
the form "Column IN (list-of-constants)" are a special case of OR. However, IN does not vectorized
currently.
> E.g. 
> select ddate, count\(\*\) from factsqlengineam_vec_orc where ddate = "2012-05-19 00:00:00"
OR ddate = "2012-05-20 00:00:00" or ddate = "2012-05-21 00:00:00" group by ddate;
> takes about 23 seconds of CPU and 
> select ddate, count\(\*\) from factsqlengineam_vec_orc where ddate IN ("2012-05-19 00:00:00",
"2012-05-20 00:00:00", "2012-05-21 00:00:00") group by ddate;
> takes about 153 seconds of CPU.
> A simple fix may be that for short IN lists (say <= 64 elements) we turn them into
OR by manipulating the query tree before planning whether vectorization can be used.
> A more complex fix that covers more cases would be to turn longer IN lists into a join
so when we eventually support vectorized joins it will be fast.
> An intermediate approach might be to implement a special IN filter operator that stores
the constant values in a sorted array or high-performance hash table (like Cuckoo hashing).



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message