hive-issues mailing list archives

From "Zoltan Haindrich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16252) Vectorization: Cannot vectorize: Aggregation Function UDF avg
Date Mon, 20 Mar 2017 15:27:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932826#comment-15932826 ]

Zoltan Haindrich commented on HIVE-16252:
-----------------------------------------

this might cause quite a few vectorization problems (I think)

{{git grep notVectorizedReason}} returned a lot of occurrences of this.

there is a type filtering logic - based on the type's string at:

https://github.com/apache/hive/blob/27f27219a2b965958f850a92bf581d7b9c3ddfb0/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L2278

which rejects {{struct<...>}} - the actual value of 'type' in this case.
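
To illustrate how a check keyed on the type's string ends up rejecting {{struct<...>}}, here is a minimal sketch. It is not the Vectorizer code: the class name, the method names and the list of accepted type names are all hypothetical, and only the message wording is copied from the log quoted in the issue.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a string-based type filter; NOT the actual Vectorizer code.
final class TypeNameFilterSketch {

    // Assumed whitelist of primitive type names; the real list in Hive may differ.
    private static final Pattern SUPPORTED = Pattern.compile(
        "boolean|tinyint|smallint|int|bigint|float|double|string|date|timestamp"
            + "|decimal(\\(\\d+,\\s*\\d+\\))?",
        Pattern.CASE_INSENSITIVE);

    // The whole type string must match, so any complex type such as
    // struct<...> falls through and is reported as unsupported.
    static boolean isSupported(String typeName) {
        return typeName != null && SUPPORTED.matcher(typeName.trim()).matches();
    }

    // Returns null when the type is accepted, otherwise a message shaped
    // like the one in the quoted log.
    static String rejectionReason(String typeName, String column) {
        return isSupported(typeName)
            ? null
            : "Data type " + typeName + " of " + column + " not supported";
    }

    public static void main(String[] args) {
        String partialAvg = "struct<count:bigint,sum:double,input:double>";
        System.out.println(isSupported("double"));
        System.out.println(rejectionReason(partialAvg, "Column[VALUE._col0]"));
    }
}
```

With a filter of this shape, the input column of type double passes, while the struct type seen for {{VALUE._col0}} on the reduce side does not.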

so generally I would say (based on what I've seen so far): vectorization is currently unsupported
for any aggregator in case {{group by}} is being used.

> Vectorization: Cannot vectorize: Aggregation Function UDF avg 
> --------------------------------------------------------------
>
>                 Key: HIVE-16252
>                 URL: https://issues.apache.org/jira/browse/HIVE-16252
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Rajesh Balamohan
>
> {noformat}
> select 
>         ss_store_sk, ss_item_sk, avg(ss_sales_price) as revenue
>     from
>         store_sales, date_dim
>     where
>         ss_sold_date_sk = d_date_sk
>             and d_month_seq between 1212 and 1212 + 11
>     group by ss_store_sk , ss_item_sk limit 100;
> 2017-03-20T00:59:49,526  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] physical.Vectorizer: Validating ReduceWork...
> 2017-03-20T00:59:49,526 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] physical.Vectorizer: Using reduce tag 0
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] lazybinary.LazyBinarySerDe: LazyBinarySerDe initialized with: columnNames=[_col0] columnTypes=[struct<count:bigint,sum:double,input:double>]
> 2017-03-20T00:59:49,527 DEBUG [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] vector.VectorizationContext: Input Expression = Column[KEY._col0], Vectorized Expression = col 0
> ...
> ...
> 2017-03-20T00:59:49,528  INFO [680a4c08-1639-4bb9-8d6a-0bf0f30ef563 main] physical.Vectorizer: Cannot vectorize: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:double,input:double> of Column[VALUE._col0] not supported
> {noformat}
> Env: Hive build from: commit 71f4930d95475e7e63b5acc55af3809aefcc71e0 (March 16)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
