hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8188) ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
Date Fri, 19 Sep 2014 07:55:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140154#comment-14140154
] 

Prasanth J commented on HIVE-8188:
----------------------------------

I think it's because hash aggregation needs to estimate the size of the hash map. The values
of the hash map are UDAF aggregation buffers, and a buffer's size can be estimated if its class
carries the annotation "@AggregationType(estimable = true)". GroupByOperator.shouldBeFlushed()
is called for every row that is added to the hash map. shouldBeFlushed() calls the isEstimable()
helper function, which uses reflection on every call to check whether the aggregation buffer is
estimable. Not sure why it is done this way, but yes, this will be slow as hell. This needs to be fixed.
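
To illustrate the kind of fix I have in mind, here is a minimal sketch that looks up the annotation once per buffer class and caches the answer, so the per-row path is a map lookup instead of a reflective call. This is not the actual Hive code or a proposed patch: the AggregationType annotation is re-declared locally as a stand-in so the example compiles on its own, and EstimableCache, SumBuffer, ListBuffer and cachedClassCount() are hypothetical names made up for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EstimableCache {

    // Local stand-in for Hive's annotation (assumption: same shape, one boolean member).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface AggregationType {
        boolean estimable() default false;
    }

    // Hypothetical aggregation buffers, one annotated and one not.
    @AggregationType(estimable = true)
    static class SumBuffer {
        long sum;
    }

    static class ListBuffer {
    }

    // One cached answer per buffer class.
    private static final ConcurrentMap<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // The slow pattern described above: a reflective annotation lookup on every call.
    static boolean isEstimableUncached(Object buffer) {
        AggregationType a = buffer.getClass().getAnnotation(AggregationType.class);
        return a != null && a.estimable();
    }

    // Cached variant: reflection runs only the first time a given class is seen.
    static boolean isEstimable(Object buffer) {
        return CACHE.computeIfAbsent(buffer.getClass(), c -> {
            AggregationType a = c.getAnnotation(AggregationType.class);
            return a != null && a.estimable();
        });
    }

    // Number of distinct buffer classes resolved so far.
    static int cachedClassCount() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        // Simulate a per-row check: many calls, but only two reflective lookups in total.
        for (int row = 0; row < 1_000_000; row++) {
            isEstimable(new SumBuffer());
            isEstimable(new ListBuffer());
        }
        System.out.println("classes resolved via reflection: " + cachedClassCount());
    }
}
```

Since the set of aggregation buffers is fixed for the lifetime of the operator, another option would be to compute the flag once at initialization time and keep it in a field, which avoids even the map lookup per row.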

> ExprNodeGenericFuncEvaluator::_evaluate() loads class annotations in a tight loop
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-8188
>                 URL: https://issues.apache.org/jira/browse/HIVE-8188
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>         Attachments: udf-deterministic.png
>
>
> When running a near-constant UDF, most of the CPU is burnt within the VM trying to read
> the class annotations for every row.
> !udf-deterministic.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
