hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
Date Mon, 13 Oct 2014 19:51:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169837#comment-14169837 ]

Xuefu Zhang commented on HIVE-7873:
-----------------------------------

Re: explanation for the numbers

Higher numbers are all caused by dominant disk access. #1 causes data spill because data
is not emitted until the close() call. The spill happens outside RowContainer, though.

When there is no lazy exec, all rows are produced before they can be consumed. Thus, all numbers
are high. The same theory applies to lazy exec with disk spill (#3 and #5 in the second set).
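
To illustrate the difference described above, here is a minimal sketch contrasting eager
and lazy production of rows. It is not the HiveBaseFunctionResultList code from the patch;
the class and method names (LazyVsEager, eagerPeakBuffered, lazyPeakBuffered, consume) are
hypothetical, and the "peak buffered rows" count only stands in for the memory pressure
that would eventually force a spill to disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LazyVsEager {
    // Stand-in for the downstream consumer (hypothetical).
    static void consume(String row) {
        // A real consumer would forward the row to the next operator.
    }

    // Eager: every row is produced and buffered before any row is consumed.
    // Returns the largest number of rows held at once.
    static int eagerPeakBuffered(int rows) {
        List<String> buffer = new ArrayList<>();
        int peak = 0;
        for (int i = 0; i < rows; i++) {
            buffer.add("row-" + i);
            peak = Math.max(peak, buffer.size());
        }
        for (String row : buffer) {
            consume(row);
        }
        return peak;
    }

    // Lazy: each row is produced only when the consumer asks for it,
    // so at most one row is held at a time.
    static int lazyPeakBuffered(int rows) {
        final int[] peak = {0};
        Iterator<String> it = new Iterator<String>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < rows;
            }

            @Override
            public String next() {
                peak[0] = Math.max(peak[0], 1);
                return "row-" + produced++;
            }
        };
        while (it.hasNext()) {
            consume(it.next());
        }
        return peak[0];
    }

    public static void main(String[] args) {
        System.out.println("eager peak: " + eagerPeakBuffered(1000));
        System.out.println("lazy peak:  " + lazyPeakBuffered(1000));
    }
}
```

In the eager case the buffer grows with the input, which is where a spill would come from
once it no longer fits in memory; in the lazy case the buffer stays constant regardless of
input size.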


The true synchronization cost is the diff between #2 and #4 in the second set, which seems
acceptable.

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch, HIVE-7873.2-spark.patch
>
>
> We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
