hive-dev mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
Date Mon, 13 Oct 2014 18:32:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HIVE-7873:
------------------------------
    Attachment: HIVE-7873.1-spark.patch

Attached a patch that re-enables the lazy HiveBaseFunctionResultList. A separate RowContainer
is used to work around RowContainer's no-write-after-read limitation. The patch also
fixes a concurrency issue in HiveKVResultCache. It uses synchronized rather than a reentrant
lock, since I assume few threads will access the cache.
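
For readers without the patch at hand, here is a minimal sketch of the two ideas in the paragraph above: keeping a separate container for writes so that a container is never written to after it has been read from, and guarding the cache with plain synchronized methods. The class TwoBufferCache and its methods are hypothetical names made up for illustration. This is not the code in HIVE-7873.1-spark.patch, and in-memory queues stand in for RowContainer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical illustration only; NOT the actual HiveKVResultCache.
 * Writes always go to writeBuffer and reads always drain readBuffer,
 * so a buffer is never written to between its first read and the
 * moment it is fully drained.
 */
class TwoBufferCache<T> {
    // Assumption: plain in-memory queues stand in for RowContainer.
    private Queue<T> writeBuffer = new ArrayDeque<>();
    private Queue<T> readBuffer = new ArrayDeque<>();

    /** Appends a record; only the write buffer is ever written to. */
    public synchronized void add(T item) {
        writeBuffer.add(item);
    }

    /** Returns true if any record is waiting in either buffer. */
    public synchronized boolean hasNext() {
        return !readBuffer.isEmpty() || !writeBuffer.isEmpty();
    }

    /**
     * Returns the oldest record. When the read buffer is drained, the
     * two buffers swap roles, which preserves FIFO order.
     * Throws NoSuchElementException if both buffers are empty.
     */
    public synchronized T next() {
        if (readBuffer.isEmpty()) {
            Queue<T> drained = readBuffer;
            readBuffer = writeBuffer;
            writeBuffer = drained;
        }
        return readBuffer.remove();
    }
}
```

With few threads, an uncontended synchronized method is cheap, which is the trade-off described above. A ReentrantLock would mainly pay off if features such as timed or interruptible locking were needed.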

In my tests, the synchronization has no noticeable overhead when no other thread is
accessing the cache. If each processNextRecord() call doesn't dump too many records to the
cache, the lazy result list performs very well. However, if each processNextRecord() call dumps
many more records than the cache can hold in memory, performance gets worse.
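
To make the "lazy" behaviour concrete, here is a rough sketch of a result list that processes one input record only when the consumer has drained the cache. LazyResultList, its constructor, and the processedInputs counter are hypothetical names for illustration, not Hive's HiveBaseFunctionResultList API. The BiConsumer plays the role of processNextRecord(), and an unbounded in-memory queue stands in for the cache.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a lazily evaluated result list; NOT Hive's
 * HiveBaseFunctionResultList. An input record is processed only when
 * the cache of pending output records is empty.
 */
class LazyResultList<I, O> implements Iterable<O> {
    private final Iterator<I> input;
    // Stand-in for processNextRecord(): may emit any number of outputs.
    private final BiConsumer<I, Consumer<O>> processor;
    // Assumption: an unbounded in-memory queue instead of a spilling cache.
    private final Queue<O> cache = new ArrayDeque<>();
    // Exposed only so the laziness can be observed in a demo or test.
    int processedInputs = 0;

    LazyResultList(Iterator<I> input, BiConsumer<I, Consumer<O>> processor) {
        this.input = input;
        this.processor = processor;
    }

    @Override
    public Iterator<O> iterator() {
        return new Iterator<O>() {
            @Override
            public boolean hasNext() {
                // Pull more input only while nothing is cached.
                while (cache.isEmpty() && input.hasNext()) {
                    processor.accept(input.next(), cache::add);
                    processedInputs++;
                }
                return !cache.isEmpty();
            }

            @Override
            public O next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return cache.remove();
            }
        };
    }
}
```

In this sketch the cache holds at most the output of a single processor call. That matches the observation above: a small output per call keeps the cache small, while a call that emits far more than fits in memory would push a real cache past its in-memory capacity.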

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
