From: "Jimmy Xiang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Date: Mon, 13 Oct 2014 18:32:34 +0000 (UTC)
Subject: [jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList

[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HIVE-7873:
------------------------------
    Attachment: HIVE-7873.1-spark.patch

Attached a patch that re-enables the lazy HiveBaseFunctionResultList. A separate RowContainer is used to work around the no-write-after-read limitation of RowContainer. The patch also fixes a concurrency issue in HiveKVResultCache. It uses synchronized instead of a reentrant lock, since I assume there won't be many threads accessing the cache.
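To illustrate the two ideas above (a separate container so that writes never go to a container that has already been read, plus plain synchronized methods for thread safety), here is a minimal sketch. It is not the code from the attached patch: the class name TwoBufferCache is hypothetical, and in-memory queues stand in for RowContainer, whose real API is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical stand-in for a result cache; NOT the HiveKVResultCache from the patch.
 * Two in-memory queues play the role of two RowContainers.
 */
final class TwoBufferCache<T> {
    // Records are only ever appended to this buffer.
    private Queue<T> writeBuffer = new ArrayDeque<>();
    // Records are only ever removed from this buffer.
    private Queue<T> readBuffer = new ArrayDeque<>();

    /** Appends a record. Writes never touch the buffer that is being read. */
    public synchronized void add(T record) {
        writeBuffer.add(record);
    }

    /** Returns true if any record is still waiting in either buffer. */
    public synchronized boolean hasNext() {
        return !readBuffer.isEmpty() || !writeBuffer.isEmpty();
    }

    /**
     * Returns the oldest unread record.
     * Throws java.util.NoSuchElementException if the cache is empty.
     */
    public synchronized T next() {
        if (readBuffer.isEmpty()) {
            // The read buffer is fully drained, so it can safely become the
            // new write buffer, and the pending writes become readable.
            Queue<T> drained = readBuffer;
            readBuffer = writeBuffer;
            writeBuffer = drained;
        }
        return readBuffer.remove();
    }
}
```

Because the buffers are swapped only when the read side is empty, records come out in insertion order even when add() and next() calls are interleaved. The synchronized methods take the object's intrinsic lock, which is usually cheap when uncontended.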
Based on my test, the synchronization has no noticeable overhead when there is no other thread. If each processNextRecord() call doesn't dump too many records to the cache, the lazy result list performs very well. However, if each processNextRecord() call dumps many more records than the cache can hold in memory, performance gets worse.

> Re-enable lazy HiveBaseFunctionResultList
> -----------------------------------------
>
>                 Key: HIVE-7873
>                 URL: https://issues.apache.org/jira/browse/HIVE-7873
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Brock Noland
>            Assignee: Jimmy Xiang
>              Labels: Spark-M4, spark
>         Attachments: HIVE-7873.1-spark.patch
>
>
> We removed this optimization in HIVE-7799.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)