Message-ID: <1148934908.1242197685708.JavaMail.jira@brutus>
Date: Tue, 12 May 2009 23:54:45 -0700 (PDT)
From: "He Yongqiang (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-477) Some optimization thoughts for Hive
In-Reply-To: <2070765305.1241694330309.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HIVE-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708796#action_12708796 ]

He Yongqiang commented
on HIVE-477:
-----------------------------------

One comment on 1): avoiding the byte copy when initializing LazyString does not seem to save CPU time. In my test, I used two tables of 30 1K columns each and inserted one from the other. The table size is about 140M. The two runs, one with the byte copy and one without, took the same time, so it seems the cost of Java's array copy can be ignored.

> Some optimization thoughts for Hive
> -----------------------------------
>
> Key: HIVE-477
> URL: https://issues.apache.org/jira/browse/HIVE-477
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: He Yongqiang
>
> Before we can start working on HIVE-461, I am doing some profiling of Hive. Here are some thoughts for improvements:
> minor:
> 1) Add a new HiveText to replace Text. It can avoid the byte copy when initializing LazyString. I have done a draft, and it shows ~1% performance gain.
> 2) Change StructObjectInspector's
> {noformat}
> public List getStructFieldsDataAsList(Object data);
> {noformat}
> to
> {noformat}
> public Object[] getStructFieldsDataAsArray(Object data);
> {noformat}
> In my profiling test this showed some performance gain, but in actual execution it did not. In any case, returning a Java array will reduce the GC's burden of collecting ArrayLists.
> not so minor:
> 3) Move FileSinkOperator's Writer into another thread, adding a producer-consumer array as the bridge between the operators' thread and the Writer thread.
> 4) The operator stack is rather deep. To avoid instruction cache misses and to use the data cache more efficiently, I suggest letting Hive's operators process an array of rows instead of only one row at a time.
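
To make the trade-off in 1) concrete, here is a minimal sketch of the two ways a field could be initialized from a row buffer: copying the bytes out, or keeping a reference with an offset and length. ByteSlice and its methods are hypothetical names for illustration only; this is not the draft HiveText and not Hive's LazyString code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Hive.
final class ByteSlice {
    final byte[] data;
    final int start;
    final int length;

    private ByteSlice(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    // Copying variant: allocates a new array and copies the field bytes into it.
    static ByteSlice copyOf(byte[] src, int start, int length) {
        return new ByteSlice(Arrays.copyOfRange(src, start, start + length), 0, length);
    }

    // Zero-copy variant: keeps a reference to the caller's buffer.
    // The caller must not reuse the buffer while the slice is still in use.
    static ByteSlice wrap(byte[] src, int start, int length) {
        return new ByteSlice(src, start, length);
    }

    // Decodes the referenced bytes as UTF-8.
    String decode() {
        return new String(data, start, length, StandardCharsets.UTF_8);
    }
}
```

The zero-copy variant saves one allocation and one copy per field, but it ties the field's lifetime to the row buffer. Given the measurement above, that extra constraint may not be worth it.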
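
For 3), the producer-consumer bridge could look roughly like the sketch below. It assumes a bounded java.util.concurrent.ArrayBlockingQueue as the "array" between the two threads. AsyncRowWriter and RowSink are hypothetical names, and RowSink only stands in for whatever writer FileSinkOperator actually uses.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; not part of Hive.
final class AsyncRowWriter implements AutoCloseable {
    // Stand-in for the real record writer.
    interface RowSink {
        void write(Object[] row) throws Exception;
    }

    // Sentinel row that tells the writer thread to stop.
    private static final Object[] POISON = new Object[0];

    private final BlockingQueue<Object[]> queue;
    private final Thread writerThread;
    private volatile Exception failure;

    AsyncRowWriter(RowSink sink, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.writerThread = new Thread(() -> drainLoop(sink), "row-writer");
        this.writerThread.start();
    }

    // Runs on the writer thread: takes rows off the queue until the sentinel arrives.
    private void drainLoop(RowSink sink) {
        try {
            while (true) {
                Object[] row = queue.take();
                if (row == POISON) {
                    return;
                }
                if (failure == null) {
                    try {
                        sink.write(row);
                    } catch (Exception e) {
                        // Remember the first error, then keep draining so the
                        // producer never blocks forever on a full queue.
                        failure = e;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failure = e;
        }
    }

    // Called from the operator thread; blocks while the queue is full.
    void submit(Object[] row) throws InterruptedException {
        queue.put(row);
    }

    // Flushes the remaining rows, stops the writer thread, and reports any write error.
    @Override
    public void close() throws Exception {
        queue.put(POISON);
        writerThread.join();
        if (failure != null) {
            throw new IOException("row writer failed", failure);
        }
    }
}
```

The bounded queue gives back-pressure: if the writer falls behind, the operator thread blocks in submit() instead of buffering rows without limit. Rows handed to submit() must not be reused by the operator afterwards, which may force a copy of each row.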
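
For 4), a batch-at-a-time operator interface might be sketched as follows. All names here (BatchOperator, FilterBatchOperator, CollectingOperator) are hypothetical and are not Hive's Operator API. The point is only that each operator loops over a whole array of rows before handing it to its child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of Hive.
interface BatchOperator {
    // Processes rows[0..size); the array may be longer than size.
    void processBatch(Object[][] rows, int size);
}

// Keeps only the rows that match the predicate and forwards them as one batch.
final class FilterBatchOperator implements BatchOperator {
    private final Predicate<Object[]> predicate;
    private final BatchOperator child;
    private Object[][] out = new Object[0][];

    FilterBatchOperator(Predicate<Object[]> predicate, BatchOperator child) {
        this.predicate = predicate;
        this.child = child;
    }

    @Override
    public void processBatch(Object[][] rows, int size) {
        // Reuse the output buffer across calls; grow it only when needed.
        if (out.length < size) {
            out = new Object[size][];
        }
        int kept = 0;
        for (int i = 0; i < size; i++) {
            if (predicate.test(rows[i])) {
                out[kept++] = rows[i];
            }
        }
        if (kept > 0) {
            child.processBatch(out, kept);
        }
    }
}

// Terminal operator that simply collects the rows it receives.
final class CollectingOperator implements BatchOperator {
    final List<Object[]> collected = new ArrayList<>();

    @Override
    public void processBatch(Object[][] rows, int size) {
        for (int i = 0; i < size; i++) {
            collected.add(rows[i]);
        }
    }
}
```

With this shape the operator stack is traversed once per batch rather than once per row, and each operator runs a tight loop over its input.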