From: "He Yongqiang (JIRA)"
To: hive-dev@hadoop.apache.org
Date: Mon, 27 Apr 2009 04:51:30 -0700 (PDT)
Subject: [jira] Updated: (HIVE-352) Make Hive support column based storage

[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-352:
------------------------------

    Attachment:
hive-352-2009-4-27.patch

hive-352-2009-4-27.patch changes back to bulk compression and now also compresses the key part.

Here are the results on TPC-H's lineitem table:

Direct (incremental) compression, key part not compressed:
274982705 hdfs://10.61.0.160:9000/user/hdfs/tpch1G_rc

Buffer first, then compress (bulk compression), key part compressed:
188401365 hdfs://10.61.0.160:9000/user/hdfs/tpch1G_newRC

BTW, I also tried to implement direct (incremental) compression, decompressing a value buffer's columns part by part. But at the last step (implementing ValueBuffer's readFields), I noticed that it is not easy to implement. We hold only one InputStream to the underlying file, we need to seek back and forth to decompress part of each column, and we also need to hold one decompression stream for each column. If we seek the input stream, the decompression stream becomes corrupt. To avoid all this, we need to read all needed columns' compressed data into memory and decompress in memory. But we still need one decompression stream for each column. I stopped implementing this at the last step; if it is needed, I can finish it.

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, hive-352-2009-4-23.patch, hive-352-2009-4-27.patch, HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven a better storage layout for OLAP.
> Hive does a great job on raw row oriented storage. In this issue, we will enhance Hive to support column based storage.
> Actually we have done some work on column based storage on top of HDFS. I think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
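
Editorial illustration (not part of the patch): the in-memory approach described in the comment above, where the compressed bytes of each needed column are read into memory and decompressed part by part with one decompression stream per column, might be sketched roughly as below. This is a minimal Python sketch, not the Hive/RCFile code. zlib stands in for whatever codec Hive is configured with, and the names compress_columns, read_columns_interleaved and chunk_size are hypothetical. The last lines compute the ratio of the two sizes reported above, assuming both figures are in the same unit; the two runs differ in both compression mode and key-part compression, so the ratio cannot be attributed to either change alone.

```python
import zlib


def compress_columns(columns):
    """Bulk compression stand-in: each column's buffered bytes are
    compressed in a single call (zlib is an assumption, not Hive's codec)."""
    return [zlib.compress(data) for data in columns]


def read_columns_interleaved(compressed, needed, chunk_size=16):
    """Decompress only the needed columns, a chunk at a time, round-robin.

    All compressed column data is already in memory, so no seeking on a
    shared input stream is required. Each column still needs its own
    decompressor, because a decompressor carries state between chunks.
    """
    streams = {i: zlib.decompressobj() for i in needed}  # one per column
    offsets = {i: 0 for i in needed}
    out = {i: bytearray() for i in needed}
    pending = set(needed)
    while pending:
        for i in sorted(pending):
            chunk = compressed[i][offsets[i]:offsets[i] + chunk_size]
            offsets[i] += len(chunk)
            out[i] += streams[i].decompress(chunk)
            if offsets[i] >= len(compressed[i]):
                out[i] += streams[i].flush()
                pending.discard(i)
    return {i: bytes(out[i]) for i in needed}


# Ratio of the two sizes reported in the message (new layout / old layout).
size_ratio = 188401365 / 274982705
```

For example, calling read_columns_interleaved(compress_columns(cols), [0, 2]) on a list of three byte strings returns the original bytes of columns 0 and 2 only; column 1 is never decompressed.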