From: "Joydeep Sen Sarma (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: hive-dev@hadoop.apache.org
Date: Mon, 5 Jan 2009 15:33:44 -0800 (PST)
Subject: [jira] Commented: (HIVE-207) Change SerDe API to allow skipping unused columns
Message-ID: <799505973.1231198424312.JavaMail.jira@brutus>
In-Reply-To: <233026567.1231176104146.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HIVE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660987#action_12660987 ]

Joydeep Sen Sarma commented on HIVE-207:
----------------------------------------

@Zheng - can we get lazy deserialization from Dynamic SerDe by writing only a new protocol, without changing the Dynamic SerDe code itself?

Alternatively, we could ignore the whole DDL question for now: create a table with a custom SerDe for Protocol Buffers and put the schema information in the SerDe properties (which the create table command should support).

Instead of forcing people to use Dynamic SerDe when they want to use DDL, one extensibility hook we can add is a callback that generates SerDe configuration from the parsed DDL information. Perhaps this can be an optional method in the SerDe. That way, people can translate Hive DDL into Protocol Buffer configuration (for example) without having to use Dynamic SerDe.

> Change SerDe API to allow skipping unused columns
> -------------------------------------------------
>
>                 Key: HIVE-207
>                 URL: https://issues.apache.org/jira/browse/HIVE-207
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: David Phillips
>
> A deserializer shouldn't have to deserialize columns that are never used by the query processor. A serializer shouldn't have to examine unused columns that are known to always be null.
> As an example, we store data as a Protocol Buffer structure with ~60 fields. Running a "select count(1)" currently requires deserializing all fields, which includes checking if they exist and formatting the data appropriately. This is expensive and unnecessary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
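
To make the proposed extensibility hook concrete, here is a minimal sketch of what an optional "generate SerDe configuration from parsed DDL" callback could look like. Everything in it is hypothetical and for illustration only: the `Column` type, the `DdlConfigurable` interface, the `ToyProtobufSerDe` class, and the `toy.protobuf.fields` property name are invented here and are not part of Hive's actual SerDe API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DdlSerDeHookSketch {

    /** One column as parsed from a CREATE TABLE statement (hypothetical type). */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Hypothetical optional hook: a SerDe that can derive its own configuration from DDL. */
    interface DdlConfigurable {
        Map<String, String> configFromDdl(List<Column> columns);
    }

    /** Toy SerDe for a Protocol Buffer-like format; the property name is invented. */
    static final class ToyProtobufSerDe implements DdlConfigurable {
        @Override
        public Map<String, String> configFromDdl(List<Column> columns) {
            // Assign field tags 1, 2, 3, ... in DDL column order.
            StringBuilder fields = new StringBuilder();
            int tag = 1;
            for (Column c : columns) {
                if (fields.length() > 0) {
                    fields.append(',');
                }
                fields.append(tag++).append(':').append(c.name).append(':').append(c.type);
            }
            Map<String, String> props = new LinkedHashMap<>();
            props.put("toy.protobuf.fields", fields.toString());
            return props;
        }
    }

    /**
     * What the DDL layer might do: call the hook only if the SerDe implements it,
     * then let explicitly supplied SerDe properties override the generated ones.
     */
    static Map<String, String> serdePropertiesFor(
            Object serde, List<Column> columns, Map<String, String> userProps) {
        Map<String, String> props = new LinkedHashMap<>();
        if (serde instanceof DdlConfigurable) {
            props.putAll(((DdlConfigurable) serde).configFromDdl(columns));
        }
        props.putAll(userProps);
        return props;
    }

    public static void main(String[] args) {
        List<Column> cols = new ArrayList<>();
        cols.add(new Column("user_id", "bigint"));
        cols.add(new Column("url", "string"));
        System.out.println(
                serdePropertiesFor(new ToyProtobufSerDe(), cols, new LinkedHashMap<>()));
    }
}
```

Because the hook is optional, a SerDe that does not implement it simply gets the user-supplied properties unchanged, which matches the "custom SerDe with schema in the SerDe properties" alternative described in the comment.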