hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prasad Chakka (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-207) Change SerDe API to allow skipping unused columns
Date Thu, 08 Jan 2009 21:58:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662133#action_12662133
] 

Prasad Chakka commented on HIVE-207:
------------------------------------

i thought we could pull some code from DynamicSerDe to do that, or may some thrift code. 

couple of reasons are:
1) simpler interface (imo) 
2) How do deserializers get initialized in task trackers? we need to transport ColumnInfo
objects as javabeans?

But none of these are blocking are critical, so if it is really difficult to convert thrift
DDL into TypeInfo then we can do it the way you proposed.


> Change SerDe API to allow skipping unused columns
> -------------------------------------------------
>
>                 Key: HIVE-207
>                 URL: https://issues.apache.org/jira/browse/HIVE-207
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: David Phillips
>
> A deserializer shouldn't have to deserialize columns that are never used by the query
processor.  A serializer shouldn't have to examine unused columns that are known to always
be null.
> As an example, we store data as a Protocol Buffer structure with ~60 fields.  Running
a "select count(1)" currently requires deserializing all fields, which includes checking if
they exist and formatting the data appropriately.  This is expensive and unnecessary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message