hadoop-hive-dev mailing list archives

From "David Phillips (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-207) Change SerDe API to allow skipping unused columns
Date Tue, 06 Jan 2009 23:31:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661371#action_12661371 ]

David Phillips commented on HIVE-207:
-------------------------------------

Thank you for the detailed explanations.  I now have a much better understanding of SerDe's
purpose and scope.  The design of ObjectInspector also makes sense now.  To summarize:

1) SerDe, not the DDL, defines the table schema.  Some SerDe implementations use the DDL for
configuration.
2) Column types can be arbitrarily nested arrays, maps and structures.
3) The callback design of ObjectInspector allows lazy deserialization, for example of columns
that are only read inside a CASE/IF branch, or of fields within complex or nested types.
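To make point 3 concrete, here is a minimal Java sketch of callback-style lazy deserialization. The names (FieldInspector, LazyRow, caseWhen) are hypothetical and heavily simplified; this is not Hive's actual ObjectInspector API. The row keeps its serialized text, and a field is decoded only when the inspector is asked for it, so a CASE expression that takes one branch never decodes the other branch's column.

```java
/**
 * Hypothetical illustration of lazy deserialization through an
 * inspector-style callback. NOT Hive's real ObjectInspector interfaces.
 */
public class LazyRowDemo {

    /** The "inspector": callers ask it for one field of an opaque row object. */
    interface FieldInspector {
        String getField(Object row, int fieldIndex);
    }

    /** A row that keeps its serialized text and decodes fields on demand. */
    static final class LazyRow {
        private final String raw;     // serialized form, e.g. "1,alice,NY"
        private final char delim;
        private final String[] cache; // null entry = field not decoded yet
        private int decoded = 0;      // number of distinct fields decoded

        LazyRow(String raw, char delim, int numFields) {
            this.raw = raw;
            this.delim = delim;
            this.cache = new String[numFields];
        }

        String field(int i) {
            if (cache[i] == null) {   // decode each field at most once
                cache[i] = extract(i);
                decoded++;
            }
            return cache[i];
        }

        /** Scans to the i-th field without materializing the others. */
        private String extract(int i) {
            int start = 0;
            for (int f = 0; f < i; f++) {
                int next = raw.indexOf(delim, start);
                if (next < 0) {
                    throw new IndexOutOfBoundsException("row has fewer than " + (i + 1) + " fields");
                }
                start = next + 1;
            }
            int end = raw.indexOf(delim, start);
            return end < 0 ? raw.substring(start) : raw.substring(start, end);
        }

        int decodedCount() { return decoded; }
    }

    /** The inspector just forwards to the row's on-demand accessor. */
    static final FieldInspector INSPECTOR = (row, i) -> ((LazyRow) row).field(i);

    /** CASE WHEN col0 = '1' THEN col1 ELSE col2 END: only one branch column is read. */
    static String caseWhen(Object row, FieldInspector oi) {
        return "1".equals(oi.getField(row, 0)) ? oi.getField(row, 1) : oi.getField(row, 2);
    }

    public static void main(String[] args) {
        LazyRow row = new LazyRow("1,alice,NY,x,y", ',', 5);
        System.out.println(caseWhen(row, INSPECTOR)); // prints "alice"
        System.out.println(row.decodedCount());       // prints "2" (col0 and col1 only)
    }
}
```

Because the query code only ever talks to the inspector, the row representation is free to defer work; an eager deserializer would have decoded all five fields before the CASE was evaluated.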



> Change SerDe API to allow skipping unused columns
> -------------------------------------------------
>
>                 Key: HIVE-207
>                 URL: https://issues.apache.org/jira/browse/HIVE-207
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: David Phillips
>
> A deserializer shouldn't have to deserialize columns that are never used by the query
> processor.  A serializer shouldn't have to examine unused columns that are known to always
> be null.
> As an example, we store data as a Protocol Buffer structure with ~60 fields.  Running
> a "select count(1)" currently requires deserializing all fields, which includes checking if
> they exist and formatting the data appropriately.  This is expensive and unnecessary.
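The request in the issue can be sketched as column projection: the query processor passes the set of column indices it will read, and the deserializer decodes only those. Everything below (ProjectedDeserializer, the BitSet of needed columns, and the string-to-Long decode standing in for the Protocol Buffer presence checks and formatting) is a hypothetical illustration, not Hive's SerDe interface. With an empty projection, as for "select count(1)", no field is decoded at all.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch of column projection: the caller declares up front
 * which columns it needs, and all other columns are left undecoded (null).
 * Not Hive's actual SerDe API.
 */
public class ProjectedDeserializerDemo {

    static final class ProjectedDeserializer {
        private final BitSet needed;  // indices of columns the query reads
        private long fieldsDecoded = 0;

        ProjectedDeserializer(BitSet needed) { this.needed = needed; }

        /** Returns one slot per column; columns not in 'needed' stay null. */
        Object[] deserialize(String[] rawFields) {
            Object[] row = new Object[rawFields.length];
            for (int c = needed.nextSetBit(0); c >= 0 && c < rawFields.length; c = needed.nextSetBit(c + 1)) {
                row[c] = decode(rawFields[c]);
                fieldsDecoded++;
            }
            return row;
        }

        /** Stand-in for the expensive per-field work (presence checks, type conversion). */
        private static Object decode(String s) {
            try {
                return Long.parseLong(s);
            } catch (NumberFormatException e) {
                return s;
            }
        }

        long fieldsDecoded() { return fieldsDecoded; }
    }

    public static void main(String[] args) {
        String[][] rows = new String[3][60];  // 3 rows of 60 fields, echoing the ~60-field example
        for (String[] r : rows) Arrays.fill(r, "42");

        // select count(1): no columns are needed, so nothing is decoded.
        ProjectedDeserializer countStar = new ProjectedDeserializer(new BitSet());
        long count = 0;
        for (String[] r : rows) { countStar.deserialize(r); count++; }
        System.out.println(count + " rows, " + countStar.fieldsDecoded() + " fields decoded"); // prints "3 rows, 0 fields decoded"

        // A query reading only column 7 decodes one field per row.
        BitSet onlyCol7 = new BitSet();
        onlyCol7.set(7);
        ProjectedDeserializer one = new ProjectedDeserializer(onlyCol7);
        for (String[] r : rows) one.deserialize(r);
        System.out.println(one.fieldsDecoded() + " fields decoded"); // prints "3 fields decoded"
    }
}
```

The projection is fixed once per query rather than per row, which keeps the per-row loop down to the needed columns; this complements the on-demand inspector approach rather than replacing it.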

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

