hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-207) Change SerDe API to allow skipping unused columns
Date Wed, 07 Jan 2009 03:30:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661424#action_12661424

Zheng Shao commented on HIVE-207:

@Joydeep - Yes, DynamicSerDe is just a parser for Thrift DDL. It calls the respective methods
of the protocol for each field. As a result, it's possible to write a protocol without modifying
the DynamicSerDe code. I don't get your idea of the extensibility hook. I guess the hook
is just the same as DynamicSerDe and its Protocols?
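
To make the point about protocols concrete, here is a minimal sketch of a DDL-driven
deserializer that delegates every field to a pluggable protocol. It is not Hive's actual
DynamicSerDe code: the names ProtocolSketch, FieldType, FieldProtocol, PlainTextProtocol and
HexIntProtocol are all hypothetical, and the "DDL" is reduced to an ordered map of field names
to types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProtocolSketch {
    /** Hypothetical field types that a DDL might declare. */
    enum FieldType { INT, STRING }

    /** Hypothetical stand-in for a protocol: one read method per field type. */
    interface FieldProtocol {
        int readInt(String raw);
        String readString(String raw);
    }

    /** Protocol that reads integers as decimal text. */
    static class PlainTextProtocol implements FieldProtocol {
        public int readInt(String raw) { return Integer.parseInt(raw.trim()); }
        public String readString(String raw) { return raw; }
    }

    /** A second protocol, added without touching deserialize(): integers are hex text. */
    static class HexIntProtocol implements FieldProtocol {
        public int readInt(String raw) { return Integer.parseInt(raw.trim(), 16); }
        public String readString(String raw) { return raw; }
    }

    /** Walks the DDL in order and calls the protocol method matching each field's type. */
    static Map<String, Object> deserialize(Map<String, FieldType> ddl,
                                           String[] rawFields,
                                           FieldProtocol protocol) {
        Map<String, Object> row = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, FieldType> field : ddl.entrySet()) {
            String raw = rawFields[i++];
            switch (field.getValue()) {
                case INT:
                    row.put(field.getKey(), protocol.readInt(raw));
                    break;
                case STRING:
                    row.put(field.getKey(), protocol.readString(raw));
                    break;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, FieldType> ddl = new LinkedHashMap<>();
        ddl.put("id", FieldType.INT);
        ddl.put("name", FieldType.STRING);

        // Same deserializer, two different wire formats.
        System.out.println(deserialize(ddl, new String[] {"42", "alice"}, new PlainTextProtocol()));
        System.out.println(deserialize(ddl, new String[] {"2a", "alice"}, new HexIntProtocol()));
    }
}
```

Both calls in main produce the same row, because only the protocol knows how the bytes are laid
out; the deserializer only knows the field order and types.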

> Change SerDe API to allow skipping unused columns
> -------------------------------------------------
>                 Key: HIVE-207
>                 URL: https://issues.apache.org/jira/browse/HIVE-207
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: David Phillips
> A deserializer shouldn't have to deserialize columns that are never used by the query
> processor.  A serializer shouldn't have to examine unused columns that are known to always
> be null.
> As an example, we store data as a Protocol Buffer structure with ~60 fields.  Running
> a "select count(1)" currently requires deserializing all fields, which includes checking if
> they exist and formatting the data appropriately.  This is expensive and unnecessary.
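
For illustration only, the request in the issue description could look roughly like the sketch
below: the deserializer is told which column positions the query needs and parses only those.
This is an assumption about one possible shape, not the API proposed or adopted in HIVE-207;
ColumnPruningSketch, PruningDeserializer, neededColumns and parsedFields are hypothetical names,
and a comma-separated line of integers stands in for the real record format.

```java
import java.util.BitSet;

public class ColumnPruningSketch {
    /** Hypothetical deserializer that parses only the columns a query needs. */
    static class PruningDeserializer {
        private final BitSet neededColumns;
        int parsedFields = 0; // counts how many fields were actually parsed

        PruningDeserializer(BitSet neededColumns) {
            this.neededColumns = neededColumns;
        }

        /** Returns one slot per column; columns that were not requested stay null. */
        Object[] deserialize(String line) {
            String[] raw = line.split(",", -1);
            Object[] row = new Object[raw.length];
            for (int i = neededColumns.nextSetBit(0);
                 i >= 0 && i < raw.length;
                 i = neededColumns.nextSetBit(i + 1)) {
                row[i] = Long.parseLong(raw[i]);
                parsedFields++;
            }
            return row;
        }
    }

    public static void main(String[] args) {
        // A "select count(1)" needs no columns at all, so nothing is parsed.
        PruningDeserializer countOnly = new PruningDeserializer(new BitSet());
        countOnly.deserialize("10,20,30");
        System.out.println("fields parsed for count(1): " + countOnly.parsedFields);

        // A query that reads only the second column parses exactly one field.
        BitSet needed = new BitSet();
        needed.set(1);
        PruningDeserializer oneColumn = new PruningDeserializer(needed);
        Object[] row = oneColumn.deserialize("10,20,30");
        System.out.println("second column: " + row[1] + ", fields parsed: " + oneColumn.parsedFields);
    }
}
```

With ~60 fields per record, the saving for a count-only query would be every per-field existence
check and conversion, since the loop body never runs.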

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.