hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4550) Make DynamicSerDe capable of skipping fields that will not be used in the query
Date Thu, 30 Oct 2008 18:26:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644065#action_12644065 ]

Pete Wyckoff commented on HADOOP-4550:
--------------------------------------

I propose:

1. We add a 'skip' attribute to the field specification in the DynamicSerDe grammar. When
this attribute is set on a field, DynamicSerDeFieldList will call protocol.skip for that field.
2. We add an interface for protocols, something like: TFastSkippable { void skip(type); }
or, if a single method is not enough, per-type methods: skipI32, skipI64, skipString, skipList, ...
3. For TCTLSeparatedProtocol, we implement TFastSkippable.
4. We modify the runtime to insert skip attributes into the runtime DDL passed to DynamicSerDe.
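
To make steps 1 to 3 concrete, here is a rough, self-contained Java sketch. It does not use Thrift or Hive classes: TFastSkippable is only the interface name floated above, and ToySeparatedProtocol, Field, the type tags and deserialize() are hypothetical stand-ins for TCTLSeparatedProtocol and DynamicSerDeFieldList, not their real APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipSketch {
    // Hypothetical type tags for this sketch only (real Thrift uses TType constants).
    static final byte STRING = 0, I32 = 1, I64 = 2;

    /** Hypothetical interface from the proposal; not part of Thrift. */
    interface TFastSkippable {
        void skip(byte type);
    }

    /** Toy stand-in for a separator-delimited text protocol. */
    static final class ToySeparatedProtocol implements TFastSkippable {
        private final String row;
        private final char sep;
        private int pos = 0;
        int parsed = 0; // how many fields were actually materialized

        ToySeparatedProtocol(String row, char sep) {
            this.row = row;
            this.sep = sep;
        }

        // Index of the end of the current field (next separator or end of row).
        private int fieldEnd() {
            int end = row.indexOf(sep, pos);
            return end < 0 ? row.length() : end;
        }

        String readString() {
            int end = fieldEnd();
            String s = row.substring(pos, end);
            pos = Math.min(end + 1, row.length());
            parsed++;
            return s;
        }

        int readI32() { return Integer.parseInt(readString()); }

        long readI64() { return Long.parseLong(readString()); }

        // Skipping only advances past the separator: no substring, no type conversion.
        @Override
        public void skip(byte type) {
            pos = Math.min(fieldEnd() + 1, row.length());
        }
    }

    /** A field spec carrying the proposed 'skip' attribute. */
    static final class Field {
        final String name;
        final byte type;
        final boolean skip;

        Field(String name, byte type, boolean skip) {
            this.name = name;
            this.type = type;
            this.skip = skip;
        }
    }

    /** Roughly what the field list would do: skip flagged fields, parse the rest. */
    static List<Object> deserialize(List<Field> fields, ToySeparatedProtocol in) {
        List<Object> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.skip) {
                in.skip(f.type);
                out.add(null); // placeholder keeps column positions stable
                continue;
            }
            switch (f.type) {
                case I32: out.add(in.readI32()); break;
                case I64: out.add(in.readI64()); break;
                default:  out.add(in.readString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("user", STRING, false));
        fields.add(new Field("created", I64, false));
        fields.add(new Field("fullname", STRING, true));
        ToySeparatedProtocol in = new ToySeparatedProtocol("pete,1225391204,Pete W", ',');
        System.out.println(deserialize(fields, in) + " parsed=" + in.parsed);
    }
}
```

In this toy version a single skip(type) is enough because every field ends at the next separator. A protocol with nested or length-prefixed types might need the per-type variants (skipI32, skipList, ...) mentioned in item 2.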

This will need to be prioritized against other optimizations, but for TCTLSeparatedProtocol this
is certainly a performance issue, and it may block replacing TMetadataTypedColumnsetSerDe with
DynamicSerDe, since the former handles only strings, so its cost of not skipping is low.


> Make DynamicSerDe capable of skipping fields that will not be used in the query
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-4550
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4550
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Pete Wyckoff
>
> Thrift/DynamicSerDe always deserializes and converts fields to the correct type for every
> field in the record. Many times, only a few of the fields will be used.
> e.g., select foo.user from foo where foo.created < 'today'
> where foo is something like
> struct {
>   string user
>   i64 created
>   string fullname
>   string description
>   i32 something
>   i32 somethingelse
>   ...
> }
> Parsing fullname, description, something, and somethingelse is a waste in this case.
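
For the example query in the issue description, deciding which fields get the skip attribute could look roughly like the Java sketch below. SkipPlanner and planSkips are hypothetical names for illustration; how the Hive runtime actually collects the referenced columns and rewrites the DDL is not specified here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SkipPlanner {
    /**
     * Maps each schema field to true if it should be skipped, i.e. if the
     * query never references it. Insertion order follows the schema order.
     */
    static Map<String, Boolean> planSkips(List<String> schemaFields, Set<String> referenced) {
        Map<String, Boolean> plan = new LinkedHashMap<>();
        for (String field : schemaFields) {
            plan.put(field, !referenced.contains(field));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList(
            "user", "created", "fullname", "description", "something", "somethingelse");
        // select foo.user from foo where foo.created < 'today'
        Set<String> referenced = new HashSet<>(Arrays.asList("user", "created"));
        System.out.println(planSkips(schema, referenced));
    }
}
```

Only user and created are read by the query, so the other four fields would be marked for skipping.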

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

