hive-dev mailing list archives

From "Krishna Kumar (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2600) Enable/Add type-specific compression for rcfile
Date Mon, 05 Dec 2011 10:07:40 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162705#comment-13162705 ]

Krishna Kumar commented on HIVE-2600:
-------------------------------------

He Yongqiang,

  In UberCompressor, I have decided that the compression mechanism can change on a per-block
basis. The file as a whole declares "UberCompressionCodec" as its compression codec in the
file header, and each column block records the mechanism used for that block.
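To make the per-block idea concrete, here is a minimal sketch, assuming each column block simply starts with a one-byte tag naming its mechanism. The enum `BlockMechanism`, the class `UberBlockSketch`, and the tag values are hypothetical and are not taken from the HIVE-2604 patch; java.util.zip's DEFLATE stands in for whatever real mechanisms UberCompressor chooses.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical per-block mechanism identifiers; the real UberCompressor
// (HIVE-2604) may encode this differently.
enum BlockMechanism {
    NONE((byte) 0), DEFLATE((byte) 1);

    final byte id;

    BlockMechanism(byte id) { this.id = id; }

    static BlockMechanism fromId(byte id) {
        for (BlockMechanism m : values()) {
            if (m.id == id) return m;
        }
        throw new IllegalArgumentException("unknown mechanism id " + id);
    }
}

final class UberBlockSketch {
    // Compress one column block and prefix it with a one-byte mechanism tag.
    static byte[] encode(BlockMechanism m, byte[] raw) {
        byte[] body = (m == BlockMechanism.DEFLATE) ? deflate(raw) : raw;
        byte[] out = new byte[body.length + 1];
        out[0] = m.id;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    // Read the tag first, then undo whichever mechanism this block used.
    static byte[] decode(byte[] block) {
        BlockMechanism m = BlockMechanism.fromId(block[0]);
        byte[] body = Arrays.copyOfRange(block, 1, block.length);
        if (m == BlockMechanism.NONE) return body;
        try {
            return inflate(body);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt DEFLATE block", e);
        }
    }

    private static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) {
            int n = d.deflate(buf);
            bos.write(buf, 0, n);
        }
        d.end();
        return bos.toByteArray();
    }

    private static byte[] inflate(byte[] body) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(body);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) break; // guard against truncated input
            bos.write(buf, 0, n);
        }
        inf.end();
        return bos.toByteArray();
    }
}
```

Because the tag travels with each block, a reader never needs a file-level setting to decode a block; it only needs to know the file-level codec (here, the tag format itself).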

Carl,

  ColumnarSerde serializes all types as strings; other SerDes can serialize to bytes however they
wish. With HIVE-2604, I have added a dummy SerDe called UberCompressorSerde, which serializes
the objects into bytes (BytesRefArrayWritable); the codec can then recover the objects and
feed them to type-specific compressors.
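As an illustration of why recovering typed values helps, here is a minimal sketch of one possible type-specific compressor, assuming an integer column whose cells arrive as the text bytes a string-based SerDe would produce. The class `IntDeltaSketch` is hypothetical, a plain `List<byte[]>` stands in for `BytesRefArrayWritable`, and zigzag varint delta encoding is just one technique chosen for the example, not necessarily what HIVE-2604 uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical type-specific compressor for an integer column: cells arrive as
// text bytes, are parsed back to longs, then stored as zigzag varint deltas.
final class IntDeltaSketch {
    static byte[] compress(List<byte[]> textCells) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (byte[] cell : textCells) {
            long v = Long.parseLong(new String(cell, StandardCharsets.UTF_8));
            long delta = v - prev;
            writeVarLong(out, (delta << 1) ^ (delta >> 63)); // zigzag: small |delta| -> small code
            prev = v;
        }
        return out.toByteArray();
    }

    static List<Long> decompress(byte[] data) {
        List<Long> values = new ArrayList<>();
        long prev = 0;
        int pos = 0;
        while (pos < data.length) {
            // Read one little-endian base-128 varint.
            long zz = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                zz |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            long delta = (zz >>> 1) ^ -(zz & 1); // undo zigzag
            prev += delta;
            values.add(prev);
        }
        return values;
    }

    private static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            v >>>= 7;
        }
        out.write((int) v);
    }
}
```

For example, the text cells "1001", "1002", "1005", "1003" take 16 bytes as strings but 5 bytes here: two for the first value and one for each small delta.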

Just to make sure we are on the same page, please note that both of the above points relate
to a specific implementation of a schema-aware compressor. This jira by itself only introduces
the interface and the calls to that interface. I'd like to move any implementation
discussions to HIVE-2604.
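The split between "the interface" and "the invocations of that interface" can be pictured with a minimal sketch, assuming the writer tells the codec which column (and type) it is about to compress before handing over that column's block. The interface `ColumnAwareCodec`, its method `setColumnInfo`, and the classes `RecordingCodec` and `WriterSketch` are all hypothetical; the actual interface lives in the attached HIVE-2600 patches and may differ in name and signature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the real one is defined in the HIVE-2600 patch.
interface ColumnAwareCodec {
    // Called by the writer before it compresses a column's block.
    void setColumnInfo(int columnIndex, String hiveTypeName);

    byte[] compress(byte[] columnBlock);
}

// Toy implementation that only records the column information it receives.
final class RecordingCodec implements ColumnAwareCodec {
    final Map<Integer, String> seen = new HashMap<>();

    @Override
    public void setColumnInfo(int columnIndex, String hiveTypeName) {
        seen.put(columnIndex, hiveTypeName);
    }

    @Override
    public byte[] compress(byte[] columnBlock) {
        // A schema-aware codec would choose a type-specific compressor here;
        // this sketch passes the bytes through unchanged.
        return columnBlock.clone();
    }
}

// The "invocation" side: one setColumnInfo call per column, then compress it.
final class WriterSketch {
    static byte[][] writeRowGroup(ColumnAwareCodec codec, String[] types, byte[][] blocks) {
        byte[][] out = new byte[blocks.length][];
        for (int i = 0; i < blocks.length; i++) {
            codec.setColumnInfo(i, types[i]);
            out[i] = codec.compress(blocks[i]);
        }
        return out;
    }
}
```

Keeping the interface separate from any concrete codec means an ordinary, schema-unaware codec can still be used unchanged, while an UberCompressor-style implementation can act on the column information.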

                
> Enable/Add type-specific compression for rcfile
> -----------------------------------------------
>
>                 Key: HIVE-2600
>                 URL: https://issues.apache.org/jira/browse/HIVE-2600
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: HIVE-2600.v0.patch, HIVE-2600.v1.patch
>
>
> Enable schema-aware compression codecs which can perform type-specific compression on a per-column basis. I see this in three parts:
> 1. Add interfaces for the rcfile to communicate column information to the codec.
> 2. Add an "uber compressor" which can perform column-specific compression on a per-block basis. Initially, this can be config-driven, but we can go for a dynamic implementation later.
> 3. A bunch of type-specific compressors.
> This jira is for the first part of the effort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
