hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Commented: (HIVE-352) Make Hive support column based storage
Date Thu, 19 Mar 2009 07:53:50 GMT


Zheng Shao commented on HIVE-352:

Let's do B2.2 first. I guess some interface change will be needed to make it possible:
SerDe currently deserializes only one row out of one Writable, while we are looking for
multiple rows per Writable. We can use SequenceFile compression support transparently.
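The interface change described above — one Writable decoding into several rows instead of one — could be sketched roughly as below. All names here are illustrative, not Hive's actual SerDe API; the "block" format (newline-separated rows of comma-separated columns) is a toy stand-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a block-oriented deserializer that yields several
// rows from a single value object, instead of the one-row-per-Writable
// contract that SerDe has today. Names are illustrative only.
interface BlockDeserializer {
    // One "block" value may decode into many rows.
    List<Object[]> deserializeBlock(Object blockValue);
}

// Toy implementation: the block is a String of newline-separated rows,
// each row a comma-separated list of column values.
class CsvBlockDeserializer implements BlockDeserializer {
    public List<Object[]> deserializeBlock(Object blockValue) {
        List<Object[]> rows = new ArrayList<>();
        for (String line : blockValue.toString().split("\n")) {
            rows.add(line.split(","));
        }
        return rows;
    }
}

public class BlockDeserializerDemo {
    public static void main(String[] args) {
        BlockDeserializer d = new CsvBlockDeserializer();
        List<Object[]> rows = d.deserializeBlock("1,a\n2,b\n3,c");
        System.out.println(rows.size() + " rows, first col of row 2 = " + rows.get(1)[0]);
        // -> 3 rows, first col of row 2 = 2
    }
}
```

The point of the shape change is that the reader can hand the deserializer one compressed block and get a batch of rows back, which is what a columnar (or row-group) layout needs.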

Once B2.2 is done, we can move to B2.1. As Joydeep said, we may need to extend SequenceFile
to make splitting work. At the same time, we might want to use SequenceFile record compression
(instead of SequenceFile block compression) if we can make relatively big records; that will
save us the time of decompressing unneeded columns. Or we can disable SequenceFile compression
and compress record by record ourselves. As Joydeep said, we will have to decide whether
we want to keep a big number of codecs open at the same time, or buffer all uncompressed data
and compress one column at a time when writing out. BZip2Codec needs 100KB to 900KB per compression.

> Make Hive support column based storage
> --------------------------------------
>                 Key: HIVE-352
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will enhance Hive
> to support column-based storage.
> Actually we have done some work on column-based storage on top of HDFS; I think it will
> need some review and refactoring to port it to Hive.
> Any thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
