hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-352) Make Hive support column based storage
Date Thu, 30 Apr 2009 19:54:30 GMT

    [ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704765#action_12704765 ]

Zheng Shao commented on HIVE-352:
---------------------------------

>>>>Writer: how do you pass the column number from Hive to the configuration and then to the RCFile.Writer?
>>The code is in RCFileOutputFormat's getHiveRecordWriter(). It tries to parse the columns from the passed-in Properties.
Thanks. I understand it now.

>>>>init(...): Cleaning out the object and recreating the LazyObject is not efficient.
>>If we change it, it will not pass the TestRCFile test. The final extra else-if statements are rarely reached, and when they are reached, most of the time only one instruction is needed to determine whether fields[fieldIndex] is null.

Can you add a boolean[] fieldIsNull to mark whether a field is null, instead of throwing away
and recreating the LazyObject?
Then getField can check fieldIsNull to decide whether to return null or the LazyObject.
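
To make the suggestion concrete, here is a minimal sketch of the idea. It is not the code in the patch: ColumnarRowSketch and LazyFieldSketch are hypothetical stand-ins for the patch's row class and Hive's LazyObject, and passing a null byte[] to signal a null column is an assumption made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a LazyObject: it only remembers a byte range. */
final class LazyFieldSketch {
    private byte[] data;
    private int start;
    private int length;

    /** Re-point this object at new bytes instead of allocating a new one. */
    void init(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    String asString() {
        return new String(data, start, length, StandardCharsets.UTF_8);
    }
}

/** Hypothetical row holder that reuses its field objects across rows. */
final class ColumnarRowSketch {
    private final LazyFieldSketch[] fields;
    private final boolean[] fieldIsNull;

    ColumnarRowSketch(int numFields) {
        fields = new LazyFieldSketch[numFields];
        fieldIsNull = new boolean[numFields];
        for (int i = 0; i < numFields; i++) {
            fields[i] = new LazyFieldSketch(); // allocated once, reused for every row
        }
        Arrays.fill(fieldIsNull, true); // nothing is loaded until init() is called
    }

    /** Loads one row. Assumption: a null entry (or a missing one) means a null field. */
    void init(byte[][] columns) {
        for (int i = 0; i < fields.length; i++) {
            byte[] col = i < columns.length ? columns[i] : null;
            if (col == null) {
                fieldIsNull[i] = true; // only flip the flag; keep the object for later rows
            } else {
                fieldIsNull[i] = false;
                fields[i].init(col, 0, col.length);
            }
        }
    }

    /** Returns null for a null field, otherwise the reused field object. */
    LazyFieldSketch getField(int fieldIndex) {
        return fieldIsNull[fieldIndex] ? null : fields[fieldIndex];
    }
}
```

With this layout, init() never discards a field object; a null column costs one boolean write, and getField() costs one boolean read.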


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 progress.txt,
>                      hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch,
>                      hive-352-2009-4-19.patch, hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch,
>                      hive-352-2009-4-23.patch, hive-352-2009-4-27.patch, hive-352-2009-4-30-2.patch,
>                      hive-352-2009-4-30-3.patch, hive-352-2009-4-30-4.patch, hive-352-2009-5-1.patch,
>                      HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

