hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-352) Make Hive support column based storage
Date Fri, 17 Apr 2009 09:29:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700098#action_12700098 ]

Zheng Shao commented on HIVE-352:
---------------------------------

hive-352-2009-4-17.patch:

Very nice job!

Two more tests to add:
1. Big data test: take a look at ql/src/test/queries/clientpositive/groupby_bigdata.q to see
how we generate big data sets.
2. Complex column types: take a look at ./ql/src/test/queries/clientpositive/input_lazyserde.q

Some other improvements:
1. ObjectInspectorFactory.getColumnarStructObjectInspector: I don't think you need the byte
separator and boolean lastColumnTakesRest parameters. Just remove them.
2. ColumnarStruct.init: Can you cache and reuse the ByteArrayRef instead of doing ByteArrayRef
br = new ByteArrayRef() every time? The assumption in Hive is that data is owned by its
creator, and whoever wants to keep the data for later use needs to get a deep copy of the
Object by calling ObjectInspectorUtils.copyToStandardObject.
3. ColumnarStruct: the comments should mention that the difference from LazyStruct is that it
reads data through init(BytesRefArrayWritable cols).
4. Can you move all the changes in the serde2.lazy package into a new package called
serde2.columnar?
5. There seems to be a lot of shared code between LazySimpleSerDe and ColumnarSerDe, e.g.
much of the functionality in init and serialize. Can you refactor LazySimpleSerDe and put the
common functionality into public static methods that ColumnarSerDe can call directly?
You might also want to put the configuration of LazySimpleSerDe (nullString, separators,
etc.) into a public static class that those public static methods return.
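
To illustrate the ownership convention behind item 2, here is a minimal sketch. ByteRef and ReusableRow are hypothetical stand-ins written for this example, not Hive's ByteArrayRef or ColumnarStruct, and the plain array copy only stands in for the role that ObjectInspectorUtils.copyToStandardObject plays in Hive. It shows what happens when a caller keeps a reused holder without copying, and why a deep copy is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's classes.
final class ReuseSketch {

    /** Mutable holder for a byte array. */
    static final class ByteRef {
        private byte[] data;

        void setData(byte[] data) { this.data = data; }

        byte[] getData() { return data; }
    }

    /** Row object that allocates its holder once and reuses it on every init(). */
    static final class ReusableRow {
        private final ByteRef ref = new ByteRef(); // created once, not per row

        void init(byte[] rowBytes) { ref.setData(rowBytes); }

        ByteRef ref() { return ref; }
    }

    /** Keeps the shared holder without copying: every entry ends up showing the last row. */
    static List<String> keepReferences(String... rows) {
        ReusableRow row = new ReusableRow();
        List<ByteRef> kept = new ArrayList<>();
        for (String r : rows) {
            row.init(r.getBytes(StandardCharsets.UTF_8));
            kept.add(row.ref()); // the same holder object is added each time
        }
        List<String> out = new ArrayList<>();
        for (ByteRef k : kept) {
            out.add(new String(k.getData(), StandardCharsets.UTF_8));
        }
        return out;
    }

    /** Copies the bytes out of the shared holder before the next init(). */
    static List<String> keepCopies(String... rows) {
        ReusableRow row = new ReusableRow();
        List<byte[]> kept = new ArrayList<>();
        for (String r : rows) {
            row.init(r.getBytes(StandardCharsets.UTF_8));
            byte[] current = row.ref().getData();
            kept.add(Arrays.copyOf(current, current.length)); // deep copy owned by the caller
        }
        List<String> out = new ArrayList<>();
        for (byte[] k : kept) {
            out.add(new String(k, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences("a", "b", "c")); // prints [c, c, c]
        System.out.println(keepCopies("a", "b", "c"));     // prints [a, b, c]
    }
}
```

The reused holder saves one allocation per row. The cost is that anything the reader hands out is only valid until the next init(), which is why the copy is left to the caller.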


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch,
>                      HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will enhance Hive
> to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I think it will
> need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

