hadoop-pig-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-833) Storage access layer
Date Fri, 05 Jun 2009 00:56:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716470#action_12716470 ]

Hong Tang commented on PIG-833:

Jeff, just like the SQL effort, the space of columnar storage is also wide open, and I think
it is more beneficial to the overall health of the Hadoop ecosystem.

With that said, I also looked at the patch attached to HIVE-352. It appears that what
the patch does is a level below our stated objectives. Specifically, the guts of the implementation
(RCFile) are very close in spirit to TFile as described in HADOOP-3315, which seems to have had its
first comprehensive patch back in December 2008.

> Storage access layer
> --------------------
>                 Key: PIG-833
>                 URL: https://issues.apache.org/jira/browse/PIG-833
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jay Tang
> A layer is needed to provide a high level data access abstraction and a tabular view
> of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval
> code.  This layer should also include a columnar storage format in order to provide fast data
> projection, CPU/space-efficient data serialization, and a schema language to manage physical
> storage metadata.  Eventually it could also support predicate pushdown for further performance
> improvement.  Initially, this layer could be a contrib project in Pig and become a hadoop
> subproject later on.

