pig-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-833) Storage access layer
Date Tue, 18 Aug 2009 04:54:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744361#action_12744361 ]

Raghu Angadi commented on PIG-833:
----------------------------------


We will try to get some initial docs attached to this jira asap. I think the current plan is
to have proper wiki pages (and attach them here). This is part of the reason why we would like
to keep this jira open.

The bulk initial dump is certainly not desirable, but it has been fairly common for many contrib
projects in Hadoop. The rush to get this committed to contrib is in part to avoid such
large changes happening again. The longer we delay, the larger the patch is going to get. We want to
move the subsequent patches and discussions to public jira asap, and we are already doing that.

I would like to clarify that this is not a PIG feature but rather a contrib project. We would
not want this commit to be treated as a precedent for PIG commits. All the responsibility lies with
the Zebra team. This patch is the initial version. It does include many tests.






> Storage access layer
> --------------------
>
>                 Key: PIG-833
>                 URL: https://issues.apache.org/jira/browse/PIG-833
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jay Tang
>         Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2,
> PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a tabular view
> of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval
> code.  This layer should also include a columnar storage format in order to provide fast data
> projection, CPU/space-efficient data serialization, and a schema language to manage physical
> storage metadata.  Eventually it could also support predicate pushdown for further performance
> improvement.  Initially, this layer could be a contrib project in Pig and become a hadoop
> subproject later on.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

