hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-823) Hadoop Metadata Service
Date Fri, 05 Jun 2009 17:27:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716679#action_12716679 ]

Alan Gates commented on PIG-823:
--------------------------------

By a lower level of metadata, we don't mean storing information already present in the
namenode. The difference is in the model's perspective. Hive's metadata model consists of
tables and partitions, which is appropriate since Hive works with SQL, which presents a
relational view to users. Our proposal is to construct a metadata service that models
directories and files. Map Reduce and Pig Latin present a file-based view to users, so this
model is more appropriate for those tools.
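
To make the directory-and-file perspective concrete, here is a minimal sketch of metadata keyed by path rather than by table and partition. It is purely illustrative: the `MetadataStore` class, its `register`, `lookup` and `search` methods, and the example paths and schema string are all hypothetical and are not taken from the proposal or from any Pig or Hadoop API.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class MetadataStore:
    """Hypothetical in-memory store that attaches metadata to file-system paths."""

    entries: dict = field(default_factory=dict)

    def register(self, path, **metadata):
        # Normalise the path (drops any trailing slash) and merge the new
        # key/value pairs into whatever is already recorded for it.
        key = str(PurePosixPath(path))
        self.entries.setdefault(key, {}).update(metadata)

    def lookup(self, path):
        # Return a copy of the metadata for one path, or {} if unregistered.
        return dict(self.entries.get(str(PurePosixPath(path)), {}))

    def search(self, directory):
        # List every registered path that is the directory itself or lies
        # anywhere beneath it in the hierarchy.
        root = PurePosixPath(directory)
        return sorted(
            p for p in self.entries
            if PurePosixPath(p) == root or root in PurePosixPath(p).parents
        )


store = MetadataStore()
# Assumed example data: a directory with a schema, a file with statistics.
store.register("/data/logs", schema="user:chararray, ts:long")
store.register("/data/logs/2009-06-05/part-00000", row_count=1000)
print(store.search("/data/logs"))
```

A table-oriented service could, under this assumption, map each table or partition onto one of these paths, so metadata would only need to be registered once.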

I met a couple of times with the Facebook team to discuss metadata and our desire for a
hierarchical model. They agreed that this did not fit with the model they were using.
We both agreed that any metadata service built around files should have an interface that
their metadata service can easily connect to, so that a user who wishes to use both can
do so without needing to register metadata in both.

As for documentation, we're working on getting it ready for external release. We hope to
post it in the next week or so.


> Hadoop Metadata Service
> -----------------------
>
>                 Key: PIG-823
>                 URL: https://issues.apache.org/jira/browse/PIG-823
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>
> This JIRA is created to track development of a metadata system for Hadoop. The goal
> of the system is to allow users and applications to register data stored on HDFS, search for
> the data available on HDFS, and associate metadata such as schema, statistics, etc. with a
> particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly
> generic, low-level abstraction that any user or application on HDFS can use to store and
> retrieve metadata. Over time, higher-level abstractions closely tied to particular
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a subproject within
> Hadoop. For now, the proposal is to make it a contrib to Pig since Pig SQL is likely to be
> the first user of the system.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

