incubator-hcatalog-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: HCatalog feature wish list
Date Fri, 29 Apr 2011 18:03:26 GMT
Hi Bill,
Responses inline.
Devaraj.


On 4/29/11 9:39 AM, "Bill Graham" <billgraham@gmail.com> wrote:

hi,

We're starting to look into HCatalog to see if it can help us organize
and catalog our data/schemas owned by various groups across our
organization. As a result I have a few questions about some
functionality that I don't think exists yet, but I could be mistaken
(it's been a while since I last worked with Hive).

If these aren't currently supported, would there be interest in
including these features in the roadmap? If so we might be able to
contribute resources to help implement some of them.


- Custom table and field metadata
Is it possible to annotate a table or columns with custom key/value
metadata (e.g., table POCs, descriptions, column data formats, etc.)?
This Howl wiki (http://wiki.apache.org/pig/owl) had a one-liner about
custom table metadata, but that's the only reference I've been able to
find about this.

Devaraj: Yes, tables can be annotated with properties that look like key/value pairs.

- Support for non-RDBMS metadata storage
We have groups that currently store a custom schema definition file in
HDFS alongside their actual data. Would it be possible to direct
HCatalog to consult this file instead of the DB for the schema info
for this class of tables?

Devaraj: HCatalog will soon have a tool that can import data schemas into the metadata store.
At Yahoo!, the plan is to use this tool to migrate all old data. All new data should be directly
registered in HCatalog by the data producers. Would this suffice?

- Web UI
Are there any known web UIs (or plans for one) to expose and even
modify HCatalog data? We'd like to build a web UI that would help with
data discovery. IIRC, Facebook had something similar at one point.

Devaraj: At Yahoo!, there is some internal effort going on in this direction.

- Support for columnar DBs (e.g., HBase)
I know this doesn't exist currently, but is this something being
considered or requested? HCatalog is focused on fixed-width schemas,
so this would be tricky to represent, but it seems worth exploring.
There seems to be an emerging need to effectively manage and
understand the schemas of such schema-less data stores. :)

Devaraj: HCatalog will support HBase schemas at some point. Again, this is something that
Yahoo! has started looking at internally.

thanks,
Bill

