falcon-dev mailing list archives

From "Pallavi Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1096) Scan Hive Metastore to automatically create Falcon feeds for existing Hive tables
Date Mon, 16 Mar 2015 05:39:38 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362769#comment-14362769 ]

Pallavi Rao commented on FALCON-1096:
-------------------------------------

+1
Falcon can have a utility script to do that, driven by a whitelist/blacklist of Hive tables,
and the searchable fields can be specified as tags in the feed.
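Below is a minimal Python sketch of the whitelist/blacklist step such a utility script might perform. It assumes the fully qualified `db.table` names have already been fetched from the Hive Metastore (that part is not shown) and that both lists hold shell-style wildcard patterns. `select_tables` and the pattern syntax are illustrative choices, not an existing Falcon tool.

```python
from fnmatch import fnmatchcase


def select_tables(tables, whitelist, blacklist):
    """Return the 'db.table' names that match at least one whitelist
    pattern and no blacklist pattern (case-sensitive shell wildcards).

    Hypothetical helper: the table names are assumed to come from the
    Hive Metastore; fetching them is out of scope here.
    """
    selected = []
    for name in tables:
        allowed = any(fnmatchcase(name, p) for p in whitelist)
        denied = any(fnmatchcase(name, p) for p in blacklist)
        # An explicit blacklist match always wins over the whitelist.
        if allowed and not denied:
            selected.append(name)
    return selected


if __name__ == "__main__":
    tables = ["prod.clicks", "prod.orders", "prod.tmp_orders", "scratch.test"]
    print(select_tables(tables, whitelist=["prod.*"], blacklist=["*.tmp_*"]))
    # → ['prod.clicks', 'prod.orders']
```

Applying the blacklist after the whitelist keeps temporary or scratch tables out even when a broad pattern such as `prod.*` is whitelisted.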

> Scan Hive Metastore to automatically create Falcon feeds for existing Hive tables
> ---------------------------------------------------------------------------------
>
>                 Key: FALCON-1096
>                 URL: https://issues.apache.org/jira/browse/FALCON-1096
>             Project: Falcon
>          Issue Type: New Feature
>            Reporter: Adam Kawa
>
> In my organisation we create a Hive table for each production dataset in HDFS. When creating
> a Hive table, you supply a lot of information about your dataset: its name, its fields with
> their types and comments, the location, the data format, properties in the form of key-value
> pairs, and a meaningful description of the dataset. We think of Hive as a central and nicely
> documented repository of our datasets.
> When using Falcon, we again need to create a Falcon feed for each dataset (corresponding to a
> Hive table) and even specify multiple redundant properties (e.g. the description).
> To make this simpler, Falcon could scan the Hive Metastore and automatically create a feed for
> each Hive table that inherits the table's properties.
> The properties of Hive tables could also be used when searching for a dataset in the new
> Falcon Web UI, e.g. field name, field comment, and file format (other statistics such as total
> file size or the last modification or access time could also be used).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
