falcon-dev mailing list archives

From "Suhas Vasu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1096) Scan Hive Metastore to automatically create Falcon feeds for existing Hive tables
Date Mon, 16 Mar 2015 09:08:38 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362952#comment-14362952 ]

Suhas Vasu commented on FALCON-1096:

As per FALCON-703, we are introducing a plugin which listens to JMS messages and registers
the partitions on the HCatalog table.

I would prefer a separate plugin for this purpose, so that it can be enabled only by users
who really need the feature.

> Scan Hive Metastore to automatically create Falcon feeds for existing Hive tables
> ---------------------------------------------------------------------------------
>                 Key: FALCON-1096
>                 URL: https://issues.apache.org/jira/browse/FALCON-1096
>             Project: Falcon
>          Issue Type: New Feature
>            Reporter: Adam Kawa
> In my organisation we create a Hive table for each production dataset in HDFS. When creating
> a Hive table, you supply a lot of information about your dataset: its name, its fields with their
> types and comments, the location, the data format, properties in the form of key-value pairs,
> and a meaningful description of the dataset. We think of Hive as a central and nicely documented
> repository of our datasets.
> When using Falcon, we again need to create a Falcon feed for each dataset (one that corresponds
> to a Hive table) and even specify multiple redundant properties (e.g. the description).
> To make this simpler, Falcon could scan the Hive Metastore, automatically create a feed
> for each Hive table, and inherit the table's properties.
> The properties of Hive tables could also be used when searching for a dataset in the new
> Falcon Web UI, e.g. field name, field comment, or file format (other statistics such as total
> file size or the last modification or access time could also be used).
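
A rough sketch of the idea in the issue description, not Falcon code: given the metadata of one Hive table, build a feed entity that reuses the table's name, comment, owner and properties. The `table` dict is a hypothetical stand-in for whatever a metastore scan would return, and the feed naming convention is made up for illustration. Values that Hive does not store (cluster, frequency, validity, retention, ACL group and permission) are assumptions that the caller has to supply in `schedule`.

```python
import xml.etree.ElementTree as ET

FEED_NS = "uri:falcon:feed:0.1"  # namespace used by Falcon feed entities


def feed_from_hive_table(table, schedule):
    """Return feed XML (a string) derived from one Hive table.

    table:    hypothetical dict with database, name, comment, owner,
              partition_spec and properties, as a metastore scan might yield.
    schedule: assumed values that Hive does not know about.
    """
    feed = ET.Element("feed", {
        # Hypothetical naming convention: <database>-<table>, underscores to hyphens.
        "name": f"{table['database']}-{table['name']}".replace("_", "-"),
        # Reuse the Hive table comment instead of retyping a description.
        "description": table.get("comment", ""),
        "xmlns": FEED_NS,
    })
    ET.SubElement(feed, "frequency").text = schedule["frequency"]

    clusters = ET.SubElement(feed, "clusters")
    cluster = ET.SubElement(clusters, "cluster",
                            {"name": schedule["cluster"], "type": "source"})
    ET.SubElement(cluster, "validity",
                  {"start": schedule["start"], "end": schedule["end"]})
    ET.SubElement(cluster, "retention",
                  {"limit": schedule["retention"], "action": "delete"})

    # Catalog URI of the form catalog:<database>:<table>#<partition spec>.
    uri = f"catalog:{table['database']}:{table['name']}#{table['partition_spec']}"
    ET.SubElement(feed, "table", {"uri": uri})

    ET.SubElement(feed, "ACL", {
        "owner": table["owner"],
        "group": schedule["group"],
        "permission": schedule["permission"],
    })
    ET.SubElement(feed, "schema", {"location": "hcat", "provider": "hcat"})

    # Carry Hive table properties over as feed properties (sorted for stable output).
    props = ET.SubElement(feed, "properties")
    for key, value in sorted(table.get("properties", {}).items()):
        ET.SubElement(props, "property", {"name": key, "value": value})

    return ET.tostring(feed, encoding="unicode")


example_table = {
    "database": "default",
    "name": "clicks_log",
    "comment": "Raw click events",
    "owner": "etl",
    "partition_spec": "ds=${YEAR}-${MONTH}-${DAY}",
    "properties": {"format": "orc"},
}
example_schedule = {
    "cluster": "primary", "frequency": "days(1)",
    "start": "2015-03-16T00:00Z", "end": "2099-01-01T00:00Z",
    "retention": "days(90)", "group": "users", "permission": "0755",
}
print(feed_from_hive_table(example_table, example_schedule))
```

A real plugin would obtain `table` from the metastore rather than from a literal dict, and would still need a policy for the schedule values, which is one reason to keep this behaviour optional.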

