drill-dev mailing list archives

From "Joseph Barefoot (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3588) Write back to Hive Metastore
Date Fri, 31 Jul 2015 17:35:04 GMT
Joseph Barefoot created DRILL-3588:

             Summary: Write back to Hive Metastore
                 Key: DRILL-3588
                 URL: https://issues.apache.org/jira/browse/DRILL-3588
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Joseph Barefoot
            Priority: Critical

This feature is particularly important to us here at AtScale in order to leverage Drill as
a query engine option for our BI on Hadoop solution. Currently, you can connect to and query
databases and tables from the Hive Metastore without problems. However, if you create a table,
it is created in HDFS but no metadata is written to the Hive Metastore. That means those tables
won't be easily visible to any other tool.

When you read schemas from a Hive data source via Drill, they are prefixed with "hive.". This
namespacing makes sense to us considering how Drill works, and ideally it would work symmetrically
when you create tables with the same prefix, i.e. Drill would map the prefix to the target
data source, in this case Hive, and write the schema information back to the Hive Metastore.
Our specific use case is Create Table As Select; ideally, however, any DDL statement against
a Hive data source schema/table would write back to the Hive Metastore.
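
To make the request concrete, here is a minimal sketch of the prefix-to-data-source mapping described above. It illustrates the desired behavior, not anything Drill does today. The helper names (split_schema_prefix, build_ctas, should_write_to_metastore) and the table names (hive.sales_summary, hive.orders) are hypothetical and not part of any Drill API.

```python
# Hypothetical sketch of the requested behavior; none of these helpers
# exist in Drill. Table and column names are made up for illustration.

def split_schema_prefix(qualified_name):
    """Split a qualified name such as 'hive.orders' into (prefix, rest)."""
    prefix, _, rest = qualified_name.partition(".")
    if not rest:
        raise ValueError("expected a name of the form '<prefix>.<table>'")
    return prefix, rest


def build_ctas(target, select_sql):
    """Build a Create Table As Select statement for the given target."""
    return "CREATE TABLE {} AS {}".format(target, select_sql)


def should_write_to_metastore(target):
    """Assumed rule: a target under the 'hive.' prefix maps to Hive,
    so its schema information would be written to the Hive Metastore."""
    prefix, _ = split_schema_prefix(target)
    return prefix == "hive"


ctas = build_ctas(
    "hive.sales_summary",
    "SELECT region, SUM(amount) AS total FROM hive.orders GROUP BY region",
)
```

Under this assumed rule, should_write_to_metastore("hive.sales_summary") is true, so the new table would be registered in the Metastore, while a target under a non-Hive prefix would be left as it is today.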

The reason it's important to have the metadata in the Hive Metastore is that we have found many of
our customers use multiple SQL tools to access data tracked in the Metastore. For example,
even if Impala is their primary SQL on Hadoop engine for clients/tools, they may run Spark
jobs to manipulate data via RDDs that pull data by referencing the Metastore. Organizations
using a lot of SQL on Hadoop have come to expect this sort of interoperability between Hive,
Spark, and Impala, and supporting it within Drill will help drive adoption within the Hadoop
community (besides making it a lot easier for us to use Drill effectively from within our
BI engine).
