hive-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9452) Use HBase to store Hive metadata
Date Mon, 09 Feb 2015 23:51:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313187#comment-14313187
] 

Alan Gates commented on HIVE-9452:
----------------------------------

[~daisy_yu] Take a look at the attached document, which should answer most of your
questions.  De/serialization of tables with thousands of columns will improve for
partitioned tables, since it will only need to be done once (assuming the partitions
share the same schema, etc.), but it will not change significantly for non-partitioned
tables.
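
To illustrate the "only once" point, here is a minimal sketch. It assumes partitions
that share a schema point at the same serialized schema blob, and that a reader caches
the deserialized result by content hash. The SchemaCache class, the JSON encoding, and
the partition names are all hypothetical and are not taken from the attached document.

```python
import hashlib
import json


class SchemaCache:
    """Hypothetical cache that deserializes each distinct schema blob once."""

    def __init__(self):
        self._by_hash = {}
        self.deserialize_calls = 0

    def _deserialize(self, blob: bytes):
        # Stand-in for the expensive step on a schema with thousands of columns.
        self.deserialize_calls += 1
        return json.loads(blob.decode("utf-8"))

    def get(self, blob: bytes):
        # Identical blobs hash to the same key, so they are deserialized only once.
        key = hashlib.sha256(blob).hexdigest()
        if key not in self._by_hash:
            self._by_hash[key] = self._deserialize(blob)
        return self._by_hash[key]


def make_schema_blob(num_columns: int) -> bytes:
    # Build a made-up serialized schema with the requested number of columns.
    cols = [{"name": f"col{i}", "type": "string"} for i in range(num_columns)]
    return json.dumps(cols, sort_keys=True).encode("utf-8")


# Ten partitions of one table, all sharing the same 2000-column schema.
shared = make_schema_blob(2000)
partitions = {f"ds=2015-02-{day:02d}": shared for day in range(1, 11)}

cache = SchemaCache()
schemas = [cache.get(blob) for blob in partitions.values()]
print(cache.deserialize_calls)
```

A non-partitioned table has a single schema to deserialize either way, so a cache of
this kind would not change its cost.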

I do not believe dropping tables will cause an HBase split.

> Use HBase to store Hive metadata
> --------------------------------
>
>                 Key: HIVE-9452
>                 URL: https://issues.apache.org/jira/browse/HIVE-9452
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: HBaseMetastoreApproach.pdf
>
>
> This is an umbrella JIRA for a project to explore using HBase to store the Hive data
> catalog (i.e., the metastore).  This project has several goals:
> # The current metastore implementation is slow when tables have thousands or more partitions.
>  With Tez and Spark engines we are pushing Hive to a point where queries only take a few seconds
> to run.  But planning the query can take as long as running it.  Much of this time is spent
> in metadata operations.
> # Due to scale limitations we have never allowed tasks to communicate directly with the
> metastore.  However, with the development of LLAP this requirement will have to be relaxed.
>  If we can relax this there are other use cases that could benefit from this.
> # Eating our own dogfood.  Rather than using external systems to store our metadata there
> are benefits to using other components in the Hadoop system.
> The proposal is to create a new branch and work on the prototype there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
