hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database
Date Sat, 27 May 2017 19:15:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027546#comment-16027546 ]

Vihang Karajgaonkar commented on HIVE-16771:
--------------------------------------------

The test failure {{udtf_replicate_rows}} is unrelated, and the test passes for me when I run it locally.
I ran it twice and it succeeded both times. I am attaching one more version with the changes
described below. I am hoping that the next run will succeed for that test.

The new version of the patch closes the connection object in the getMetastoreSchemaVersion method
implementation.
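A minimal sketch of what closing the connection in a version lookup can look like, assuming plain JDBC: try-with-resources closes the result set, statement and connection on every path, including when the query throws. The `SchemaVersionReader` class, the `ConnectionProvider` interface and the exact `VERSION`/`SCHEMA_VERSION` query are illustrative assumptions, not the code in the attached patch; a real tool would quote identifiers per database type and would likely use its own exception type instead of `SQLException`.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class SchemaVersionReader {

  /** Hypothetical factory so the caller decides how connections are created. */
  @FunctionalInterface
  interface ConnectionProvider {
    Connection get() throws SQLException;
  }

  // Assumed query; real table/column quoting differs per database type.
  static final String VERSION_QUERY = "select t.SCHEMA_VERSION from VERSION t";

  /**
   * Reads the single recorded schema version. The result set, statement and
   * connection are closed in reverse order when the try block exits.
   */
  static String getMetaStoreSchemaVersion(ConnectionProvider provider) throws SQLException {
    try (Connection conn = provider.get();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(VERSION_QUERY)) {
      if (!rs.next()) {
        throw new SQLException("No version row found in the VERSION table");
      }
      String version = rs.getString(1);
      if (rs.next()) {
        throw new SQLException("Multiple version rows found in the VERSION table");
      }
      return version;
    }
  }
}
```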

Hi [~ngangam], I agree that the interface method should ideally just look like {{getMetaStoreSchemaVersion()}}.
I looked into that possibility, but it seems that achieving it may need a major
refactoring. In general, I think HiveSchemaTool can be made a lot more generic, which would enable
such a seamless plug-and-play design. To do that, I propose the following enhancements
to it.

1. I think HiveSchemaTool is currently in the BeeLine module only because it uses BeeLine
to run the queries on the metastore. Ideally, it makes sense to move HiveSchemaTool to the
metastore module, in the package {{org.apache.hadoop.hive.metastore.tools}}. How it runs the
queries should be left to the implementations of the interface. If we move it to the metastore
module, we can potentially just use JDOQL and DataNucleus to query the database, as
MetaTool does.
2. To do the above, we need to make it generic enough that any database client
can be plugged into it to retrieve the results. It should therefore interact with
these implementations only through an interface (IMetaStoreSchemaInfo), which should also be in
the same package as HiveSchemaTool.
3. The implementations of the interface, however, could be user-defined. In the case of Hive, we
already have the default implementation using BeeLine, which we could keep in the BeeLine
module.
4. Once we do all the above, I think both the interface and the overall design will be
a lot cleaner.
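The proposed split can be sketched roughly as follows, assuming a tool that sees only an interface and loads the implementation by class name. `SchemaInfo` is a hypothetical, simplified stand-in for IMetaStoreSchemaInfo, and `SchemaToolSketch`, `forClassName` and `isSchemaUpToDate` are illustrative names, not existing Hive APIs.

```java
import java.util.Objects;

final class SchemaToolSketch {

  /** Hypothetical, simplified stand-in for IMetaStoreSchemaInfo. */
  interface SchemaInfo {
    /** Version the tool's scripts expect, e.g. "2.3.0". */
    String getHiveSchemaVersion();

    /** Version recorded in the metastore database; how it is queried is up to the implementation. */
    String getMetaStoreSchemaVersion() throws Exception;
  }

  private final SchemaInfo schemaInfo;

  SchemaToolSketch(SchemaInfo schemaInfo) {
    this.schemaInfo = Objects.requireNonNull(schemaInfo, "schemaInfo");
  }

  /** Loads a (possibly user-defined) implementation by class name via its no-arg constructor. */
  static SchemaToolSketch forClassName(String className) throws ReflectiveOperationException {
    // asSubclass throws ClassCastException if the class does not implement SchemaInfo.
    Class<? extends SchemaInfo> cls = Class.forName(className).asSubclass(SchemaInfo.class);
    return new SchemaToolSketch(cls.getDeclaredConstructor().newInstance());
  }

  /** Compares versions without knowing whether BeeLine, JDBC or JDOQL is used underneath. */
  boolean isSchemaUpToDate() throws Exception {
    return schemaInfo.getHiveSchemaVersion().equals(schemaInfo.getMetaStoreSchemaVersion());
  }
}
```

In this sketch the tool never references a concrete client, so a BeeLine-based default and a user-supplied implementation are interchangeable behind the same interface.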

What do you think about these proposals? We can take them up in a separate JIRA if you think
they make sense.

For now, I think the attached patch is reasonably generic, given that there are a lot of cross-dependencies
between HiveSchemaTool, BeeLine and the metastore. Can you please review it and
let me know what you think? Thanks!

> Schematool should use MetastoreSchemaInfo to get the metastore schema version from database
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16771
>                 URL: https://issues.apache.org/jira/browse/HIVE-16771
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, HIVE-16771.03.patch
>
>
> HIVE-16723 added the ability to use a custom MetastoreSchemaInfo implementation to manage
schema upgrades and initialization if needed. To make HiveSchemaTool completely implementation-agnostic,
it should depend on the configured IMetastoreSchemaInfo implementation to get the metastore
schema version information from the database. It should also not assume the scripts directory
or hardcode it itself. Instead, it should ask the MetastoreSchemaInfo class for the metastore
scripts directory.
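A rough sketch of the scripts-directory point, assuming a hypothetical `ScriptDirProvider` contract: the tool resolves init script paths only through the provider instead of hardcoding a directory. The default layout (`scripts/metastore/upgrade/<dbType>`) and the `hive-schema-<version>.<dbType>.sql` file name pattern are assumptions modeled on Hive's usual script naming, not details taken from this issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ScriptLocator {

  /** Hypothetical slice of a schema-info contract: only the script directory lookup. */
  @FunctionalInterface
  interface ScriptDirProvider {
    String getMetaStoreScriptDir();
  }

  /** Assumed default layout: <hiveHome>/scripts/metastore/upgrade/<dbType>. */
  static ScriptDirProvider defaultProvider(String hiveHome, String dbType) {
    return () -> Paths.get(hiveHome, "scripts", "metastore", "upgrade", dbType).toString();
  }

  /** Builds the init script path from whatever directory the provider reports. */
  static Path initScript(ScriptDirProvider provider, String version, String dbType) {
    // Assumed file name pattern: hive-schema-<version>.<dbType>.sql
    return Paths.get(provider.getMetaStoreScriptDir(), "hive-schema-" + version + "." + dbType + ".sql");
  }
}
```

A custom implementation only has to return a different directory; the path-building code in the tool stays unchanged.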



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
