hive-issues mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions
Date Wed, 24 Feb 2016 15:00:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163143#comment-15163143 ]

Alan Gates commented on HIVE-13132:
-----------------------------------

Several comments:
# It would be good to test whether HIVE-2573 solves the issue, since there's no point in making further changes if it does.
# I see how this code prevents the system from repeatedly downloading the functions (since it tracks whether the metastore has been searched), but I don't see how it prevents pre-fetching all the functions at startup.
# I don't think using statics in the FunctionRegistry will work. This will cause HiveServer2 to share the function names across sessions, which we don't want because there won't be a way to force new functions to be downloaded. That is, HS2 will download the set of functions when it first starts and will not do so again, because the static {{haveSearchedMetastore}} will be true.
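
To illustrate the third point, here is a minimal sketch of a per-session cache that can be forced to refresh. It is not taken from the patch or from Hive: {{FakeMetastore}}, {{SessionFunctionCache}} and {{invalidate()}} are hypothetical names used only to show the idea, and the real fix may look quite different.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the metastore; not a Hive class. */
class FakeMetastore {
    private final List<String> functions = new ArrayList<>();
    int fetchCount = 0; // how many times the full function list was fetched

    void create(String qualifiedName) {
        functions.add(qualifiedName);
    }

    List<String> fetchAll() {
        fetchCount++;
        return new ArrayList<>(functions);
    }
}

/** Hypothetical per-session cache: state lives in the instance, not in statics. */
class SessionFunctionCache {
    private final FakeMetastore metastore;
    private Set<String> cached; // null until the first lookup or after invalidate()

    SessionFunctionCache(FakeMetastore metastore) {
        this.metastore = metastore;
    }

    synchronized Set<String> getFunctionNames() {
        if (cached == null) {
            cached = new HashSet<>(metastore.fetchAll());
        }
        return new HashSet<>(cached); // defensive copy
    }

    /** Forces the next lookup in this session to go back to the metastore. */
    synchronized void invalidate() {
        cached = null;
    }
}

class SessionFunctionCacheDemo {
    public static void main(String[] args) {
        FakeMetastore ms = new FakeMetastore();
        ms.create("db1.f1");

        SessionFunctionCache session1 = new SessionFunctionCache(ms);
        session1.getFunctionNames();      // first lookup fetches from the metastore
        ms.create("db1.f2");              // a function is added later

        SessionFunctionCache session2 = new SessionFunctionCache(ms);
        System.out.println(session2.getFunctionNames().contains("db1.f2")); // prints true

        session1.invalidate();            // the older session can be refreshed too
        System.out.println(session1.getFunctionNames().contains("db1.f2")); // prints true
    }
}
```

With a single static flag, the second lookup above would have no way to see {{db1.f2}}; keeping the flag and the cached names per session (plus some way to invalidate them) avoids that.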

cc [~jdere] and [~sershe] since both of you have done work in this area recently.

> Hive should lazily load and cache metastore (permanent) functions
> -----------------------------------------------------------------
>
>                 Key: HIVE-13132
>                 URL: https://issues.apache.org/jira/browse/HIVE-13132
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-13132.1.patch
>
>
> In Hive 0.13.1, we have noticed that as the number of databases increases, the start-up time of the Hive interactive shell increases. This is because during start-up, all databases are iterated over to fetch the permanent functions to display in the {{SHOW FUNCTIONS}} output.
> {noformat:title=FunctionRegistry.java}
>   private static Set<String> getFunctionNames(boolean searchMetastore) {
>     Set<String> functionNames = mFunctions.keySet();
>     if (searchMetastore) {
>       functionNames = new HashSet<String>(functionNames);
>       try {
>         Hive db = getHive();
>         List<String> dbNames = db.getAllDatabases();
>         for (String dbName : dbNames) {
>           List<String> funcNames = db.getFunctions(dbName, "*");
>           for (String funcName : funcNames) {
>             functionNames.add(FunctionUtils.qualifyFunctionName(funcName, dbName));
>           }
>         }
>       } catch (Exception e) {
>         LOG.error(e);
>         // Continue on, we can still return the functions we've gotten to this point.
>       }
>     }
>     return functionNames;
>   }
> {noformat}
> Instead of eagerly loading all metastore functions, we should only load them the first time {{SHOW FUNCTIONS}} is invoked. We should also cache the results.
> Note that this issue may have been fixed by HIVE-2573, though I haven't verified this.
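
The lazy-load-and-cache behaviour requested in the description could be sketched roughly as follows. This is only an illustration under stated assumptions: {{LazyFunctionNames}} is a hypothetical class, and the {{Supplier}} stands in for the database-by-database metastore scan shown in {{getFunctionNames}}; it is not the attached patch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical lazy holder: nothing is loaded until get() is first called. */
class LazyFunctionNames {
    private final Supplier<Set<String>> loader; // assumed to wrap the metastore scan
    private Set<String> cached;                 // null until the first get()
    private int loads = 0;                      // number of times the loader ran

    LazyFunctionNames(Supplier<Set<String>> loader) {
        this.loader = loader;                   // construction does no metastore work
    }

    synchronized Set<String> get() {
        if (cached == null) {
            cached = Collections.unmodifiableSet(new HashSet<>(loader.get()));
            loads++;
        }
        return cached;
    }

    synchronized int loadCount() {
        return loads;
    }
}

class LazyFunctionNamesDemo {
    public static void main(String[] args) {
        LazyFunctionNames names =
            new LazyFunctionNames(() -> Collections.singleton("db1.my_udf"));

        System.out.println(names.loadCount()); // prints 0: start-up triggers no scan
        names.get();                           // e.g. the first SHOW FUNCTIONS
        names.get();                           // later calls reuse the cached set
        System.out.println(names.loadCount()); // prints 1
    }
}
```

The point of the sketch is only that start-up cost no longer depends on the number of databases; how and when the cache is refreshed is a separate question.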



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
