hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
Date Fri, 23 Jan 2015 20:58:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289961#comment-14289961 ]

Sushanth Sowmyan commented on HIVE-9436:
----------------------------------------

[~thejas]/[~hsubramaniyan] : I have a couple of thoughts about moving JDOException retries
solely to the metastore:

a) First, we have had cases where a JDOException invalidates the connection on the
metastore side, and retrying from within the metastore has not helped. Retrying from the
client side, though, causes a fresh openTransaction() that clears the connection and all
history, sometimes by hitting a different HMSHandler, so a retry from the client is more
likely to succeed than a retry from the server. Admittedly, this is most likely because our
metastore code needs cleaning up so that the metastore-side retry handles this case properly,
and that is something we should attempt to improve.
b) Second, on a loaded metastore, having a metastore thread do retries uses up valuable
metastore resources and time, which is more wasteful than having the client retry. We thus
keep the number of metastore-side retries low, and having client-side retries as well lets
us fail fast on the metastore while retrying many times in particular clients if we find the
need to do so. In HA configurations in particular, I've seen a large number of retries and
longer retry intervals on the client side allow a connection to go through despite
metastore HUPs.
c) Third, speaking of HA, retrying on the client side also lets us hit alternate metastores,
if configured, when one metastore is getting bogged down. As you mention, the client should
ideally retry only connection exceptions, but JDOExceptions are frequently the result of
connection exceptions raised by the connection pool between the metastore and the db.
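
To make points (a) through (c) concrete, here is a rough sketch of a client-side retry loop that rotates across configured metastores and retries only when the error message mentions a JDO exception. This is not the actual RetryingMetaStoreClient code: the class name, the method names, the use of RuntimeException in place of MetaException, and the Function-based call shape are all assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from Hive itself.
class ClientRetrySketch {
    // Substring search for a JDO exception class name anywhere in the message.
    static final Pattern JDO_ERROR = Pattern.compile("JDO[a-zA-Z]*Exception");

    // Assumption: an error is retriable when its message mentions a JDO exception.
    static boolean isRetriable(RuntimeException e) {
        String msg = e.getMessage();
        return msg != null && JDO_ERROR.matcher(msg).find();
    }

    // Runs `call` against the metastore URIs in round-robin order, so each retry
    // starts from scratch and may land on a different metastore.
    static <T> T callWithRetries(List<String> uris, int maxAttempts, long delayMs,
                                 Function<String, T> call) {
        if (uris.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one URI and one attempt");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String uri = uris.get(attempt % uris.size());
            try {
                return call.apply(uri);
            } catch (RuntimeException e) {
                if (!isRetriable(e)) {
                    throw e; // anything else is surfaced immediately
                }
                last = e;
            }
            if (attempt + 1 < maxAttempts) {
                sleep(delayMs); // client-side wait, costs no metastore thread time
            }
        }
        throw last; // all attempts failed with retriable errors
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

The retry count and delay live entirely on the client here, which is what allows the metastore itself to stay fail-fast.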

There is definitely scope for refactoring and improvement in all this, and I will look into
it further, but for now this is a simpler bugfix that lets the already-existing regex work
correctly.

> RetryingMetaStoreClient does not retry JDOExceptions
> ----------------------------------------------------
>
>                 Key: HIVE-9436
>                 URL: https://issues.apache.org/jira/browse/HIVE-9436
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-9436.2.patch, HIVE-9436.patch
>
>
> RetryingMetaStoreClient has a bug in the following bit of code:
> {code}
>         } else if ((e.getCause() instanceof MetaException) &&
>             e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) {
>           caughtException = (MetaException) e.getCause();
>         } else {
>           throw e.getCause();
>         }
> {code}
> The bug here is that Java's String.matches matches the entire string against the regex, and
> thus the match will fail if the message contains anything before or after JDO[a-zA-Z]*Exception.
> The solution, however, is very simple: we should match .*JDO[a-zA-Z]*Exception.*
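
The whole-string behaviour described in the issue can be checked with a small standalone program. This is a sketch only: the class name, the helper names, and the sample messages are made up for illustration and are not taken from the attached patches. The third helper uses Matcher.find(), an alternative to the proposed pattern that is shown purely for comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Hive.
class JdoRegexDemo {
    // Pattern as written in the buggy snippet: must match the entire message.
    static final String STRICT = "JDO[a-zA-Z]*Exception";
    // Pattern proposed in the issue: allows text before and after on the same line.
    static final String RELAXED = ".*JDO[a-zA-Z]*Exception.*";
    // Alternative for comparison: a substring search via find().
    static final Pattern SEARCH = Pattern.compile("JDO[a-zA-Z]*Exception");

    static boolean strictMatch(String msg) { return msg.matches(STRICT); }
    static boolean relaxedMatch(String msg) { return msg.matches(RELAXED); }
    static boolean findMatch(String msg) { return SEARCH.matcher(msg).find(); }

    public static void main(String[] args) {
        // Made-up message with text around the exception class name.
        String msg = "javax.jdo.JDODataStoreException: Communications link failure";
        System.out.println(strictMatch(msg));   // false: extra text before and after
        System.out.println(relaxedMatch(msg));  // true
        System.out.println(findMatch(msg));     // true

        // Only a message that is exactly the class name passes the strict pattern.
        System.out.println(strictMatch("JDODataStoreException")); // true
    }
}
```

One caveat worth checking against real messages: by default `.` does not match line terminators, so the relaxed pattern combined with String.matches still returns false for a message that contains a newline, whereas a find()-based search does not have that limitation.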



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
