hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14979) Removing stale Zookeeper locks at HiveServer2 initialization
Date Wed, 19 Oct 2016 18:32:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589471#comment-15589471
] 

Sergey Shelukhin commented on HIVE-14979:
-----------------------------------------

Hmm... sorry, I still don't quite understand the problem.

TL;DR the patch makes sense if it is to work around some network timeouts, or ZK not deleting
nodes the way we expect. Otherwise I think we need to make sure it's compatible with timeout
logic and/or just use ZK expiration.

Long version:
Do the locks in ZK already expire at some point after HS2 dies? 
If the locks don't expire, we should make them expire as per below ;)
If they do...
From my understanding, ZK cleans up ephemeral nodes immediately when the process goes
down in the normal case (based on the connection breaking), regardless of the timeout set for
the session (that is more of a network timeout and would result in nodes being cleaned up if
the connection doesn't immediately break, or in other "abnormal" cases).
Is the timeout we add some additional logical timeout on top of normal cleanup, so that even
when HS2 dies and the connection is broken, ZK doesn't clean up the nodes for some time after
the disconnect?
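
To make the two cleanup paths in question concrete, here is a toy in-memory model: a node tied to a session disappears either immediately when the session ends, or only once the session timeout has passed without a heartbeat, while a persistent node stays in both cases. This is a sketch only. ToyLockStore and all of its methods are hypothetical names, it is not the ZooKeeper client API, and which path an HS2 crash actually takes in a real deployment is exactly the open question above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model of session-scoped lock nodes (hypothetical, not ZooKeeper code). */
final class ToyLockStore {
    // path -> owning session id, or null for a persistent node
    private final Map<String, Long> owners = new HashMap<>();
    // session id -> time of the last heartbeat, in milliseconds
    private final Map<Long, Long> lastHeartbeat = new HashMap<>();
    private final long sessionTimeoutMs;

    ToyLockStore(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    void heartbeat(long sessionId, long nowMs) {
        lastHeartbeat.put(sessionId, nowMs);
    }

    void createEphemeral(String path, long sessionId) {
        owners.put(path, sessionId);
    }

    void createPersistent(String path) {
        owners.put(path, null);
    }

    /** Path 1: the session ends, so its ephemeral nodes are removed right away. */
    void closeSession(long sessionId) {
        lastHeartbeat.remove(sessionId);
        owners.values().removeIf(owner -> owner != null && owner == sessionId);
    }

    /** Path 2: no end-of-session is seen, so nodes survive until the timeout passes. */
    void expireSessions(long nowMs) {
        Set<Long> expired = new HashSet<>();
        for (Map.Entry<Long, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() >= sessionTimeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (long id : expired) {
            closeSession(id);
        }
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }

    public static void main(String[] args) {
        ToyLockStore store = new ToyLockStore(1000);
        store.heartbeat(1L, 0);
        store.createEphemeral("/locks/t1", 1L);
        store.createPersistent("/locks/t2");
        store.expireSessions(500);   // timeout not reached yet
        System.out.println(store.exists("/locks/t1"));
        store.expireSessions(1000);  // timeout reached, ephemeral node removed
        System.out.println(store.exists("/locks/t1"));
        System.out.println(store.exists("/locks/t2"));
    }
}
```

In this model only the persistent node would ever need a startup cleanup; a session-scoped node goes away without one, at the latest when the timeout passes.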

If yes, and we set a large timeout for a reason, we should not clean them up before the timeout expires.
The reason for a large timeout could be that the locks are taken for external jobs that don't
die immediately (or at all?) when HS2 dies.
If yes, and we set a large timeout for no good reason (=> we believe we can clean them
up during startup, as we do in the patch), we should also reduce the timeout (or remove it
and use the default).
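
The two "if yes" cases above boil down to a small decision rule, sketched here. StartupCleanupPolicy, its Action enum and the largeTimeoutIsIntentional flag are hypothetical names used for illustration; nothing here is taken from the patch.

```java
/** Sketch of the decision rule discussed above (hypothetical names, not from the patch). */
final class StartupCleanupPolicy {
    enum Action { WAIT_FOR_TIMEOUT, REMOVE_NOW }

    /**
     * @param largeTimeoutIsIntentional true if the large timeout exists for a reason,
     *                                  e.g. locks held for external jobs that outlive HS2
     * @param lockAgeMs                 time since the owning HS2 went away, in milliseconds
     * @param timeoutMs                 configured timeout, in milliseconds
     */
    static Action decide(boolean largeTimeoutIsIntentional, long lockAgeMs, long timeoutMs) {
        // A deliberate timeout must be honoured: startup cleanup may not pre-empt it.
        if (largeTimeoutIsIntentional && lockAgeMs < timeoutMs) {
            return Action.WAIT_FOR_TIMEOUT;
        }
        // Otherwise the lock is fair game at startup. If this branch is reached only
        // because the timeout has no good reason, the timeout itself should also be
        // reduced (or dropped in favour of the default).
        return Action.REMOVE_NOW;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 10_000, 60_000));
        System.out.println(decide(false, 10_000, 60_000));
    }
}
```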

> Removing stale Zookeeper locks at HiveServer2 initialization
> ------------------------------------------------------------
>
>                 Key: HIVE-14979
>                 URL: https://issues.apache.org/jira/browse/HIVE-14979
>             Project: Hive
>          Issue Type: Improvement
>          Components: Locking
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-14979.3.patch, HIVE-14979.4.patch, HIVE-14979.patch
>
>
> HiveServer2 could use Zookeeper to store tokens that indicate that particular tables are
> locked, by creating persistent Zookeeper objects.
> A problem can occur when a HiveServer2 instance creates a lock on a table and the HiveServer2
> instance crashes ("Out of Memory", for example) and the locks are not released in Zookeeper.
> These locks will then remain until they are manually cleared by an admin.
> There should be a way to remove stale locks at HiveServer2 initialization, making the
> admins' lives easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
