hive-issues mailing list archives

From "Aron Hamvas (Jira)" <>
Subject [jira] [Updated] (HIVE-22420) DbTxnManager.stopHeartbeat() should be thread-safe
Date Wed, 30 Oct 2019 12:37:00 GMT


Aron Hamvas updated HIVE-22420:
    Attachment: HIVE-22420.1.patch
        Status: Patch Available  (was: In Progress)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>                 Key: HIVE-22420
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Aron Hamvas
>            Assignee: Aron Hamvas
>            Priority: Major
>         Attachments: HIVE-22420.1.patch
> When a transactional query is running and is interrupted by an HS2 close operation request, both the background pool thread executing the query and the HttpHandler thread running the close operation logic eventually call the method below:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Because this method is invoked several times in both threads, the two threads can invoke it at the same time. Due to a race condition, the txnId field of the DbTxnManager used by both threads can then be set to 0 without the transaction actually being aborted.
> The root cause is that the stopHeartbeat() method in DbTxnManager is not thread-safe:
> When Thread-1 and Thread-2 enter stopHeartbeat() at almost the same time, Thread-1 might successfully cancel the heartbeat task and set the heartbeatTask field to null while Thread-2 is still trying to observe its state. Thread-1 returns to the calling rollbackTxn() method and continues execution there, while Thread-2 is thrown back to the same method with a NullPointerException. Thread-2 then sets txnId to 0, and Thread-1 sends this 0 value to HMS. As a result, the txn is not aborted, and the locks cannot be released later either.
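
The race described above comes from a check-then-act sequence on a shared field. One possible way to close it is sketched below. This is not the attached patch and not Hive's actual code: the class HeartbeatSketch, its executor, and the boolean return value are hypothetical names chosen for illustration. The sketch assumes the heartbeat is a scheduled task held in a single field, and it swaps the field to null atomically so that only one caller ever holds a non-null reference to cancel.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the heartbeat handling; not DbTxnManager itself. */
class HeartbeatSketch {
    private final ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();

    // Assumption: the heartbeat task lives in one shared field.
    private final AtomicReference<ScheduledFuture<?>> heartbeatTask =
        new AtomicReference<>();

    void startHeartbeat() {
        // The no-op body stands in for the real heartbeat call to the metastore.
        ScheduledFuture<?> task =
            pool.scheduleAtFixedRate(() -> { }, 0, 50, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> old = heartbeatTask.getAndSet(task);
        if (old != null) {
            old.cancel(true);
        }
    }

    /**
     * Atomically takes ownership of the task. Exactly one concurrent caller
     * sees a non-null value; every other caller returns false instead of
     * dereferencing a field that another thread has just set to null.
     */
    boolean stopHeartbeat() {
        ScheduledFuture<?> task = heartbeatTask.getAndSet(null);
        if (task == null) {
            return false; // already stopped by another thread, or never started
        }
        task.cancel(true);
        return true;
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With this shape, a second call to stopHeartbeat() is a harmless no-op, so a caller such as rollbackTxn() would not be thrown off its normal path by a NullPointerException. Declaring the method synchronized would be an alternative with the same effect for this narrow case.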

This message was sent by Atlassian Jira