hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12904) LLAP: deadlock in task scheduling
Date Mon, 25 Jan 2016 04:33:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114740#comment-15114740 ]

Hive QA commented on HIVE-12904:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12784079/HIVE-12904.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10029 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6728/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6728/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6728/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12784079 - PreCommit-HIVE-TRUNK-Build

> LLAP: deadlock in task scheduling
> ---------------------------------
>
>                 Key: HIVE-12904
>                 URL: https://issues.apache.org/jira/browse/HIVE-12904
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Hui Zheng
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: HIVE-12904.2.patch, HIVE-12904.3.patch, HIVE-12904.patch
>
>
> {noformat}
> Thread 34107: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.isInWaitQueue() @bci=0, line=690 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.finishableStateUpdated(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=8, line=485 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.access$1500(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService, org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=3, line=78 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.finishableStateUpdated(boolean) @bci=27, line=733 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.sourceStateUpdated(java.lang.String) @bci=76, line=210 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.sourceStateUpdated(java.lang.String) @bci=5, line=164 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerSourceStateChange(java.lang.String, java.lang.String, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateProto) @bci=34, line=228 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=47, line=255 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=328 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.sourceStateUpdated(com.google.protobuf.RpcController, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=105 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=80, line=13067 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
>  - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Compiled frame)
>  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
> and 
> Thread 34500: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.unregisterForUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=0, line=195 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.unregisterFinishableStateUpdate(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=160 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.unregisterForFinishableStateUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=143 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeUnregisterForFinishedStateNotifications() @bci=20, line=681 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(org.apache.tez.runtime.task.TaskRunner2Result) @bci=32, line=548 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(java.lang.Object) @bci=5, line=535 (Compiled frame)
>  - com.google.common.util.concurrent.Futures$4.run() @bci=55, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> "IPC Server handler 0 on 15001":
>   waiting to lock Monitor@0x00007f5d322ecb08 (Object@0x00007f67032cd2c0, a org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService$TaskWrapper),
>   which is held by "ExecutionCompletionThread #0"
> "ExecutionCompletionThread #0":
>   waiting to lock Monitor@0x00007f6066b9e8c8 (Object@0x00007f66b6570200, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 0 on 15001"
> Found a total of 1 deadlock.
> {noformat}
> Looks like it's caused by synchronized methods that call into each other in opposite order:
> {noformat}
> TaskWrapper:
> public synchronized void maybeUnregisterForFinishedStateNotifications() {
> {noformat}
> eventually calls
> {noformat}
> FinishableStateTracker:
> synchronized void unregisterForUpdates(FinishableStateUpdateHandler handler) {
> {noformat}
> and
> {noformat}
> FinishableStateTracker:
> synchronized void sourceStateUpdated(String sourceName) {
> {noformat}
> eventually calls
> {noformat}
> TaskWrapper:
> public synchronized boolean isInWaitQueue() {
> {noformat}
> The latter just returns a boolean, so it definitely doesn't need to be synchronized. However, I don't know whether there are other similar issues or what actually has to stay inside the sync blocks, so perhaps there's a better fix.
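
The lock cycle described above, and the effect of dropping `synchronized` from the boolean getter, can be sketched with two stand-in classes. This is a minimal illustration, not the Hive code: `Tracker`, `Task`, `maybeUnregister` and `runsWithoutDeadlock` are hypothetical simplifications of `FinishableStateTracker` and `TaskWrapper`, and the `volatile` field is an assumption made here so that the unsynchronized read stays visible across threads.

```java
public class LockOrderSketch {

    /** Simplified, hypothetical stand-in for QueryInfo.FinishableStateTracker. */
    static class Tracker {
        private Task registered;

        synchronized void registerForUpdates(Task t) {
            registered = t;
        }

        synchronized void unregisterForUpdates(Task t) {
            if (registered == t) {
                registered = null;
            }
        }

        // Holds the tracker monitor while calling into the task.
        synchronized boolean sourceStateUpdated() {
            return registered != null && registered.isInWaitQueue();
        }
    }

    /** Simplified, hypothetical stand-in for TaskExecutorService.TaskWrapper. */
    static class Task {
        private final Tracker tracker;
        // Assumption: volatile keeps the unsynchronized read below visible.
        private volatile boolean inWaitQueue = true;

        Task(Tracker tracker) {
            this.tracker = tracker;
        }

        // Holds the task monitor, then takes the tracker monitor.
        synchronized void maybeUnregister() {
            tracker.unregisterForUpdates(this);
        }

        // Not synchronized: with a synchronized getter, sourceStateUpdated()
        // would take tracker -> task while maybeUnregister() takes
        // task -> tracker, which is the cycle in the thread dump.
        boolean isInWaitQueue() {
            return inWaitQueue;
        }
    }

    /** Runs both call paths concurrently; returns true if both threads finish. */
    static boolean runsWithoutDeadlock(int iterations) {
        Tracker tracker = new Tracker();
        Task task = new Task(tracker);
        Thread ipc = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                tracker.registerForUpdates(task);
                tracker.sourceStateUpdated();
            }
        });
        Thread completion = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                task.maybeUnregister();
            }
        });
        ipc.setDaemon(true);
        completion.setDaemon(true);
        ipc.start();
        completion.start();
        try {
            ipc.join(10_000);
            completion.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !ipc.isAlive() && !completion.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished: " + runsWithoutDeadlock(100_000));
    }
}
```

With the getter unsynchronized, the only remaining lock order is task then tracker, so no cycle is possible in this sketch. Whether the real classes have other call paths that re-create the cycle is not something this example can show.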
> Overall, I'd say synchronized methods should not be used on objects that call into any other non-trivial objects. For now, it would probably be good to replace all synchronized methods with synchronized blocks that cover the entire method, and to remove the unnecessary ones like isInWaitQueue(). The scope of the blocks can then be adjusted based on the logic in the future.
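
One way the suggested follow-up (synchronized blocks whose scope is later narrowed) could look is sketched below. It is an illustration under assumptions, not the attached patch: `Handler`, `Tracker` and `notifyCount` are hypothetical names, and copying the handler list before notifying is a technique chosen here for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class NarrowedSyncSketch {

    /** Hypothetical callback, loosely modeled on FinishableStateUpdateHandler. */
    interface Handler {
        void finishableStateUpdated(boolean finishable);
    }

    /** Hypothetical tracker that never calls a handler while holding its lock. */
    static class Tracker {
        private final List<Handler> handlers = new ArrayList<>();

        void registerForUpdates(Handler h) {
            synchronized (this) {
                handlers.add(h);
            }
        }

        void unregisterForUpdates(Handler h) {
            synchronized (this) {
                handlers.remove(h);
            }
        }

        void sourceStateUpdated(boolean finishable) {
            List<Handler> snapshot;
            // The block guards only the tracker's own state.
            synchronized (this) {
                snapshot = new ArrayList<>(handlers);
            }
            // Callbacks run outside the lock, so a handler may take its own
            // monitor or call back into the tracker without a lock cycle.
            for (Handler h : snapshot) {
                h.finishableStateUpdated(finishable);
            }
        }
    }

    /** A handler unregisters itself from inside its callback; returns its call count. */
    static int notifyCount() {
        Tracker tracker = new Tracker();
        int[] calls = {0};
        Handler[] self = new Handler[1];
        self[0] = finishable -> {
            calls[0]++;
            tracker.unregisterForUpdates(self[0]);
        };
        tracker.registerForUpdates(self[0]);
        tracker.sourceStateUpdated(true); // handler runs and unregisters itself
        tracker.sourceStateUpdated(true); // no handlers are left
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("callbacks delivered: " + notifyCount());
    }
}
```

The trade-off of notifying from a snapshot is that a handler can still receive one update shortly after another thread has unregistered it, so handlers need to tolerate a late callback.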



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
