hive-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15722) LLAP: Avoid marking a query as complete if the AMReporter runs into an error
Date Thu, 26 Jan 2017 04:52:26 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-15722:
----------------------------------
    Attachment: HIVE-15722.02.patch

Updated patch.

Removed the LLL/TODO.
Removed the relevant log lines.
Assuming you meant the "Queueing container" log line: moved that down so the fragmentId is included in the same log line.

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15722
>                 URL: https://issues.apache.org/jira/browse/HIVE-15722
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-15722.01.patch, HIVE-15722.02.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up killing all fragments on the daemon. This is done by marking the query as complete.
> The AM would continue to try scheduling on this node - which would lead to task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not simple though, since the AM will not find out what happened due to the communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag query16 already complete. Rejecting fragment [Map 7, 29, 0]
> 	at org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
> 	at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
> 	at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}
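
For illustration only, here is a minimal sketch of the difference between the two behaviours described above. It is not the patch and not the real QueryTracker: the class FragmentTrackerSketch and all of its method names are hypothetical, and the bookkeeping is reduced to a set of completed queries and a map of running fragments. It assumes that rejecting a fragment depends only on whether the query was marked complete.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the daemon-side bookkeeping; not the real QueryTracker.
class FragmentTrackerSketch {
    // Queries that have been marked complete on this daemon.
    private final Set<String> completedQueries = new HashSet<>();
    // Fragments currently running, keyed by query name.
    private final Map<String, Set<String>> runningFragments = new HashMap<>();

    // New fragments are rejected only for queries already marked complete.
    synchronized void registerFragment(String query, String fragmentId) {
        if (completedQueries.contains(query)) {
            throw new IllegalStateException(
                "Dag " + query + " already complete. Rejecting fragment " + fragmentId);
        }
        runningFragments.computeIfAbsent(query, q -> new HashSet<>()).add(fragmentId);
    }

    // Behaviour described as the problem: a reporter error marks the query complete,
    // so later submissions from the AM for the same query are rejected.
    synchronized void onReporterErrorMarkComplete(String query) {
        runningFragments.remove(query);
        completedQueries.add(query);
    }

    // Behaviour described as the fix: kill the running fragments but leave the
    // query registered. Returns the number of fragments killed.
    synchronized int onReporterErrorKillFragments(String query) {
        Set<String> killed = runningFragments.remove(query);
        return killed == null ? 0 : killed.size();
    }

    // Invoked only when the AM itself signals that the query is done.
    synchronized void queryComplete(String query) {
        runningFragments.remove(query);
        completedQueries.add(query);
    }

    // Number of fragments currently tracked as running for the query.
    synchronized int runningCount(String query) {
        Set<String> fragments = runningFragments.get(query);
        return fragments == null ? 0 : fragments.size();
    }
}
```

In this sketch, calling onReporterErrorMarkComplete and then registerFragment for the same query throws with a message shaped like the one in the trace above, whereas after onReporterErrorKillFragments a later registerFragment is still accepted until queryComplete is called.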



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
