hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16071) Spark remote driver misuses the timeout in RPC handshake
Date Tue, 07 Mar 2017 03:44:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898686#comment-15898686
] 

Xuefu Zhang commented on HIVE-16071:
------------------------------------

Hi [~ctang.ma], Thanks for your findings and analysis. Would you think it's better if we set
the timeout for cancelTask to client.connect.timeout instead, which is used on remote driver
side anyway (with HIVE-15671)? I think the purpose of cancelTask is to make sure that Sasl
connection is established within a given timeframe when connection request comes from remote
driver. However, server.connect.timeout is usually much longer (1hr in our cluster), and we
don't expect that Sasl would take such long to establish. This way, we are consistent on both
server side and client side.

On the other hand, if we user server.connect.timeout for both registerClient and cancelTask,
then what can happens is that if Sasl handshake happens right after registerClient but failed
to establish and if the connection isn't actually broken at that point, then the server will
detect a connection loss and so will not react to this until registerClient timeout, which
can be significantly delayed (1hr in our case rather just a few seconds).

Thus, what your found seems to be a bug, however rarely it happens.

> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
>                 Key: HIVE-16071
>                 URL: https://issues.apache.org/jira/browse/HIVE-16071
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650 (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
hive.spark.client.connect.timeout is the timeout when the spark remote driver makes a socket
connection (channel) to RPC server. But currently it is also used by the remote driver for
RPC client/server handshaking, which is not right. Instead, hive.spark.client.server.connect.timeout
should be used and it has already been used by the RPCServer in the handshaking.
> The error like following is usually caused by this issue, since the default hive.spark.client.connect.timeout
value (1000ms) used by remote driver for handshaking is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException:
javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed
before SASL negotiation finished.
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>         at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>         at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>         at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>         at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message