hive-issues mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16071) Spark remote driver misuses the timeout in RPC handshake
Date Wed, 01 Mar 2017 07:42:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889695#comment-15889695 ]

Rui Li commented on HIVE-16071:
-------------------------------

My understanding is that the timeout on the SparkClient side is longer because it has to wait
for the RemoteDriver to launch. The timeout on the RemoteDriver side should be shorter because
the SparkClient is already running when the RemoteDriver starts, and it usually doesn't take
long to connect back and finish the SASL handshake. That said, the default of 1000ms may be a
little too short.
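To make the two budgets concrete, here is a minimal sketch, assuming each phase on the RemoteDriver side (opening the socket, then the SASL handshake) is modeled as a {{CompletableFuture}} with its own deadline. {{PhaseTimeouts}} and {{withTimeout}} are hypothetical names for illustration, not the spark-client API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: give one connection phase (connect or handshake) its own deadline. */
final class PhaseTimeouts {
    // Daemon thread so pending timers never keep the JVM alive.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "phase-timeouts");
            t.setDaemon(true);
            return t;
        });

    private PhaseTimeouts() {}

    /** Fails {@code phase} with a TimeoutException if it is still pending after timeoutMs. */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> phase, long timeoutMs, String name) {
        ScheduledFuture<?> timer = TIMER.schedule(
            () -> phase.completeExceptionally(
                new TimeoutException(name + " did not finish within " + timeoutMs + " ms")),
            timeoutMs, TimeUnit.MILLISECONDS);
        // Stop the timer as soon as the phase finishes, successfully or not.
        phase.whenComplete((value, error) -> timer.cancel(false));
        return phase;
    }
}
```

With this shape, the connect phase could keep hive.spark.client.connect.timeout while the handshake phase gets its own budget; how large that budget should be is the open question here.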

Looking at the stack trace in the description, we detect that the channel is closed and eventually
get a {{SaslException}} instead of a {{TimeoutException}}. I wonder why the channel is closed
before the handshake finishes. [~ctang.ma], is it possible that your HS2 ran into some issue?
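One way to read that stack trace: if two paths can fail the same handshake future (the channel-inactive handler and a timeout timer), whichever fires first decides which exception the caller sees. Below is a minimal sketch under that assumption; {{HandshakeRace}} is hypothetical, the timings are simulated, and the {{TimeoutException}} message is made up, so this is not the actual {{Rpc}} code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.security.sasl.SaslException;

/** Hypothetical model: two failure paths race to fail the same handshake future. */
final class HandshakeRace {
    private HandshakeRace() {}

    /**
     * Returns the exception the caller ends up seeing, given when the channel closes
     * and when the handshake timer fires (both simulated, in milliseconds).
     */
    static Throwable firstFailure(long channelClosedAfterMs, long timeoutAfterMs) {
        CompletableFuture<Void> handshake = new CompletableFuture<>();
        ScheduledExecutorService exec = Executors.newScheduledThreadPool(2);
        try {
            // Path 1: the channel goes inactive before SASL finishes (what the stack trace shows).
            exec.schedule(() -> handshake.completeExceptionally(
                    new SaslException("Client closed before SASL negotiation finished.")),
                channelClosedAfterMs, TimeUnit.MILLISECONDS);
            // Path 2: the handshake timeout fires (illustrative message).
            exec.schedule(() -> handshake.completeExceptionally(
                    new TimeoutException("handshake timed out")),
                timeoutAfterMs, TimeUnit.MILLISECONDS);
            // completeExceptionally is a no-op once the future is done, so the first path wins.
            handshake.join();
            return null;
        } catch (CompletionException e) {
            return e.getCause();
        } finally {
            exec.shutdownNow();
        }
    }
}
```

In this model, a {{SaslException}} only says the channel went away first; by itself it doesn't tell us whether a timeout was involved in closing it.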

Another question for [~vanzin] (possibly unrelated to this JIRA): we use the server-side timeout
in two places:
# [Constructing RpcServer|https://github.com/apache/hive/blob/master/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java#L108]
# [Registering client|https://github.com/apache/hive/blob/master/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java#L162]

I understand that item 2 needs the long timeout because it includes the time to launch the
RemoteDriver. But does item 1 also need that timeout? I think item 1 only needs to cover the
SASL handshake, which should take much less time.
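A minimal sketch of what splitting the server-side timeout could look like, assuming (hypothetically) a short budget for item 1 and the existing long budget for item 2. {{SplitServerTimeouts}}, {{awaitClientRegistration}} and {{awaitSaslHandshake}} are illustrative names only; this is not {{RpcServer}}'s API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical split of the single server-side timeout into its two uses. */
final class SplitServerTimeouts {
    // Daemon timer so pending timeouts never keep the JVM alive.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rpc-server-timeouts");
            t.setDaemon(true);
            return t;
        });

    final long saslHandshakeTimeoutMs;      // item 1: SASL handshake on an accepted connection
    final long clientRegistrationTimeoutMs; // item 2: includes the time to launch the RemoteDriver

    SplitServerTimeouts(long saslHandshakeTimeoutMs, long clientRegistrationTimeoutMs) {
        // Encodes the reasoning above: the handshake alone should need less time than registration.
        if (saslHandshakeTimeoutMs > clientRegistrationTimeoutMs) {
            throw new IllegalArgumentException(
                "SASL handshake timeout should not exceed the client registration timeout");
        }
        this.saslHandshakeTimeoutMs = saslHandshakeTimeoutMs;
        this.clientRegistrationTimeoutMs = clientRegistrationTimeoutMs;
    }

    /** Item 2: completed by the caller when the driver registers; fails after the long timeout. */
    CompletableFuture<String> awaitClientRegistration(String clientId) {
        return failAfter(new CompletableFuture<String>(), clientRegistrationTimeoutMs,
            "registration of client " + clientId);
    }

    /** Item 1: completed by the caller when SASL finishes; fails after the short timeout. */
    CompletableFuture<Void> awaitSaslHandshake() {
        return failAfter(new CompletableFuture<Void>(), saslHandshakeTimeoutMs, "SASL handshake");
    }

    private static <T> CompletableFuture<T> failAfter(CompletableFuture<T> f, long ms, String what) {
        ScheduledFuture<?> timer = TIMER.schedule(
            () -> f.completeExceptionally(new TimeoutException(what + " timed out after " + ms + " ms")),
            ms, TimeUnit.MILLISECONDS);
        f.whenComplete((v, e) -> timer.cancel(false)); // stop the timer once the future is done
        return f;
    }
}
```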

> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
>                 Key: HIVE-16071
>                 URL: https://issues.apache.org/jira/browse/HIVE-16071
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650 (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the Spark remote driver to open a socket
> connection (channel) to the RPC server. But currently the remote driver also uses it for the
> RPC client/server handshake, which is not right. Instead, hive.spark.client.server.connect.timeout
> should be used, as RpcServer already does for the handshake.
> Errors like the following are usually caused by this issue, since the default
> hive.spark.client.connect.timeout value (1000ms), which the remote driver uses for the handshake, is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>         at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>         at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>         at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>         at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}
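For the fix described above, here is a minimal sketch of which property each remote-driver phase would read, assuming the configuration is available as a plain {{java.util.Properties}} (HiveConf has its own accessors and time-unit handling, which this does not reproduce). {{DriverTimeoutConfig}} is a hypothetical name. The parser only understands plain numbers and an "ms" suffix, and the 90000ms fallback for hive.spark.client.server.connect.timeout is an assumption; check HiveConf for the actual default.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper: which timeout each remote-driver phase reads after the proposed fix. */
final class DriverTimeoutConfig {
    static final String CONNECT_TIMEOUT = "hive.spark.client.connect.timeout";
    static final String SERVER_CONNECT_TIMEOUT = "hive.spark.client.server.connect.timeout";

    private DriverTimeoutConfig() {}

    /** Simplified: accepts "1000" or "1000ms" only; other time units are not handled. */
    static long parseMillis(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("ms")) {
            v = v.substring(0, v.length() - 2).trim();
        }
        return Long.parseLong(v);
    }

    /** Opening the socket (channel) to the RPC server: the short connect timeout. */
    static long socketConnectTimeoutMs(Properties conf) {
        return parseMillis(conf.getProperty(CONNECT_TIMEOUT, "1000ms"));
    }

    /** RPC client/server handshake: per the issue, the server connect timeout. */
    static long handshakeTimeoutMs(Properties conf) {
        // ASSUMPTION: fallback chosen for this sketch; check HiveConf for the real default.
        return parseMillis(conf.getProperty(SERVER_CONNECT_TIMEOUT, "90000ms"));
    }
}
```

The point of the split is that raising the handshake budget no longer requires raising the socket connect timeout, and vice versa.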



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
