hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16854) SparkClientFactory is locked too aggressively
Date Thu, 08 Jun 2017 03:36:18 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-16854:
-------------------------------
    Description: 
Most methods in SparkClientFactory are synchronized on the SparkClientFactory class, which acts as a single global lock.
However, some of these methods are very expensive. For example, createClient() returns a SparkClientImpl instance,
and creating a SparkClientImpl requires starting a remote driver that connects back to the RPCServer.
This can take a long time, for example when the YARN queue is busy. While it is in progress,
all other pending calls on SparkClientFactory have to wait.
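
A static synchronized method locks the java.lang.Class object of its class, which is why jstack reports the waiters as blocked on "a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory". The minimal sketch below illustrates this with a hypothetical Factory class (it is not Hive's code): Thread.holdsLock shows that the implicit form and an explicit synchronized (Factory.class) block take the same monitor.

{code:java}
/**
 * Hypothetical sketch (not Hive's SparkClientFactory): a static synchronized
 * method locks the Class object, exactly like an explicit synchronized block.
 */
public class StaticLockSketch {

    static final class Factory {
        // Implicit form: "static synchronized" acquires the monitor of Factory.class.
        static synchronized boolean createImplicit() {
            return Thread.holdsLock(Factory.class);
        }

        // Explicit form: the same monitor, acquired by hand.
        static boolean createExplicit() {
            synchronized (Factory.class) {
                return Thread.holdsLock(Factory.class);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(Factory.createImplicit());        // true: class monitor held inside
        System.out.println(Factory.createExplicit());        // true: same monitor
        System.out.println(Thread.holdsLock(Factory.class)); // false: released on return
    }
}
{code}

Because every such method shares this one monitor, a slow call blocks unrelated calls on the same class.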

In our case, hive.spark.client.server.connect.timeout is set to 1 hour. As a result, some queries
wait for hours before starting.

The current implementation effectively serializes all remote driver launches.
If one of them takes a long time, all subsequent ones have to wait.
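
To make the serialization effect concrete, here is a hedged sketch that times N concurrent calls when a simulated 200 ms launch runs inside the class lock versus outside it. SerializationSketch, coarseCreate(), narrowCreate(), timeConcurrent() and LAUNCH_MILLIS are all hypothetical; Thread.sleep stands in for starting a remote driver, and narrowCreate() only illustrates one way of narrowing the lock, not the fix adopted for this issue.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: N concurrent "launches" with the slow part inside vs.
 * outside the class lock. Thread.sleep simulates starting a remote driver.
 */
public class SerializationSketch {

    static final long LAUNCH_MILLIS = 200; // assumed cost of one simulated launch

    /** Coarse: the class monitor is held for the whole launch. */
    static synchronized void coarseCreate() throws InterruptedException {
        Thread.sleep(LAUNCH_MILLIS);
    }

    /** Narrower: only cheap shared-state bookkeeping is guarded. */
    static void narrowCreate() throws InterruptedException {
        synchronized (SerializationSketch.class) {
            // e.g. check that the factory is initialized (omitted here)
        }
        Thread.sleep(LAUNCH_MILLIS); // launches can now overlap
    }

    interface Call {
        void run() throws InterruptedException;
    }

    /** Starts n threads that each invoke call once; returns wall-clock ms until all finish. */
    static long timeConcurrent(int n, Call call) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    call.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        // coarse: roughly n * LAUNCH_MILLIS; narrow: roughly LAUNCH_MILLIS
        System.out.println("coarse: " + timeConcurrent(n, SerializationSketch::coarseCreate) + " ms");
        System.out.println("narrow: " + timeConcurrent(n, SerializationSketch::narrowCreate) + " ms");
    }
}
{code}

With a real connect timeout of 1 hour instead of 200 ms, the same arithmetic explains how a caller queued behind several slow launches can wait for hours.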

An HS2 stack trace is attached for reference. It was taken from an earlier version of Hive, so the line
numbers might be slightly off. The following excerpt shows the locking effect:

{code}
xuefu@hadoopservice20-sjc1:~$ grep org.apache.hive.spark.client.SparkClientFactory 15763.jstack

	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
	- locked <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:79)
	- waiting to lock <0x00007f78fa1a9cc0> (a java.lang.Class for org.apache.hive.spark.client.SparkClientFactory)
{code}
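
The grep above inspects a jstack dump from outside the process. As a hedged, in-process alternative, the standard java.lang.management API can list the threads that are BLOCKED on a given class's monitor together with the thread that owns it. BlockedOnClassSketch, blockedOn() and demo() are hypothetical names, and demo() fakes the contention with a local lock rather than a real Spark client.

{code:java}
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: in-process equivalent of grepping jstack for a class monitor. */
public class BlockedOnClassSketch {

    /** Returns "thread -> owner" for every thread BLOCKED on the monitor of clazz. */
    static List<String> blockedOn(Class<?> clazz) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        // (false, false): we only need what each thread waits on, not what it holds
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            LockInfo lock = info.getLockInfo();
            if (info.getThreadState() == Thread.State.BLOCKED
                    && lock != null
                    && lock.getClassName().equals(Class.class.getName())
                    && lock.getIdentityHashCode() == System.identityHashCode(clazz)) {
                result.add(info.getThreadName() + " -> " + info.getLockOwnerName());
            }
        }
        return result;
    }

    /** Demo: hold the class monitor, start a thread that blocks on it, then report. */
    static List<String> demo() throws InterruptedException {
        List<String> found;
        Thread waiter = new Thread(() -> {
            synchronized (BlockedOnClassSketch.class) {
                // entered only after demo() releases the monitor
            }
        }, "waiter-1");
        synchronized (BlockedOnClassSketch.class) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10); // wait until the waiter is stuck on the monitor
            }
            found = blockedOn(BlockedOnClassSketch.class);
        }
        waiter.join();
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // lists waiter-1 and the thread that owns the monitor
    }
}
{code}

Polling something like blockedOn(SparkClientFactory.class) in a running HS2 would, under these assumptions, surface the same "waiting to lock" picture as the jstack excerpt without taking a full dump.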

  was:
Most methods in SparkClientFactory are synchronized on the SparkClientFactory class, which acts as a single global lock.
However, some of these methods are very expensive. For example, createClient() returns a SparkClientImpl instance,
and creating a SparkClientImpl requires starting a remote driver that connects back to the RPCServer.
This can take a long time, for example when the YARN queue is busy. While it is in progress,
all other pending calls on SparkClientFactory have to wait.

In our case, hive.spark.client.server.connect.timeout is set to 1 hour. As a result, some queries
wait for hours before starting.

The current implementation effectively serializes all remote driver launches.
If one of them takes a long time, all subsequent ones have to wait.

An HS2 stack trace is attached for reference. It was taken from an earlier version of Hive, so the line
numbers might be slightly off.


> SparkClientFactory is locked too aggressively
> ---------------------------------------------
>
>                 Key: HIVE-16854
>                 URL: https://issues.apache.org/jira/browse/HIVE-16854
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: 15763.jstack
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
