hive-dev mailing list archives
From "Chao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7525) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext
Date Tue, 29 Jul 2014 17:44:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078056#comment-14078056 ]

Chao commented on HIVE-7525:
----------------------------

I modified SparkClient to submit rdd4 via a separate thread, which simply performs the
"foreach" in its "run" method. However, I keep hitting this issue: the plan file
cannot be found:

14/07/29 10:01:37 INFO exec.Utilities: local path = hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: Open file to read in plan: hdfs://localhost:8020/tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
14/07/29 10:01:37 INFO exec.Utilities: File not found: File does not exist: /tmp/hive-chao/6ab5877a-ba1a-4761-971e-45d9b46cd3c6/hive_2014-07-29_10-01-28_749_8375059517503664847-1/-mr-10003/1a80d789-63d8-43bb-b3f4-4ad74a66b0af/map.xml
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1726)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1669)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1649)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1621)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:482)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

On the other hand, if I trigger the "foreach" in the current thread, everything works fine.
Maybe the Hadoop FS doesn't allow accessing the same file from different threads?
I'm not sure why this happens.
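For reference, the threaded submission described above can be sketched like this. This is a minimal stand-in using plain Python and the stdlib threading module, not the real Spark or SparkClient API; `foreach_action`, `rdd4`, and `results` are hypothetical names used only to illustrate the pattern of running the action inside a thread's run method.

```python
import threading

def foreach_action(records, sink):
    # Hypothetical action body; in the real test this would be rdd4.foreach(...)
    for r in records:
        sink.append(r * 2)

rdd4 = [1, 2, 3]   # stand-in for the transformed RDD
results = []

# Trigger the action in a separate thread, whose run method simply
# invokes the "foreach" over the shared data.
t = threading.Thread(target=foreach_action, args=(rdd4, results))
t.start()
t.join()
```

In the failing scenario, it is this separately-threaded invocation that cannot locate the plan file, while the same call made from the submitting thread succeeds.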

> Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7525
>                 URL: https://issues.apache.org/jira/browse/HIVE-7525
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit multiple Spark jobs concurrently using a shared SparkContext. SparkClient's code can be manipulated for this test. Here is the process:
> 1. Transform rdd1 to rdd2 using some transformation.
> 2. Call rdd2.cache() to persist it in memory.
> 3. In two threads, run respectively:
>     Thread a. rdd2 -> rdd3; rdd3.foreach()
>     Thread b. rdd2 -> rdd4; rdd4.foreach()
> It would also be nice to find out about the monitoring and error reporting aspects.
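The three-step test process quoted above can be mimicked with plain Python threads. The sketch below uses stdlib lists and functions as stand-ins for the RDDs and transformations (no real Spark API is involved); the names `rdd1` through `out4` simply mirror the description.

```python
import threading

rdd1 = list(range(5))

# 1. Transform rdd1 to rdd2 using some transformation.
rdd2 = [x + 1 for x in rdd1]

# 2. With a real RDD this step would be rdd2.cache(); here the list is
#    simply a shared in-memory object visible to both threads.

out3, out4 = [], []

def job(source, transform, sink):
    # rdd2 -> rdd3 (or rdd4), then a foreach-style action into a sink.
    for x in map(transform, source):
        sink.append(x)

# 3. Run the two derived jobs concurrently from the shared intermediate.
ta = threading.Thread(target=job, args=(rdd2, lambda x: x * 10, out3))
tb = threading.Thread(target=job, args=(rdd2, lambda x: x * 100, out4))
ta.start(); tb.start()
ta.join(); tb.join()
```

With real RDDs, the interesting question is whether both threads can safely drive actions through the one shared SparkContext; the plain-thread version above only shows the intended shape of the experiment.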



--
This message was sent by Atlassian JIRA
(v6.2#6252)
