spark-reviews mailing list archives

From SaintBacchus <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-6924][YARN] Fix driver hangs in yarn-cl...
Date Tue, 28 Apr 2015 09:06:09 GMT
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/5663#issuecomment-96983228
  
    @tgravescs Yes, `FinalApplicationStatus.UNDEFINED` is a better status for the application
than failed or killed, since the client cannot really know the application's state when the
network is unstable.
    So the change will look like this:
    ```scala
        logError("Can't gain the status of application from Yarn because of exception: ", e)
        return (YarnApplicationState.FAILED, FinalApplicationStatus.UNDEFINED)
    ```
    Following up on @vanzin's suggestion, I used `jstack` to dump the process (the network was
back, but the process was still waiting). It showed that the main thread was the only non-daemon
thread, apart from JVM threads such as the GC tasks.
    ```java
    "main" prio=10 tid=0x000000000060a800 nid=0xd7ae in Object.wait() [0x00007fe7dcfb3000]
       java.lang.Thread.State: WAITING (on object monitor)
    	at java.lang.Object.wait(Native Method)
    	- waiting on <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
    	at java.lang.Object.wait(Object.java:503)
    	at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
    	- locked <0x00000000e02b9a58> (a org.apache.spark.scheduler.JobWaiter)
    	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:526)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1586)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1655)
    	at org.apache.spark.rdd.RDD.reduce(RDD.scala:906)
    	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:35)
    	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:611)
    	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:171)
    	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:194)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:115)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    
    "VM Thread" prio=10 tid=0x0000000000665800 nid=0xd7b1 runnable 
    
    "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000000620000 nid=0xd7af runnable 
    
    "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000000622000 nid=0xd7b0 runnable 
    
    "VM Periodic Task Thread" prio=10 tid=0x00007fe7d003d800 nid=0xd7b8 waiting on condition

    
    JNI global references: 29
    ```
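
The same check can be made from inside the JVM rather than by reading a `jstack` dump. The following is only a minimal sketch, not part of the patch: the class name `NonDaemonThreads` and the helper `liveNonDaemonThreadNames` are hypothetical, and it relies only on the standard `Thread.getAllStackTraces()`, `isAlive()` and `isDaemon()` calls. A JVM exits only once every non-daemon thread has finished, so a list containing just `main` is consistent with the dump above, where the blocked main thread alone keeps the process alive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or of this pull request.
final class NonDaemonThreads {

    /** Returns the names of all live non-daemon Java threads in this JVM. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by every live Java thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints whichever non-daemon threads are alive at the moment of the call.
        System.out.println(liveNonDaemonThreadNames());
    }
}
```

JVM-internal threads such as the GC task threads in the dump are not Java `Thread` objects, so they do not appear in this list.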
    
    I don't know the underlying reason why the main thread is still waiting, but my change
fixes this problem.

