spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-23941) Mesos task failed on specific spark app name
Date Tue, 01 May 2018 15:31:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-23941:
--------------------------------------

    Assignee: bounkong khamphousone

> Mesos task failed on specific spark app name
> --------------------------------------------
>
>                 Key: SPARK-23941
>                 URL: https://issues.apache.org/jira/browse/SPARK-23941
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Submit
>    Affects Versions: 2.2.1, 2.3.0
>         Environment: OS: Ubuntu 16.0.4
> Spark: 2.3.0
> Mesos: 1.5.0
>            Reporter: bounkong khamphousone
>            Assignee: bounkong khamphousone
>            Priority: Major
>             Fix For: 2.2.2, 2.3.1, 2.4.0
>
>
> This appears to be a bug in Spark's MesosClusterDispatcher. To reproduce it,
> you need Mesos and the Mesos dispatcher running.
> I'm currently running Mesos 1.5 and Spark 2.3.0 (I also tried 2.2.1).
> If you launch the following program:
>  
> {code:java}
> spark-submit --master mesos://127.0.1.1:7077 --deploy-mode cluster \
>   --class org.apache.spark.examples.SparkPi \
>   --name "my favorite task (myId = 123-456)" \
>   /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100
> {code}
> then the task fails with the following output:
>  
> {code:java}
> I0409 11:00:35.360352 22726 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110035-0004\/runs\/8ac20902-74e1-45c4-9ab6-c52a79940189","user":"tiboun"}
> I0409 11:00:35.363119 22726 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
> I0409 11:00:35.363143 22726 fetcher.cpp:291] Fetching directly into the sandbox directory
> I0409 11:00:35.363168 22726 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
> W0409 11:00:35.366839 22726 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
> I0409 11:00:35.366873 22726 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189/spark-examples_2.11-2.3.0.jar'
> I0409 11:00:35.366878 22726 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110035-0004/runs/8ac20902-74e1-45c4-9ab6-c52a79940189'
> I0409 11:00:35.438725 22733 exec.cpp:162] Version: 1.5.0
> I0409 11:00:35.440770 22734 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
> I0409 11:00:35.441388 22733 executor.cpp:171] Received SUBSCRIBED event
> I0409 11:00:35.441586 22733 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
> I0409 11:00:35.441643 22733 executor.cpp:171] Received LAUNCH event
> I0409 11:00:35.441767 22733 executor.cpp:638] Starting task driver-20180409110035-0004
> I0409 11:00:35.445050 22733 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
> I0409 11:00:35.445770 22733 executor.cpp:651] Forked command at 22743
> sh: 1: Syntax error: "(" unexpected
> I0409 11:00:35.538661 22736 executor.cpp:938] Command exited with status 2 (pid: 22743)
> I0409 11:00:36.541016 22739 process.cpp:887] Failed to accept socket: future discarded
> {code}
> If you remove the parentheses, you get the following result:
>  
> {code:java}
> I0409 11:03:02.023701 23085 fetcher.cpp:551] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/tiboun","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"extract":true,"value":"\/home\/tiboun\/tools\/spark\/examples\/jars\/spark-examples_2.11-2.3.0.jar"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/0262246c-14a3-4408-9b74-5e3b65dc1344-S0\/frameworks\/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014\/executors\/driver-20180409110301-0006\/runs\/f887c0ab-b48f-4382-850c-383c1c944269","user":"tiboun"}
> I0409 11:03:02.028268 23085 fetcher.cpp:450] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
> I0409 11:03:02.028302 23085 fetcher.cpp:291] Fetching directly into the sandbox directory
> I0409 11:03:02.028336 23085 fetcher.cpp:225] Fetching URI '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
> W0409 11:03:02.031209 23085 fetcher.cpp:330] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: /home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar
> I0409 11:03:02.031250 23085 fetcher.cpp:603] Fetched '/home/tiboun/tools/spark/examples/jars/spark-examples_2.11-2.3.0.jar' to '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/spark-examples_2.11-2.3.0.jar'
> I0409 11:03:02.031258 23085 fetcher.cpp:608] Successfully fetched all URIs into '/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269'
> I0409 11:03:02.090797 23095 exec.cpp:162] Version: 1.5.0
> I0409 11:03:02.095283 23092 exec.cpp:236] Executor registered on agent 0262246c-14a3-4408-9b74-5e3b65dc1344-S0
> I0409 11:03:02.096693 23095 executor.cpp:171] Received SUBSCRIBED event
> I0409 11:03:02.097040 23095 executor.cpp:175] Subscribed executor on tiboun-Dell-Precision-M3800
> I0409 11:03:02.097141 23095 executor.cpp:171] Received LAUNCH event
> I0409 11:03:02.097357 23095 executor.cpp:638] Starting task driver-20180409110301-0006
> I0409 11:03:02.101521 23095 executor.cpp:478] Running '/usr/libexec/mesos/mesos-containerizer launch <POSSIBLY-SENSITIVE-DATA>'
> I0409 11:03:02.102332 23095 executor.cpp:651] Forked command at 23100
> Error: Cannot load main class from JAR file:/var/lib/mesos/slaves/0262246c-14a3-4408-9b74-5e3b65dc1344-S0/frameworks/edff1a6f-38c6-46e0-a3c1-62a8fbfc2b5d-0014/executors/driver-20180409110301-0006/runs/f887c0ab-b48f-4382-850c-383c1c944269/favorite
> Run with --help for usage help or --verbose for debug output
> I0409 11:03:02.792325 23090 executor.cpp:938] Command exited with status 1 (pid: 23100)
> I0409 11:03:03.794505 23098 process.cpp:887] Failed to accept socket: future discarded
> {code}
> Interestingly, Mesos tries to load the main class from a file called "favorite",
> which is part of the task name.
>  
> I tried launching spark-shell with the same name and it works fine. The task name
> takes the driver's name and appends a sequence to it.
>  
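
Both failures above are consistent with the app name reaching `sh` without quoting, although the report itself does not confirm where the command string is built. The sketch below is an illustration only, not Spark's or Mesos's code: `build_command` is a hypothetical helper, the jar name is shortened, and Python's `shlex` stands in for the shell's word splitting. `shlex.split` does not reproduce the `"(" unexpected` syntax error that a real `sh` raises on the unquoted parentheses; it only shows how the name breaks into separate words, with "favorite" landing where the application JAR is expected.

```python
import shlex


def build_command(args, quote):
    """Hypothetical launcher helper: join arguments into one shell string.

    With quote=False the arguments are joined verbatim (the suspected bug);
    with quote=True each argument is escaped for a POSIX shell.
    """
    parts = [shlex.quote(a) if quote else a for a in args]
    return " ".join(parts)


# Arguments modelled on the report; the jar path is shortened for readability.
args = [
    "spark-submit",
    "--name",
    "my favorite task (myId = 123-456)",
    "spark-examples.jar",
]

unquoted = build_command(args, quote=False)
quoted = build_command(args, quote=True)

# shlex.split approximates how a POSIX shell splits a command into words.
unquoted_tokens = shlex.split(unquoted)
quoted_tokens = shlex.split(quoted)

# Unquoted: --name only receives "my"; "favorite" becomes the next word.
print(unquoted_tokens)
# Quoted: the name survives as a single argument.
print(quoted_tokens)
```

If this reading is right, escaping the app name wherever the driver command is assembled should let names with spaces or parentheses pass through unchanged.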



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

