mesos-user mailing list archives

From Vinod Kone <vinodk...@gmail.com>
Subject Re: Connecting spark from a different Machine to mesos cluster
Date Wed, 15 Oct 2014 17:14:01 GMT
http://stackoverflow.com/questions/24559616/mesos-scheduler-slave-continuously-gets-disconnected
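
For anyone hitting the same symptom, here is a minimal sketch of two checks suggested by the quoted messages below: whether the Spark machine's hostname resolves to a loopback address (as the Spark WARN about 127.0.1.1 indicates), and whether one host can open a TCP connection to another (Brian's point that the master must be able to reach the Spark machine). The function names are illustrative only and are not part of Mesos or Spark. The code diagnoses; it does not fix anything.

```python
import ipaddress
import socket


def is_loopback(ip: str) -> bool:
    """Return True if ip lies in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(ip).is_loopback


def resolved_ip(hostname: str) -> str:
    """Resolve a hostname to an IPv4 address using the local resolver."""
    return socket.gethostbyname(hostname)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False
```

On the Spark machine, is_loopback(resolved_ip(socket.gethostname())) returning True would match the warning in the shell output and may mean the scheduler advertises an address the master cannot use. can_connect() is meant to be run from the master VM against the Spark machine's IP and a port the driver is listening on. Setting SPARK_LOCAL_IP, as the Spark warning itself suggests, is one thing to try; libprocess also reads LIBPROCESS_IP, but please verify that against the documentation for your Mesos version.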

On Wed, Oct 15, 2014 at 9:57 AM, Brian Devins <Brian.Devins@dealer.com>
wrote:

>  Also Johannes, is there a network segment between Spark and the Mesos
> master? This looks like behavior I have seen before when the Master cannot
> connect back to the framework. The master also needs to be able to reach
> the Spark machine by IP.
>
>   From: Tim Chen <tim@mesosphere.io>
> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
> Date: Wednesday, October 15, 2014 at 12:52 PM
> To: "user@mesos.apache.org" <user@mesos.apache.org>
>
> Subject: Re: Connecting spark from a different Machine to mesos cluster
>
>   Hi Johannes,
>
>  When you started your 2nd shell, what log output from the slave do you
> see for that framework?
>
>  Master seems to think it's already terminated.
>
>  Tim
>
> On Wed, Oct 15, 2014 at 6:31 AM, Johannes Schillinger (Intern) <
> johannes.schillinger@citrix.com> wrote:
>
>>  Hi Tim,
>>
>>
>>
>> We are running Spark 1.1.0 with Hadoop 2.4 and Mesos 0.20.1, all from
>> binary releases.
>>
>>
>>
>> The Spark console is running in the default mode, which is fine-grained.
>>
>>
>>
>> The Spark process is started from a physical machine running Ubuntu; the
>> Mesos nodes are running in VMs, also on Ubuntu.
>>
>>
>>
>> This is the output of the Spark Shell:
>>
>>
>>
>>
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>> Spark assembly has been built with Hive, including Datanucleus jars on
>> classpath
>>
>> Using Spark's default log4j profile:
>> org/apache/spark/log4j-defaults.properties
>>
>> 14/10/15 15:18:24 INFO SecurityManager: Changing view acls to: USERNAME,
>>
>> 14/10/15 15:18:24 INFO SecurityManager: Changing modify acls to: USERNAME,
>>
>> 14/10/15 15:18:24 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(USERNAME, );
>> users with modify permissions: Set(USERNAME, )
>>
>> 14/10/15 15:18:24 INFO HttpServer: Starting HTTP Server
>>
>> 14/10/15 15:18:24 INFO Utils: Successfully started service 'HTTP class
>> server' on port 42469.
>>
>> Welcome to
>>
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
>>       /_/
>>
>>
>>
>> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
>>
>> Type in expressions to have them evaluated.
>>
>> Type :help for more information.
>>
>> 14/10/15 15:18:26 WARN Utils: Your hostname, karwjohannes01 resolves to a
>> loopback address: 127.0.1.1; using CLIENT_IP instead (on interface eth0)
>>
>> 14/10/15 15:18:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>> another address
>>
>> 14/10/15 15:18:27 INFO SecurityManager: Changing view acls to: USERNAME,
>>
>> 14/10/15 15:18:27 INFO SecurityManager: Changing modify acls to: USERNAME,
>>
>> 14/10/15 15:18:27 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(USERNAME, );
>> users with modify permissions: Set(USERNAME, )
>>
>> 14/10/15 15:18:27 INFO Slf4jLogger: Slf4jLogger started
>>
>> 14/10/15 15:18:27 INFO Remoting: Starting remoting
>>
>> 14/10/15 15:18:27 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://sparkDriver@CLIENT_IP:51879]
>>
>> 14/10/15 15:18:27 INFO Remoting: Remoting now listens on addresses:
>> [akka.tcp://sparkDriver@CLIENT_IP:51879]
>>
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'sparkDriver'
>> on port 51879.
>>
>> 14/10/15 15:18:27 INFO SparkEnv: Registering MapOutputTracker
>>
>> 14/10/15 15:18:27 INFO SparkEnv: Registering BlockManagerMaster
>>
>> 14/10/15 15:18:27 INFO DiskBlockManager: Created local directory at
>> /tmp/spark-local-20141015151827-1a2e
>>
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'Connection
>> manager for block manager' on port 60963.
>>
>> 14/10/15 15:18:27 INFO ConnectionManager: Bound socket to port 60963 with
>> id = ConnectionManagerId(CLIENT_IP,60963)
>>
>> 14/10/15 15:18:27 INFO MemoryStore: MemoryStore started with capacity
>> 265.4 MB
>>
>> 14/10/15 15:18:27 INFO BlockManagerMaster: Trying to register BlockManager
>>
>> 14/10/15 15:18:27 INFO BlockManagerMasterActor: Registering block manager
>> CLIENT_IP:60963 with 265.4 MB RAM
>>
>> 14/10/15 15:18:27 INFO BlockManagerMaster: Registered BlockManager
>>
>> 14/10/15 15:18:27 INFO HttpFileServer: HTTP File server directory is
>> /tmp/spark-b032c76c-93e1-473e-802c-c55e12e85d41
>>
>> 14/10/15 15:18:27 INFO HttpServer: Starting HTTP Server
>>
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'HTTP file
>> server' on port 47989.
>>
>> 14/10/15 15:18:27 INFO Utils: Successfully started service 'SparkUI' on
>> port 4040.
>>
>> 14/10/15 15:18:27 INFO SparkUI: Started SparkUI at http://CLIENT_IP:4040
>>
>> 14/10/15 15:18:27 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>>
>> I1015 15:18:28.524736  4748 sched.cpp:139] Version: 0.20.1
>>
>> I1015 15:18:28.527180  4750 sched.cpp:235] New master detected at
>> master@MESOS_MASTER_IP:5050
>>
>> I1015 15:18:28.527300  4750 sched.cpp:243] No credentials provided.
>> Attempting to register without authentication
>>
>>
>> --------------------------------------------------------------------------------------------------------------------------------
>>
>>
>>
>> Mesos master WARNING log:
>>
>> W1015 14:13:00.235213  1118 master.cpp:3452] Master returning resources
>> offered to framework 20141007-102213-343139338-5050-1037-3490 because the
>> framework has terminated or is inactive
>>
>> W1015 14:13:35.244055  1121 master.cpp:3452] Master returning resources
>> offered to framework 20141007-102213-343139338-5050-1037-3525 because the
>> framework has terminated or is inactive
>>
>> W1015 14:13:50.252436  1121 master.cpp:3452] Master returning resources
>> offered to framework 20141007-102213-343139338-5050-1037-3540 because the
>> framework has terminated or is inactive
>>
>> W1015 14:14:05.252708  1117 master.cpp:3452] Master returning resources
>> offered to framework 20141007-102213-343139338-5050-1037-3555 because the
>> framework has terminated or is inactive
>>
>>
>>
>>
>>
>> Mesos slave WARNING log :
>>
>>
>>
>> W1015 13:58:19.103196  1211 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3116
>>
>> W1015 13:58:20.104650  1210 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3117
>>
>> W1015 13:58:21.119839  1211 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3118
>>
>> W1015 13:58:22.115965  1210 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3119
>>
>> W1015 13:58:23.104925  1211 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3120
>>
>> W1015 13:58:24.104652  1210 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3121
>>
>> W1015 13:58:59.853744  1212 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3122
>>
>> W1015 13:59:00.853086  1214 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3123
>>
>> W1015 13:59:01.853137  1212 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3124
>>
>> W1015 13:59:03.318259  1214 slave.cpp:1421] Cannot shut down unknown
>> framework 20141007-102213-343139338-5050-1037-3029
>>
>>
>>
>>
>>
>> I hope this information helps. Please ask if you have any more questions,
>> and thank you for your help!
>>
>>
>>
>> Johannes
>>
>>
>>
>> *From:* Tim St Clair [mailto:tstclair@redhat.com]
>> *Sent:* Wednesday, 15 October 2014 15:11
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Connecting spark from a different Machine to mesos cluster
>>
>>
>>
>> Details?
>>
>>
>>
>> 1. What versions are you running?
>>
>> 2. Fine-grained mode or coarse-grained?
>>
>> 3. Are you running in VMs?
>>
>>
>>
>> Logs always help too.
>>
>>
>>
>> Cheers,
>>
>> Tim
>>
>>
>>  ------------------------------
>>
>> *From: *"Johannes Schillinger (Intern)" <johannes.schillinger@citrix.com>
>> *To: *user@mesos.apache.org
>> *Sent: *Wednesday, October 15, 2014 7:42:36 AM
>> *Subject: *Connecting spark from a different Machine to mesos cluster
>>
>>
>>
>> Hi,
>>
>>
>>
>> We are currently trying to get a Mesos cluster running as a base for
>> Spark.
>>
>>
>>
>> The Mesos cluster itself runs, and connecting a Spark shell from the
>> machine the master runs on works perfectly.
>>
>> We can see the Framework being started and the slaves working.
>>
>>
>>
>> If we try to connect the exact same shell from a different machine to the
>> exact same cluster, the screen stops at:
>>
>>
>>
>> … 4013 sched.cpp:243] No credentials provided. Attempting to register
>> without authentication
>>
>>
>>
>> The cluster spins up a framework every two seconds with a new ID and
>> stops it immediately. This continues (we stopped it after a few dozen
>> starts).
>>
>>
>>
>> We can see the frameworks being started in the master and slave logs, as
>> well as the master's command to terminate them.
>>
>>
>>
>> Has anyone ever encountered a similar problem or has any advice on
>> solving this problem?
>>
>>
>>
>> Thanks!
>>
>> Johannes
>>
>>
>>
>>
>>
>> --
>>
>> Cheers,
>> Timothy St. Clair
>> Red Hat Inc.
>>
>
>
>
> Brian Devins* |* Java Developer
> Brian.Devins@dealer.com
>
>
>
