mesos-user mailing list archives

From 木内満歳 <m-kiu...@creationline.com>
Subject Re: Spark task sometimes won't start
Date Tue, 24 Nov 2015 06:18:50 GMT
Hi, Tim

I've reproduced the problem and captured debug logs (attached).
I can't tell what is going on, but it looks like the slave is
repeatedly sending ACCEPT messages to the master.

Could you take a look and let me know what you think?

Best Regards,
Mitsutoshi Kiuchi


2015-11-24 5:28 GMT+09:00 Tim Chen <tim@mesosphere.io>:

> Hi Mitsutoshi,
>
> Can you enable TRACE-level logging in Spark (by modifying your
> log4j.properties file)?
>
> It should give more information on why offers are being rejected, but most
> of the time it's because the cluster doesn't have enough resources to satisfy
> launching your Spark job. You can either increase your slave(s) resources
> or lower the cpu/memory requirements of your job through configuration.
>
> Tim
>
> On Mon, Nov 23, 2015 at 6:30 AM, 木内満歳 <m-kiuchi@creationline.com> wrote:
>
>> Hi,
>>
>> Some Spark tasks on Mesos 0.25 occasionally fail to start.
>> Could you advise how I can get more detail about what is happening?
>>
>> Here is the slave log for a task that failed to start:
>>
>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.677291 18516
>> slave.cpp:2379] Got registration for executor
>> '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1442 from executor(1)@
>> 10.130.91.16:60295
>> Nov 23 08:54:26 mesos-s2 mesos-slave[18499]: I1123 08:54:26.679875 18516
>> slave.cpp:1760] Sending queued task '0' to executor
>> '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1442
>> (no further log entries for this task)
>>
>> When a task starts successfully, the slave log looks like this:
>>
>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.637285
>> 8658 slave.cpp:2379] Got registration for executor
>> '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@
>> 10.130.98.65:52273
>> Nov 23 08:44:39 al-mesos-s3 mesos-slave[8644]: I1123 08:44:39.639233
>> 8658 slave.cpp:1760] Sending queued task '6' to executor
>> '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.608182
>> 8658 slave.cpp:2717] Handling status update TASK_RUNNING (UUID:
>> ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 from executor(1)@
>> 10.130.98.65:52273
>> Nov 23 08:44:42 al-mesos-s3 mesos-slave[8644]: I1123 08:44:42.612318
>> 8658 status_update_manager.cpp:322] Received status update TASK_RUNNING
>> (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 of framework
>> 235498ca-6603-4cfe-bfc7-94005bb235fb-1437
>>
>> Any advice is welcome.
>>
>> Best Regards,
>> Mitsutoshi Kiuchi
>>
>>
>
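The two slave log excerpts quoted above differ in one way: the stuck task has a "Sending queued task" entry but no later "Handling status update" entry. A minimal Python sketch for spotting such tasks in a slave log follows. The regular expressions are modeled only on the log lines quoted in this thread, the function name find_silent_tasks is hypothetical, and the sample lines are the quoted entries re-joined onto single lines (the mail client wrapped them), so treat this as a starting point rather than a general Mesos log parser.

```python
import re

# Patterns modeled on the slave.cpp log lines quoted in this thread.
# Assumption: each log entry sits on one line in the real log file.
SENT = re.compile(
    r"Sending queued task '(?P<task>[^']+)' to executor '[^']+' "
    r"of framework (?P<fw>\S+)"
)
UPDATE = re.compile(
    r"Handling status update (?P<state>\w+) \(UUID: [^)]+\) "
    r"for task (?P<task>\S+) of framework (?P<fw>\S+)"
)


def find_silent_tasks(lines):
    """Return (framework, task) pairs that were sent to an executor
    but never produced a status update in the given log lines."""
    sent = []        # keeps first-seen order
    updated = set()
    for line in lines:
        m = SENT.search(line)
        if m:
            key = (m.group("fw"), m.group("task"))
            if key not in sent:
                sent.append(key)
            continue
        m = UPDATE.search(line)
        if m:
            updated.add((m.group("fw"), m.group("task")))
    return [key for key in sent if key not in updated]


# Entries from the thread, re-joined onto single lines.
SAMPLE = [
    "I1123 08:54:26.679875 18516 slave.cpp:1760] Sending queued task '0' "
    "to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S5' of framework "
    "235498ca-6603-4cfe-bfc7-94005bb235fb-1442",
    "I1123 08:44:39.639233 8658 slave.cpp:1760] Sending queued task '6' "
    "to executor '235498ca-6603-4cfe-bfc7-94005bb235fb-S6' of framework "
    "235498ca-6603-4cfe-bfc7-94005bb235fb-1437",
    "I1123 08:44:42.608182 8658 slave.cpp:2717] Handling status update "
    "TASK_RUNNING (UUID: ff5a2278-0753-4541-bd33-a55f3a09fb69) for task 6 "
    "of framework 235498ca-6603-4cfe-bfc7-94005bb235fb-1437 "
    "from executor(1)@10.130.98.65:52273",
]

if __name__ == "__main__":
    for fw, task in find_silent_tasks(SAMPLE):
        print(f"task {task} of framework {fw}: no status update seen")
```

On the sample lines this reports only task 0 of framework ...-1442, which is the task that never started. In practice the lines would be read from the slave's log file instead of a hard-coded list.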

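For the logging change suggested in the quoted reply, a log4j.properties fragment might look like the one below. The thread does not state the Spark version, so this assumes a Spark 1.x release using log4j 1.x syntax, and it assumes the Mesos scheduler backends log under the org.apache.spark.scheduler.cluster.mesos package. The console appender lines follow the layout of Spark's bundled log4j.properties.template.

```properties
# conf/log4j.properties (log4j 1.x syntax; assumed, not confirmed by the thread)
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Assumption: offer handling is logged by classes in this package.
log4j.logger.org.apache.spark.scheduler.cluster.mesos=TRACE
```

Raising only this one logger to TRACE keeps the rest of the driver output at INFO. If the trace shows offers being declined for lack of resources, the job's requests can be lowered with standard Spark settings such as spark.executor.memory and spark.cores.max.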