mesos-user mailing list archives

From "craig mcmillan" <mccraigmccr...@gmail.com>
Subject Re: hadoop job stuck.
Date Fri, 16 Jan 2015 11:29:40 GMT
dan,

to function correctly, the mesos web ui requires that the slave ip 
addresses the master uses are directly accessible from your browser: 
it doesn't work if your slaves are not reachable from your browser on the 
same ip address that the master uses, as happens when the master 
uses a private ip address for the slaves
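
A quick way to test this from the machine where the browser runs (not from the master) is to try a plain TCP connection to the slave. The following is only a minimal sketch: the function name `check_slave_reachable` is made up for illustration, and the host and port would be whatever the web ui error message reports (here 'centos-2.local' and 5051).

```python
import socket


def check_slave_reachable(host, port=5051, timeout=3.0):
    """Return (ok, reason) for a TCP connection attempt to host:port.

    Hypothetical helper: it only separates "name does not resolve"
    from "name resolves but the port cannot be reached".
    """
    try:
        # Name resolution step: fails if the hostname is unknown
        # on the machine running this script.
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"cannot resolve {host!r}: {exc}"

    last_error = None
    for family, socktype, proto, _canonname, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True, f"connected to {sockaddr[0]}:{sockaddr[1]}"
        except OSError as exc:
            # Remember the failure and try the next resolved address.
            last_error = exc
    return False, f"resolved {host!r} but cannot connect to port {port}: {last_error}"
```

Calling `check_slave_reachable("centos-2.local", 5051)` on the browser machine and on the master and comparing the two results should show whether the problem is name resolution or the port.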

you can get the slave logs directly by logging onto the slave and 
looking in /tmp/mesos/slaves/ (at least, this is where they are by 
default in the 0.21.0-1.0.ubuntu1404 i am using) then following the rest 
of the path from the url in the web ui
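
If following the path by hand is tedious, something like the sketch below could list the sandbox log files on the slave. It assumes the logs are files named `stdout` and `stderr` somewhere under the default /tmp/mesos/slaves/ root mentioned above, and deliberately does not assume the directory layout beneath it; `find_sandbox_logs` is a name invented for this example.

```python
import os


def find_sandbox_logs(root="/tmp/mesos/slaves", names=("stdout", "stderr")):
    """Return a sorted list of files under root whose name is in names.

    Assumption: sandbox logs are plain files called stdout/stderr.
    The root is the default location and may differ on your install.
    """
    found = []
    # os.walk yields nothing if root does not exist, so the result is
    # simply an empty list in that case.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in names:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Run on the slave itself, `find_sandbox_logs()` returns the candidate paths, which can then be matched against the path shown in the web ui url.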

:craig


On 16 Jan 2015, at 11:20, Dick Davies wrote:

> To view the slaves logs, you need to be able to connect to that URL
> from your browser, not the master
> (the data is read directly from the slave by your browser, it doesn't
> go via the master).
>
>
> On 15 January 2015 at 21:42, Dan Dong <dongdan39@gmail.com> wrote:
>> Hi, All,
>> Now that the sandbox can be viewed in the mesos UI, I see the following
>> message (the same error appears in every slave sandbox):
>>
>> "Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on
>> 'centos-2.local:5051'.
>>
>> Potential reasons:
>>
>> The slave's hostname, 'centos-2.local', is not accessible from your network
>> The slave's port, '5051', is not accessible from your network"
>>
>>
>> I checked that:
>> I can log in to slave centos-2.local from any machine in the cluster
>> without a password using "ssh centos-2.local";
>> port 5051 on slave centos-2.local can be reached from the master using
>> "telnet centos-2.local 5051".
>>
>> I'm confused: what's the problem here?
>>
>> Cheers,
>> Dan
>>
>>
>>
>> 2015-01-14 15:33 GMT-06:00 Brenden Matthews <brenden.matthews@airbnb.com>:
>>
>>> Would need the task logs from the slave which the TaskTracker was
>>> launched on, to debug this further.
>>>
>>> On Wed, Jan 14, 2015 at 1:28 PM, Dan Dong <dongdan39@gmail.com> wrote:
>>>>
>>>> I checked that /etc/hosts is correct: master and slave can ssh to each
>>>> other by hostname without a password, and hadoop runs well without
>>>> mesos, but it gets stuck when running on mesos.
>>>>
>>>> Cheers,
>>>> Dan
>>>>
>>>> 2015-01-14 15:02 GMT-06:00 Brenden Matthews
>>>> <brenden.matthews@airbnb.com>:
>>>>
>>>>> At a first glance, it looks like `/etc/hosts` might be set incorrectly
>>>>> and it cannot resolve the hostname of the worker.
>>>>>
>>>>> See here for more: https://wiki.apache.org/hadoop/UnknownHost
>>>>>
>>>>> On Wed, Jan 14, 2015 at 12:32 PM, Vinod Kone <vinodkone@apache.org>
>>>>> wrote:
>>>>>>
>>>>>> What do the master logs say?
>>>>>>
>>>>>> On Wed, Jan 14, 2015 at 12:21 PM, Dan Dong <dongdan39@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> When I run hadoop jobs on Mesos (0.21.0), the jobs are stuck
>>>>>>> forever:
>>>>>>> 15/01/14 13:59:30 INFO mapred.FileInputFormat: Total input paths to process : 8
>>>>>>> 15/01/14 13:59:30 INFO mapred.JobClient: Running job: job_201501141358_0001
>>>>>>> 15/01/14 13:59:31 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>
>>>>>>> From jobtracker log I see:
>>>>>>> 2015-01-14 13:59:35,542 INFO org.apache.hadoop.mapred.ResourcePolicy: Launching task Task_Tracker_0 on http://centos-2.local:31911 with mapSlots=1 reduceSlots=0
>>>>>>> 2015-01-14 14:04:35,552 WARN org.apache.hadoop.mapred.MesosScheduler: Tracker http://centos-2.local:31911 failed to launch within 300 seconds, killing it
>>>>>>>
>>>>>>> I manually started the namenode and jobtracker on the master node
>>>>>>> and the datanode on the slave, but I could not see a tasktracker
>>>>>>> started by mesos on the slave. Note that if I run hadoop directly
>>>>>>> without Mesos (of course the conf files are different and the
>>>>>>> tasktracker is started manually on the slave), everything works
>>>>>>> fine. Any hints?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
