singa-dev mailing list archives

From Anh Dinh <dinh...@comp.nus.edu.sg>
Subject Re: Error while running singa on mesos
Date Wed, 22 Jun 2016 06:34:56 GMT
What version of Docker are you running?

Anh.


On 22 June 2016 at 14:26, Wang Wei <wangwei@apache.org> wrote:

>
> ---------- Forwarded message ----------
> From: Venkat Katta <skatta@adobe.com>
> Date: Wed, Jun 22, 2016 at 1:31 PM
> Subject: Re: Error while running singa on mesos
> To: Wang Wei <wangwei@apache.org>
>
>
> It works fine if I replace node0 and node2 with their IP addresses. I am
> using Weave for transparent communication between the containers. In
> singa.conf I used node0, not its IP address, to connect to ZooKeeper, and
> it connects, so why can't singa resolve the hostname? Also, when running
> singa with Mesos it uses localhost rather than the IP addresses of node1
> and node2, and we are not giving any argument regarding the IP addresses
> of the slaves when running singa.
>
>
> F0622 05:18:28.932391  1513 socket.cc:98] Check failed: port != -1 (-1 vs.
> -1) tcp://localhost:*
>
>
> Thanks,
>
> Venkat satish katta
> ------------------------------
> *From:* Wang Wei <wangwei@apache.org>
> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
> *To:* Venkat Katta
>
> *Subject:* Re: Error while running singa on mesos
>
> If you are using Docker (without Mesos), it could be a network routing
> problem. You may need to configure the Docker network so that node0
> and node2 can be accessed from node1.
> We are trying your configuration.
>
> regards,
> wang wei
>
>
> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <wangwei@apache.org> wrote:
>
>> Hi Venkat,
>>
>> It is probably a problem with the node addresses.
>> Please replace node0 and node2 with their IP addresses.
>>
>> regards,
>> wei
>>
>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <skatta@adobe.com> wrote:
>>
>>> I tried running without Mesos and got the same error:
>>>
>>>
>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf
>>> examples/cifar10/hybrid.conf
>>> Unique JOB_ID is 4
>>> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
>>> Executing @ node2 : cd /root/incubator-singa; source
>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>> Executing @ node0 : cd /root/incubator-singa; source
>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>> F0621 18:33:24.171468   725 socket.cc:98] Check failed: port != -1 (-1
>>> vs. -1) tcp://node2:*
>>> *** Check failure stack trace: ***
>>>     @     0x7f10d0a6b9fd  google::LogMessage::Fail()
>>>     @     0x7f10d0a6d89d  google::LogMessage::SendToLog()
>>>     @     0x7f10d0a6b5ec  google::LogMessage::Flush()
>>>     @     0x7f10d0a6e1be  google::LogMessageFatal::~LogMessageFatal()
>>>     @     0x7f10d0e05d79  singa::Router::Bind()
>>>     @     0x7f10d0d7a8bc  singa::Driver::Train()
>>>     @     0x7f10d0d7f48b  singa::Driver::Train()
>>>     @           0x40c915  main
>>>     @     0x7f10c5f13f45  (unknown)
>>>     @           0x40cb7e  (unknown)
>>> F0621 18:33:06.244278  1042 socket.cc:98] Check failed: port != -1 (-1
>>> vs. -1) tcp://node0:*
>>> *** Check failure stack trace: ***
>>>     @     0x7f6d4516d9fd  google::LogMessage::Fail()
>>>     @     0x7f6d4516f89d  google::LogMessage::SendToLog()
>>>     @     0x7f6d4516d5ec  google::LogMessage::Flush()
>>>     @     0x7f6d451701be  google::LogMessageFatal::~LogMessageFatal()
>>>     @     0x7f6d45507d79  singa::Router::Bind()
>>>     @     0x7f6d4547c8bc  singa::Driver::Train()
>>>     @     0x7f6d4548148b  singa::Driver::Train()
>>>     @           0x40c915  main
>>>     @     0x7f6d3a615f45  (unknown)
>>>     @           0x40cb7e  (unknown)
>>> bash: line 1:   725 Aborted                 (core dumped) ./singa
>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>> bash: line 1:  1042 Aborted                 (core dumped) ./singa
>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>> E0621 18:33:07.467438  1067 job_manager.cc:156] job 4 not exists
>>>
>>>
>>> ------------------------------
>>> *From:* Wang Wei <wangwei@apache.org>
>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>> *To:* Venkat Katta
>>> *Cc:* dev@singa.incubator.apache.org
>>> *Subject:* Re: Error while running singa on mesos
>>>
>>> Hi,
>>>
>>> Can you try to run it without Mesos?
>>> 1. Compile singa with enable-dist
>>> 2. Change conf/singa.conf to set the ZooKeeper host
>>> 3. Update conf/hostfile, one line per machine
>>> 4. Update conf/profile to export LD_LIBRARY_PATH
>>>
>>> regards,
>>> Wei
>>>
>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <skatta@adobe.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I am trying to run singa on Mesos in a fully distributed
>>>> architecture. I built the Docker images as described in the
>>>> documentation. I am using Mesos 0.28.2 and singa 0.3-rc3. I am running
>>>> each Docker container with the --net=host flag so that it takes the IP
>>>> of the host system. Singa works as long as the workers are all on one
>>>> machine. When I try to use two machines for training, it shows this
>>>> error:
>>>>
>>>>
>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1
>>>> vs. -1) tcp://localhost:*
>>>>
>>>>
>>>> So when running the scheduler, do we need to give it a hostfile
>>>> containing all the hosts? How does it know the remaining hosts in the
>>>> cluster?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Venkat Satish Katta.
>>>>
>>>
>>>
>>
>
>
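
The thread turns on whether the names in conf/hostfile (node0, node2) resolve and can be bound on each machine. Below is a minimal diagnostic sketch in Python, not part of SINGA. It assumes the hostfile holds one host name per line, as described above. The function names parse_hostfile, resolve_hosts and can_bind are hypothetical helpers introduced only for this illustration. can_bind tries to bind a TCP socket to the name on an OS-chosen port, which is only loosely analogous to the tcp://node2:* bind that fails in the log, so a failure here hints at a naming or network setup problem but does not prove the cause.

```python
import socket


def parse_hostfile(text):
    """Return host names from hostfile text: one per line, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_hosts(hosts, resolver=socket.gethostbyname):
    """Map each host name to its IPv4 address, or None if it does not resolve."""
    result = {}
    for host in hosts:
        try:
            result[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


def can_bind(host):
    """Return True if a TCP socket can be bound to `host` on an OS-chosen port.

    Binding only succeeds when the name resolves to an address that belongs
    to a local interface, so run this on the machine that owns the name.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 asks the OS for any free port
        return True
    except OSError:
        return False
```

To use it, run resolve_hosts(parse_hostfile(open("conf/hostfile").read())) inside each container, then can_bind(name) with that container's own name. A name that maps to None, or that cannot be bound locally, points at the container's name resolution or network setup rather than at the job configuration.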
