singa-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-201) Error while running singa on mesos in fully distributed mode
Date Thu, 23 Jun 2016 08:33:16 GMT

    [ https://issues.apache.org/jira/browse/SINGA-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346050#comment-15346050 ]

ASF subversion and git services commented on SINGA-201:
-------------------------------------------------------

Commit 1ca8c638b132009e213fda8e02e77cc2d09fb824 in incubator-singa's branch refs/heads/master
from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=1ca8c63 ]

SINGA-201 Error when running Mesos

A bug was reported (https://issues.apache.org/jira/browse/SINGA-201) when
launching SINGA on Mesos in fully distributed mode.

The root cause was determined to be ZeroMQ binding to localhost. In fully
distributed mode, SINGA on each node should be passed a `-host` flag specifying
the public IP address of the local host.
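
A minimal sketch of the kind of check that separates a loopback value from a
usable `-host` value. This is an illustration only and not part of the commit:
the helper name `is_loopback_host` and the sample address are hypothetical.

```python
import ipaddress


def is_loopback_host(host):
    """Return True if `host` names the loopback interface.

    Hypothetical helper: a value that passes this check is only reachable
    from the same machine, so it is not a useful `-host` value when the
    processes run on different nodes.
    """
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name); this sketch does not resolve it.
        return False


# 203.0.113.10 is a documentation-only address used here as a placeholder.
print(is_loopback_host("localhost"), is_loopback_host("203.0.113.10"))
```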

The Mesos scheduler is modified accordingly:

1. When a Mesos slave starts connecting to the master, it passes the `--hostname` flag specifying
its public IP address.

2. The scheduler now sends each executor a command of the form:

          `singa -conf ./job.conf -singa_conf ./singa.conf -singa_job XX -host XX`
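
The command form above can be sketched as a small builder. This is an
illustration under stated assumptions, not the scheduler's actual code: the
function name `build_executor_command` is hypothetical, and the job id and
host values below are placeholders standing in for the `XX` fields.

```python
import shlex


def build_executor_command(job_id, host,
                           job_conf="./job.conf",
                           singa_conf="./singa.conf"):
    """Assemble the argument list for one executor (hypothetical helper).

    Only the flags named in the commit message are used:
    -conf, -singa_conf, -singa_job and -host.
    """
    return [
        "singa",
        "-conf", job_conf,
        "-singa_conf", singa_conf,
        "-singa_job", str(job_id),
        "-host", host,
    ]


# Placeholder values: job id 1 and a documentation-only IP address.
args = build_executor_command(1, "203.0.113.10")
command_line = " ".join(shlex.quote(a) for a in args)
print(command_line)
```

Keeping the arguments as a list until the last step avoids quoting mistakes if
a path or host value ever contains characters the shell treats specially.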


> Error while running singa on mesos in fully distributed mode
> ------------------------------------------------------------
>
>                 Key: SINGA-201
>                 URL: https://issues.apache.org/jira/browse/SINGA-201
>             Project: Singa
>          Issue Type: Bug
>         Environment: Linux 
>            Reporter: Venkata Satish Katta
>            Assignee: Anh Dinh
>            Priority: Blocker
>              Labels: mesos, singa
>
> Log file created at: 2016/06/17 10:00:43
> Running on machine: ip-172-31-52-12
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0617 10:00:43.202184  2751 zk_service.cc:215] GLOBAL_WATCHER connected to zookeeper successfully!
> W0617 10:00:43.203711  2742 zk_service.cc:109] zookeeper node /singa already exists
> W0617 10:00:43.205016  2742 zk_service.cc:109] zookeeper node /singa/app already exists
> W0617 10:00:43.206166  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017 already exists
> W0617 10:00:43.207147  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/group already exists
> W0617 10:00:43.208237  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/proc already exists
> W0617 10:00:43.209300  2742 zk_service.cc:109] zookeeper node /singa/app/job-0000000017/proc-lock already exists
> F0617 10:00:43.862246  2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
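
The quoted log states its own line format,
`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`. A minimal sketch of
splitting such a line into fields, useful for picking out the fatal (`F`)
entry. The regular expression and the name `parse_log_line` are assumptions
made for illustration; they are not taken from SINGA or the logging library.

```python
import re

# Mirrors the format given in the log header:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<msg>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


# The fatal line from the report, with the leading "> " quote marker removed.
fatal = parse_log_line(
    "F0617 10:00:43.862246  2742 socket.cc:98] "
    "Check failed: port != -1 (-1 vs. -1) tcp://localhost:*"
)
print(fatal["severity"], fatal["file"], fatal["line"])
```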



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
