mesos-user mailing list archives

From Avinash Sridharan <avin...@mesosphere.io>
Subject Re: The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master
Date Tue, 29 Dec 2015 19:22:36 GMT
The lsof command shows only the file descriptors that are open at the moment
you run it. So if you ran it after seeing the error in the master log, the
master had most probably already closed that fd. Here are a few other things
to look at that might give some more insight:

* Run "netstat -na" and "netstat -nt" on the Mesos master and the
Kubernetes master node to make sure that the master is listening on the
right port, and that the k8s scheduler is trying to connect to the right port.
From the logs it does look like the master is receiving the registration
request, so there shouldn't be a network configuration issue here.
* Make sure no firewall rules have been turned on in your cluster, since
it looks like the k8s scheduler is not able to connect to the master
(though it was able to register the first time).
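As a quick complement to netstat, a plain TCP connectivity check can be run from the Kubernetes node (15.242.100.60) against the master address and port seen in the logs (15.242.100.56:5050). This is a minimal sketch, assuming Python 3 is available on the node; `can_connect` is a hypothetical helper, not part of Mesos or k8s:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused or timed-out connection usually points at nothing listening on
    that port, or at a firewall rule dropping the traffic.
    """
    try:
        # create_connection resolves the address and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout, unreachable networks, etc.
        return False
```

For example, `can_connect("15.242.100.56", 5050)` on the k8s node should return True. It may be worth running the same check in the reverse direction too (from the master to the scheduler's host and the port in its PID, e.g. scheduler(1)@15.242.100.60:59488 in the master log), since as far as I understand libprocess the master sends its replies by connecting back to that address.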

On Tue, Dec 29, 2015 at 1:37 AM, Nan Xiao <xiaonan830818@gmail.com> wrote:

> BTW, the "lsof" command shows there are only 16 file descriptors. I
> don't know why the Mesos
> master tries to close "fd 17".
> Best Regards
> Nan Xiao
>
>
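To illustrate why lsof cannot show fd 17 after the fact: any descriptor listing is a point-in-time snapshot, and once a descriptor is closed its number is simply free again (and can later be reused for a different file or socket). A minimal Python sketch of that behavior; `fd_is_open` is a hypothetical helper that uses `os.fstat`, which fails with EBADF for a number that has no open descriptor behind it:

```python
import errno
import os
import socket


def fd_is_open(fd: int) -> bool:
    """Return True if fd refers to an open descriptor in this process right now."""
    try:
        os.fstat(fd)
        return True
    except OSError as exc:
        if exc.errno == errno.EBADF:  # "Bad file descriptor": nothing open at that number
            return False
        raise


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = sock.fileno()
open_before = fd_is_open(fd)  # True: the socket holds this descriptor
sock.close()
open_after = fd_is_open(fd)   # False: the number is free again
```

A later lsof run behaves like the second check: it can only report what is open when it runs, not what the master had open when it logged the error.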
> On Tue, Dec 29, 2015 at 11:32 AM, Nan Xiao <xiaonan830818@gmail.com> wrote:
> > Hi Klaus,
> >
> > Firstly, thanks very much for your answer!
> >
> > The km processes are all running:
> > root     129474 128024  2 22:26 pts/0    00:00:00 km apiserver
> > --address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001
> > --service-cluster-ip-range=10.10.10.0/24 --port=8888
> > --cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0
> > --v=1
> > root     129509 128024  2 22:26 pts/0    00:00:00 km
> > controller-manager --master=15.242.100.60:8888 --cloud-provider=mesos
> > --cloud-config=./mesos-cloud.conf --v=1
> > root     129538 128024  0 22:26 pts/0    00:00:00 km scheduler
> > --address=15.242.100.60 --mesos-master=15.242.100.56:5050
> > --etcd-servers=http://15.242.100.60:4001 --mesos-user=root
> > --api-servers=15.242.100.60:8888 --cluster-dns=10.10.10.10
> > --cluster-domain=cluster.local --v=2
> >
> > All the logs also seem OK, except those in scheduler.log:
> > ......
> > I1228 22:26:37.883092  129538 messenger.go:381] Receiving message
> > mesos.internal.InternalMasterChangeDetected from
> > scheduler(1)@15.242.100.60:33077
> > I1228 22:26:37.883225  129538 scheduler.go:374] New master
> > master@15.242.100.56:5050 detected
> > I1228 22:26:37.883268  129538 scheduler.go:435] No credentials were
> > provided. Attempting to register scheduler without authentication.
> > I1228 22:26:37.883356  129538 scheduler.go:928] Registering with
> > master: master@15.242.100.56:5050
> > I1228 22:26:37.883460  129538 messenger.go:187] Sending message
> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
> > I1228 22:26:37.883504  129538 scheduler.go:881] will retry
> > registration in 1.209320575s if necessary
> > I1228 22:26:37.883758  129538 http_transporter.go:193] Sending message
> > to master@15.242.100.56:5050 via http
> > I1228 22:26:37.883873  129538 http_transporter.go:587] libproc target
> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
> > I1228 22:26:39.093560  129538 scheduler.go:928] Registering with
> > master: master@15.242.100.56:5050
> > I1228 22:26:39.093659  129538 messenger.go:187] Sending message
> > mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
> > I1228 22:26:39.093702  129538 scheduler.go:881] will retry
> > registration in 3.762036352s if necessary
> > I1228 22:26:39.093765  129538 http_transporter.go:193] Sending message
> > to master@15.242.100.56:5050 via http
> > I1228 22:26:39.093847  129538 http_transporter.go:587] libproc target
> > URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
> > ......
> >
> > From the log, the Mesos master rejected k8s's registration, and
> > k8s keeps retrying.
> >
> > Have you met this issue before? Thanks very much in advance!
> > Best Regards
> > Nan Xiao
> >
> >
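The retry cadence can be checked directly against scheduler.log: the gap between consecutive "Registering with master" lines should match the preceding "will retry registration in ..." delay. This is a sketch, assuming unwrapped glog-format lines like the ones quoted above (header `I1228 22:26:37.883356  129538 file:line] message`); `registration_gaps` is a hypothetical helper, and since the glog header carries no year, a fixed year is used, so only same-year gaps are meaningful:

```python
import re
from datetime import datetime
from typing import List

# glog header: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_LINE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$"
)


def registration_gaps(lines: List[str]) -> List[float]:
    """Return the seconds between consecutive 'Registering with master' lines."""
    stamps = []
    for line in lines:
        m = GLOG_LINE.match(line)
        if m and m.group(3).startswith("Registering with master"):
            # glog omits the year; a fixed one is fine for relative gaps.
            stamps.append(datetime.strptime(
                "2015" + m.group(1) + " " + m.group(2), "%Y%m%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    "I1228 22:26:37.883356  129538 scheduler.go:928] Registering with master: master@15.242.100.56:5050",
    "I1228 22:26:37.883504  129538 scheduler.go:881] will retry registration in 1.209320575s if necessary",
    "I1228 22:26:39.093560  129538 scheduler.go:928] Registering with master: master@15.242.100.56:5050",
]
print(registration_gaps(sample))  # prints [1.210204]
```

On the posted excerpt the gap is about 1.21 s, in line with the logged retry delay. Note that the scheduler log itself only shows repeated attempts; whether the master actively rejected them has to be confirmed from the master's side.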
> > On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma <klaus1982.cn@gmail.com> wrote:
> >> It seems Kubernetes is down; could you check the status of Kubernetes
> >> (km)?
> >>
> >> ----
> >> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> >> Platform Symphony/DCOS Development & Support, STG, IBM GCG
> >> +86-10-8245 4084 | klaus1982.cn@gmail.com | http://k82.me
> >>
> >> On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao <xiaonan830818@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Greetings from me!
> >>>
> >>> I am trying to follow this tutorial
> >>> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
> >>> to deploy "k8s on Mesos" on local machines. k8s is built from the
> >>> latest master branch, and Mesos is version 0.26.
> >>>
> >>> After starting the Mesos master (IP: 15.242.100.56), the Mesos
> >>> slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I see the
> >>> following logs from the Mesos master:
> >>>
> >>> ......
> >>> I1227 22:52:34.494478  8069 master.cpp:4269] Received update of slave
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
> >>> I1227 22:52:34.494940  8065 hierarchical.cpp:400] Slave
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
> >>> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
> >>> resources  (total: cpus(*):32; mem(*):127878; disk(*):4336;
> >>> ports(*):[31000-32000], allocated: )
> >>> I1227 22:53:06.740757 8053 http.cpp:334] HTTP GET for
> >>> /master/state.json from 15.242.100.60:56219 with
> >>> User-Agent='Go-http-client/1.1'
> >>> I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for
> >>> /master/state.json from 15.242.100.60:56241 with
> >>> User-Agent='Go-http-client/1.1'
> >>> I1227 22:53:07.767196  8070 http.cpp:334] HTTP GET for
> >>> /master/state.json from 15.242.100.60:56252 with
> >>> User-Agent='Go-http-client/1.1'
> >>> I1227 22:53:08.808171  8053 http.cpp:334] HTTP GET for
> >>> /master/state.json from 15.242.100.60:56272 with
> >>> User-Agent='Go-http-client/1.1'
> >>> I1227 22:53:08.815811 8060 master.cpp:2176] Received SUBSCRIBE call
> >>> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
> >>> I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing framework
> >>> Kubernetes with checkpointing enabled and capabilities [  ]
> >>> I1227 22:53:08.817294  8052 hierarchical.cpp:195] Added framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
> >>> I1227 22:53:08.817464  8050 master.cpp:1122] Framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
> >>> scheduler(1)@15.242.100.60:59488 disconnected
> >>> E1227 22:53:08.817497 8073 process.cpp:1911] Failed to shutdown
> >>> socket with fd 17: Transport endpoint is not connected
> >>> I1227 22:53:08.817533  8050 master.cpp:2472] Disconnecting framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
> >>> scheduler(1)@15.242.100.60:59488
> >>> I1227 22:53:08.817595 8050 master.cpp:2496] Deactivating framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
> >>> scheduler(1)@15.242.100.60:59488
> >>> I1227 22:53:08.817797 8050 master.cpp:1146] Giving framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 (Kubernetes) at
> >>> scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
> >>> W1227 22:53:08.818389 8062 master.cpp:4840] Master returning
> >>> resources offered to framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000 because the framework has
> >>> terminated or is inactive
> >>> I1227 22:53:08.818397  8052 hierarchical.cpp:273] Deactivated
> >>> framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
> >>> I1227 22:53:08.819046  8066 hierarchical.cpp:744] Recovered
> >>> cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
> >>> (total: cpus(*):32; mem(*):127878; disk(*):4336;
> >>> ports(*):[31000-32000], allocated: ) on slave
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
> >>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-0000
> >>> ......
> >>>
> >>> I can't figure out why the Mesos master complains "Failed to shutdown
> >>> socket with fd 17: Transport endpoint is not connected".
> >>> Could someone give me some clues on this issue?
> >>>
> >>> Thanks very much in advance!
> >>>
> >>> Best Regards
> >>> Nan Xiao
> >>
> >>
>
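On the error text itself: "Transport endpoint is not connected" is the Linux message for ENOTCONN, which shutdown(2) returns when the socket is not (or is no longer) connected. In the master log it appears right after the framework is marked as disconnected, which is consistent with the connection to the scheduler already being gone when the master tried to shut the socket down; that reading is an inference from the log order, not something confirmed in this thread. The same errno can be reproduced with a TCP socket that was never connected (a minimal Python sketch; `shutdown_errno` is a hypothetical helper):

```python
import errno
import socket


def shutdown_errno(sock: socket.socket):
    """Call shutdown(SHUT_RDWR); return the errno on failure, or None on success."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
        return None
    except OSError as exc:
        return exc.errno


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # created but never connected
err = shutdown_errno(s)
s.close()
print(err == errno.ENOTCONN, errno.errorcode.get(err))  # on Linux: True ENOTCONN
```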



-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
