From "Jaana Miettinen" <jaa...@kolumbus.fi>
Subject RE: framework failover
Date Wed, 09 Nov 2016 17:05:56 GMT
Hi,

 

Let’s turn our discussion back to the original problem now.

 

In the beginning the Cassandra framework was running on mesos-agent-2, and the etcd framework was
running there as well.

 



 

I issued the ‘halt’ command from the Linux command line on mesos-agent-2 at Tue Nov 8 16:57:25 EET 2016.

For the next ~14 minutes the Mesos GUI was showing that Cassandra consumes all resources on mesos-agent-2:

 



During that time the mesos-master was not sending resource offers to any framework, which can
easily be seen from the attached log file.

 

So, according to my understanding, Mesos should not assume that all resources are still consumed by the
Cassandra framework. Should I create an issue for Mesos?

 

Br, Jaana  

 

 

From: Joseph Wu [mailto:joseph@mesosphere.io]
Sent: 8 November 2016 21:34
To: user <user@mesos.apache.org>
Subject: Re: framework failover

 

I'm guessing your definition of Cassandra being "up" is slightly different than what I'm thinking:

A framework is "up" as long as it is registered with the master.  This should happen immediately
upon the framework starting.  You can have a cluster with no agents, and the framework will
still be "up".  (The framework will merely be unable to do anything without any agents.)

 

I'm guessing the 12-14 minute recovery is due to two causes:

*	The framework itself takes a non-negligible amount of time to download and initialize. 
This accounts for some of the variation in startup time.  You can find the actual amount of
time for this by looking at the timestamps in the Cassandra framework sandbox logs (the one
launched with Marathon).
*	If you consider the framework to be "up" only after it has finished launching its tasks,
then the remaining startup time is due to:

*	The amount of time it takes to download and initialize the tasks.
*	The amount of time it takes for the master to realize the agent is gone (75 seconds by default)

If you want faster recovery, you'll need to run Cassandra in "high availability" mode, if
the framework supports it.
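
(For reference, the "75 seconds by default" above comes from the master's agent health checking; a sketch of the relevant master flags, using the flag names from Mesos 0.28 with their default values — shortening them speeds up detection at the cost of being more sensitive to transient network problems:)

  # ping interval x allowed missed pings = 15secs x 5 = 75 seconds
  mesos-master ... \
    --slave_ping_timeout=15secs \
    --max_slave_ping_timeouts=5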

 

On Tue, Nov 8, 2016 at 10:59 AM, Jaana Miettinen <jaanam@kolumbus.fi> wrote:

Hi, pls see my answers below

 

*  Is the timing of restarting the Cassandra framework consistent? 

 

Hopefully I understand your question correctly. Each time I reproduce this problem it takes about
14 minutes for Cassandra to come up. Sometimes it may have been 12 minutes – I haven’t checked it
exactly every time – but never less than 12 minutes!

 

*  The Cassandra framework is most likely using the old-style scheduler driver, which connects
to the Mesos master via a plain TCP socket. 

Makes sense, but I would guess that every framework in my environment is using a similar scheduler
driver, yet the Mesos master detects the breakdown of those other sockets almost immediately:

 

# cat /var/log/mesos/mesos-master.ERROR

E1104 08:53:38.611367 15505 process.cpp:1958] Failed to shutdown socket with fd 38: Transport
endpoint is not connected 

E1104 08:54:56.627190 15505 process.cpp:1958] Failed to shutdown socket with fd 44: Transport
endpoint is not connected

E1104 08:54:56.627286 15505 process.cpp:1958] Failed to shutdown socket with fd 38: Transport
endpoint is not connected

E1104 08:56:00.941144 15505 process.cpp:1958] Failed to shutdown socket with fd 29: Transport
endpoint is not connected

E1104 08:57:00.845110 15505 process.cpp:1958] Failed to shutdown socket with fd 32: Transport
endpoint is not connected // ~here mesos-master disconnects all other frameworks that were
running on the turned off agent

E1104 09:09:09.933151 15505 process.cpp:1958] Failed to shutdown socket with fd 35: Transport
endpoint is not connected

E1104 09:09:12.939226 15505 process.cpp:1958] Failed to shutdown socket with fd 32: Transport
endpoint is not connected //~here the recovery starts

 

So why does this fd 32 (if it was Cassandra’s connection) behave differently from the others?
I don’t always see the same fd fail twice as it did this time, but I do see this kind of socket
error soon after the shutdown of the slave (for the frameworks other than Cassandra), and then,
after about 12 minutes, one last error (presumably for Cassandra).
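
(If it helps: one way I could map a descriptor such as fd 32 back to its peer address – and from that to a framework – would be to inspect the master process’s open TCP sockets while the connection still exists. A rough sketch, assuming `lsof` is available and <master-pid> is the mesos-master process id:)

  # Shows the remote address behind each of the master's TCP file descriptors,
  # e.g. whether fd 32 points at the Cassandra scheduler's address/port.
  lsof -nP -a -p <master-pid> -i TCP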

 

*  The configuration options for agent failover are these:

https://github.com/apache/mesos/blob/1.0.x/src/master/flags.cpp#L127-L165

https://github.com/apache/mesos/blob/1.0.x/src/master/flags.cpp#L469-L500

 


--agent_reregister_timeout=VALUE 

--slave_reregister_timeout=VALUE 

The timeout within which all agents are expected to re-register when a new master is elected
as the leader. Agents that do not re-register within the timeout will be removed from the
registry and will be shutdown if they attempt to communicate with master. NOTE: This value
has to be at least 10mins. (default: 10mins) 

 

First, the documentation states that this value should not be less than 10mins, so decreasing this
timer should not be the solution. (And actually it cannot be decreased: mesos-master failed to start
when I configured a value of less than 10 mins.)

Second, this timer applies to master failover ("when a new master is elected as the leader"), not to
an agent failure.

Third, this timer does not affect the issue at all: since I was unable to decrease it, I checked what
happens if I increase it. I tried 20mins => no effect.
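
(For completeness, the 20mins test was done roughly like this – a sketch of the master invocation, assuming the flag is passed on the command line and using the 0.28 flag name:)

  # Master started with an increased re-register timeout; no effect observed.
  mesos-master ... --slave_reregister_timeout=20mins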

 

*  Marathon has health checks.  You should use them.

Some of our frameworks use them and some do not. For this Cassandra issue we have tried both
with and without Marathon health checks. No effect.
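
(For reference, by "Marathon health checks" I mean something along these lines – a sketch against the Marathon v2 REST API, with a placeholder app id and example values:)

  # Attach an HTTP health check to a Marathon app (placeholder id, example values).
  curl -X PUT http://<marathon-host>:8080/v2/apps/<app-id> \
    -H 'Content-Type: application/json' \
    -d '{
          "healthChecks": [{
            "protocol": "HTTP",
            "path": "/",
            "gracePeriodSeconds": 300,
            "intervalSeconds": 60,
            "maxConsecutiveFailures": 3
          }]
        }'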

 

*  If you lose the Mesos agent forever, the master tells Marathon that tasks are lost, but
not the corresponding frameworks.

This is interesting, so the loss of a framework that was running on the failed agent can be concluded
only from the behaviour of the socket, right?

 

Thanks for your answers,

Jaana

 

From: Joseph Wu [mailto:joseph@mesosphere.io]
Sent: 8 November 2016 1:52
To: user <user@mesos.apache.org>
Subject: Re: framework failover

 

Is the timing of restarting the Cassandra framework consistent?  

The Cassandra framework is most likely using the old-style scheduler driver, which connects
to the Mesos master via a plain TCP socket.  If you turn off the node, there is no guarantee
that the TCP socket actually breaks (the kernel on the turned off node is responsible for
doing this).  If the socket becomes stale, the Mesos master will only detect the stale socket
if it attempts to send something (usually status updates and offers).
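
(A rough way to observe this: on the master host, a connection to a scheduler that lived on the halted node will keep showing up as established until the master actually tries to write to it and the write fails. A sketch, assuming the `ss` utility is available:)

  # TCP connections the master host's kernel still considers established;
  # a scheduler socket from the halted node can linger here until the next send.
  ss -tnp state established | grep mesos-master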

-----

The configuration options for agent failover are these:
https://github.com/apache/mesos/blob/1.0.x/src/master/flags.cpp#L127-L165
https://github.com/apache/mesos/blob/1.0.x/src/master/flags.cpp#L469-L500

Because you are using Marathon to launch your framework(s), there are a few things you should
be aware of:

*	Marathon has health checks.  You should use them.
*	Losing a task is different than losing a framework, even if you launch the framework as
a task.

*	You can lose the framework if it disconnects for too long.  But the task might still be
running.
*	You can lose the task, but the framework might still be registered.  This depends on the
framework, which tells the master how long it is allowed to be disconnected before the master
considers the framework "complete".
*	If you lose the Mesos agent forever, the master tells Marathon that tasks are lost, but
not the corresponding frameworks.

 

On Mon, Nov 7, 2016 at 11:08 AM, Jaana Miettinen <jaanam@kolumbus.fi> wrote:

Hi, still one question: for the 14 minutes after the agent shutdown, the Mesos GUI is still showing
that Cassandra consumes 99% of all resources on the agent that went down. Even if it were originally
a Cassandra bug or misconfiguration that led to this situation – isn’t it still a bug in Mesos that
it shows that kind of consumption on an agent that doesn’t exist any more?

 

Jaana  

 

From: Jaana Miettinen [mailto:jaanam@kolumbus.fi]
Sent: 5 November 2016 16:43
To: 'user@mesos.apache.org' <user@mesos.apache.org>
Subject: RE: framework failover

 

Hi, thanks for your quick reply, pls see my answers below

 

*  Are you running your frameworks via Marathon?

yes

 

*  How are you terminating the Mesos Agent? 

So far I have just been issuing the Linux ‘halt’ command from the agent’s command line, or terminating
the agent instance from the cloud management console. So actually I want to simulate the case where the
whole host on which my framework is running goes down.

 

 

*  Implies that the master does not remove the agent immediately, meaning you killed the agent,
but did not kill the tasks.  
During this time, the master is waiting for the agent to come back online.  If the agent doesn't
come back during some (configurable) timeout, it will notify the frameworks about the loss
of an agent. 

Sounds like you would be talking about the timer ‘ALLOCATION_HOLD_OFF_RECOVERY_TIMEOUT’,
which is hardcoded to 10 minutes in Mesos 0.28.0.

 

But now we are reaching the most interesting question in our discussion. You wrote: If the
agent doesn't come back during some (configurable) timeout, it will notify the frameworks
about the loss of an agent. 

How could this happen if the framework was running on the very agent that went down? Or do you mean
that the frameworks running on other agents would get the information about the loss of the agent?

 

*  Also, it's a little odd that your frameworks will disconnect upon the agent process dying.
 You may want to investigate your framework dependencies.  A framework should definitely not
depend on the agent process (frameworks depend on the master though).


To me it seems very natural that the frameworks disconnect when the agent host shuts down.
And if Cassandra were not there consuming all the resources, the other frameworks would
re-register and continue running their tasks on the other agents. Wouldn’t this be the
correct behaviour?

 

Hopefully I answered your questions clearly enough. Anyway, please let me know which configurable
timer you were talking about !

 

And thanks a lot,

 

Jaana

 

 

BTW, if ALLOCATION_HOLD_OFF_RECOVERY_TIMEOUT were the correct guess, then I should see "Triggered
allocator recovery: waiting for " in my log file mesos-master.INFO. But it’s not there.

 

  // Setup recovery timer.
  delay(ALLOCATION_HOLD_OFF_RECOVERY_TIMEOUT, self(), &Self::resume);

  // NOTE: `quotaRoleSorter` is updated implicitly in `setQuota()`.
  foreachpair (const string& role, const Quota& quota, quotas) {
    setQuota(role, quota);
  }

  LOG(INFO) << "Triggered allocator recovery: waiting for "
            << expectedAgentCount.get() << " slaves to reconnect or "
            << ALLOCATION_HOLD_OFF_RECOVERY_TIMEOUT << " to pass";
}

 

 

 

From: Joseph Wu [mailto:joseph@mesosphere.io]
Sent: 4 November 2016 20:03
To: user <user@mesos.apache.org>
Subject: Re: framework failover

 

A couple questions/notes:



What do you mean by:

the system will deploy the framework on a new node within less than three minutes.

Are you running your frameworks via Marathon?

 

How are you terminating the Mesos Agent?  If you send a `kill -SIGUSR1`, the agent will immediately
kill all of its tasks and un-register with the master.

If you kill the agent with some other signal, the agent will simply stop, but tasks will continue
to run.
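
(Concretely, something like the following, assuming the agent binary is named mesos-slave as in 0.28:)

  # Clean shutdown: the agent kills its tasks and un-registers from the master.
  kill -SIGUSR1 $(pidof mesos-slave)

  # Abrupt stop: the agent process dies, but its tasks keep running.
  kill -SIGKILL $(pidof mesos-slave)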

According to the mesos GUI page cassandra holds 99-100 % of the resources on the terminated
slave during that 14 minutes.

^ Implies that the master does not remove the agent immediately, meaning you killed the agent,
but did not kill the tasks.  
During this time, the master is waiting for the agent to come back online.  If the agent doesn't
come back during some (configurable) timeout, it will notify the frameworks about the loss
of an agent. 

Also, it's a little odd that your frameworks will disconnect upon the agent process dying.
 You may want to investigate your framework dependencies.  A framework should definitely not
depend on the agent process (frameworks depend on the master though).

 

 

On Fri, Nov 4, 2016 at 10:32 AM, Jaana Miettinen <jaanam@kolumbus.fi> wrote:

Hi, would you help me to find out how framework failover happens in Mesos 0.28.0?

 

In my Mesos environment I have the following frameworks:

 

etcd-mesos

cassandra-mesos 0.2.0-1

Eremetic

marathon 0.15.2

 

If I shut down the agent (mesos-slave) on which my framework has been deployed, using the ‘halt’
command from the Linux command line, the system will deploy the framework on a new node in less
than three minutes.

 

But when I shut down the agent on which the Cassandra framework is running, it takes 14 minutes
before the system recovers.

 

According to the Mesos GUI page, Cassandra holds 99-100% of the resources on the terminated
slave during those 14 minutes.

 

Seen from the mesos-log:

 

Line 976:  I1104 08:53:29.516564 15502 master.cpp:1173] Slave c002796f-a98d-4e55-bee3-f51b8d06323b-S8 at slave(1)@10.254.69.140:5050 (mesos-slave-1) disconnected
Line 977:  I1104 08:53:29.516644 15502 master.cpp:2586] Disconnecting slave c002796f-a98d-4e55-bee3-f51b8d06323b-S8 at slave(1)@10.254.69.140:5050 (mesos-slave-1)
Line 1020: I1104 08:53:39.872681 15501 master.cpp:1212] Framework c002796f-a98d-4e55-bee3-f51b8d06323b-0007 (Eremetic) at scheduler(1)@10.254.69.140:31570 disconnected
Line 1021: I1104 08:53:39.872707 15501 master.cpp:2527] Disconnecting framework c002796f-a98d-4e55-bee3-f51b8d06323b-0007 (Eremetic) at scheduler(1)@10.254.69.140:31570
Line 1080: W1104 08:54:53.621151 15503 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0007 (Eremetic) at scheduler(1)@10.254.69.140:31570
Line 1083: W1104 08:54:53.621279 15503 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0004 (Eremetic) at scheduler(1)@10.254.74.77:31956
Line 1085: W1104 08:54:53.621354 15503 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0002 (Eremetic) at scheduler(1)@10.254.77.2:31460
Line 1219: I1104 09:09:09.933365 15502 master.cpp:1212] Framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495 disconnected
Line 1220: I1104 09:09:09.933404 15502 master.cpp:2527] Disconnecting framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495
Line 1222: W1104 09:09:09.933518 15502 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495
Line 1223: W1104 09:09:09.933697 15502 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495
Line 1224: W1104 09:09:09.933768 15502 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495
Line 1225: W1104 09:09:09.933825 15502 master.hpp:1764] Master attempted to send message to disconnected framework c002796f-a98d-4e55-bee3-f51b8d06323b-0005 (cassandra.ava) at scheduler-6849089f-1a44-4101-b5b7-0960da81b910@10.254.69.140:36495

 

E1104 08:53:38.611367 15505 process.cpp:1958] Failed to shutdown socket with fd 38: Transport
endpoint is not connected

E1104 08:54:56.627190 15505 process.cpp:1958] Failed to shutdown socket with fd 44: Transport
endpoint is not connected

E1104 08:54:56.627286 15505 process.cpp:1958] Failed to shutdown socket with fd 38: Transport
endpoint is not connected

E1104 08:56:00.941144 15505 process.cpp:1958] Failed to shutdown socket with fd 29: Transport
endpoint is not connected

E1104 08:57:00.845110 15505 process.cpp:1958] Failed to shutdown socket with fd 32: Transport
endpoint is not connected

E1104 09:09:09.933151 15505 process.cpp:1958] Failed to shutdown socket with fd 35: Transport
endpoint is not connected

E1104 09:09:12.939226 15505 process.cpp:1958] Failed to shutdown socket with fd 32: Transport
endpoint is not connected

   

So which message did Mesos try to send Cassandra at 09:09:09.933518?

 

And if Mesos knew that the Cassandra framework was running on the failed node, why didn’t it
disconnect it in the same way that Eremetic was disconnected?

 

I’ve also noticed that the recovery (i.e. resource deallocation) starts after Cassandra’s
disconnection, and no resources are offered by Mesos before that. That’s why I’m currently
most interested in understanding which event triggers the Cassandra disconnect at 09:09:09.933404.
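
(To narrow this down, one thing worth trying is to pull everything the master logged in the minute or two before that disconnect – roughly:)

  # All master log lines from 09:08-09:09 on Nov 4, just before the Cassandra disconnect.
  grep -n '^[IWE]1104 09:0[89]:' /var/log/mesos/mesos-master.INFO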


 

Please ask for more information if needed.

 

Thanks already in advance,

 

Jaana Miettinen 

 

 

 

 

 

