cloudstack-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Farrington <rfarring...@remitdata.com>
Subject Re: Timeout with live migration
Date Tue, 13 Oct 2015 22:00:18 GMT
Issue #1) Terrible parameter names
https://issues.apache.org/jira/browse/CLOUDSTACK-8946

Issue #2) Wait value for MigrateVolume
https://issues.apache.org/jira/browse/CLOUDSTACK-8949



________________________________________
From: Rafael Weingärtner [rafaelweingartner@gmail.com]
Sent: Tuesday, October 13, 2015 3:43 PM
To: users@cloudstack.apache.org
Subject: [Questionable]  Re: Timeout with live migration

That is good. Now I can report to you what is in the code.

Let’s start:
First: when I looked at the problem at first time, I went straight to the
class that sends commands to Xen, and there the ACs uses a parameter
called: “migratewait” to control the timeout of command. You tried that,
and you still were getting the timeout problem.

That happened because despite that timeout, there is another point in ACS
that controls the timeout of command that are send to hypervisor (not just
Xen this time), and in that point, it is used a parameter called, “wait” as
a default value to control timeouts of command.

First conclusion, we have terrible parameter names ;)

Second, when we create a “MigrateVolumeCommand” we should set a timeout
value, this way the ACS would no use the default value of parameter “wait”.
That timeout value should be the same as the one used on CitrixResourceBase
and its children to control the migration of volumes.

Can you report what happened to you in a Jira ticket and add my comments
there?
I think next Saturday I can have someone working on that for the next ACS
release (4.7?), or even 4.6 if the PR gets accepted.

Please send me the jira ticket as soon as you open it.

On Tue, Oct 13, 2015 at 5:33 PM, Ryan Farrington <rfarrington@remitdata.com>
wrote:

> Looks like whatever change I made actually resulted in a change in
> behavior.  Prior to the change we were seeing a message every hour stating
> that the job agent was waiting now we see it waited 2hours and 8 minutes
> without a peep before finishing. So making the change to the "wait"
> parameter is what made the magic happen.
>
>
>
> 2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request]
> (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Sending  {
> Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags:
> 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-13 13:21:48,281 DEBUG [c.c.a.t.Request]
> (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Executing:  {
> Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1, Flags:
> 100111,
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":805,"volumePath":"5f990946-d6b5-451e-8e78-2eefc1462253","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> }
> 2015-10-13 13:21:48,282 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Executing request
> 2015-10-13 15:27:13,396 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Response Received:
> 2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request]
> (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: Processing:  { Ans: ,
> MgmtId: 42756806312036, via: 38, Ver: v1, Flags: 110,
> [{"com.cloud.agent.api.storage.MigrateVolumeAnswer":{"volumePath":"00db15be-3ccd-4648-8928-35ca90924d7c","result":true,"wait":0}}]
> }
> 2015-10-13 15:27:13,397 DEBUG [c.c.a.m.AgentAttache]
> (DirectAgent-430:ctx-ac6d7aeb) Seq 38-1788936343: No more commands found
> 2015-10-13 15:27:13,397 DEBUG [c.c.a.t.Request]
> (Job-Executor-1:ctx-8e0ebced ctx-f49e7503) Seq 38-1788936343: Received:  {
> Ans: , MgmtId: 42756806312036, via: 38, Ver: v1, Flags: 110, {
> MigrateVolumeAnswer } }
>
>
> ________________________________________
> From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> Sent: Tuesday, October 13, 2015 8:09 AM
> To: users@cloudstack.apache.org
> Subject: [Questionable]  Re: Timeout with live migration
>
> Let’s wait to see if there is nothing else messing with that timeout. Then
> I send you the details to put into the Jira ticket.
>
> On Tue, Oct 13, 2015 at 10:06 AM, Ryan Farrington <
> rfarrington@remitdata.com
> > wrote:
>
> > Rafael,
> >     I am still a bit confused as to what you would like for me to place
> in
> > the JIRA ticket.  I have adjusted the "wait" parameter and will be able
> to
> > test it in about an hour.  But i would think the JIRA ticket should be as
> > detailed as I can make it or will you be adding details once I have it
> > created?
> >
> >
> >
> >
> > ________________________________________
> > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > Sent: Tuesday, October 13, 2015 7:52 AM
> > To: users@cloudstack.apache.org
> > Subject: [Questionable]  Re: [Questionable] Re: Timeout with live
> migration
> >
> > I guess so, for some reason that I do not understand, the code is
> > multiplying the value from that parameter by 2, something like 18000
> should
> > do the tricky
> >
> > On Tue, Oct 13, 2015 at 12:15 AM, Ryan Farrington <
> > rfarrington@remitdata.com
> > > wrote:
> >
> > > Yes i can open JIRA tickets. What would you like for me to do?
> > >
> > > I'll be happy to change the "wait" parameter.  Do I assume it should be
> > > 1/2 of the value i want it to be?
> > >
> > >
> > >
> > > ________________________________________
> > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > Sent: Monday, October 12, 2015 10:12 PM
> > > To: users@cloudstack.apache.org
> > > Subject: [Questionable]  Re: Timeout with live migration
> > >
> > > There is your problem, there are currently two distinct values
> conrolling
> > > those async jobs.
> > > Change that value and everything will work for u.
> > > Can you open a jira ticket?
> > >
> > > On Mon, Oct 12, 2015 at 11:51 PM, Ryan Farrington <
> > > rfarrington@remitdata.com
> > > > wrote:
> > >
> > > > wait is currently configured to be 3600
> > > >
> > > >
> > > >
> > > > ________________________________________
> > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > Sent: Monday, October 12, 2015 9:46 PM
> > > > To: users@cloudstack.apache.org
> > > > Subject: [Questionable]  Re: Timeout with live migration
> > > >
> > > > I found something odd,
> > > > can you check the parameter called "wait", what value is it using ?
> > > >
> > > > On Mon, Oct 12, 2015 at 10:54 PM, Ryan Farrington <
> > > > rfarrington@remitdata.com
> > > > > wrote:
> > > >
> > > > > Yes the parameter was set long ago and the management server has
> been
> > > > > restarted numerous time over the past few days as we played with
> > other
> > > > > parameters to no effect.
> > > > >
> > > > > After looking at the log a little more does the "Failed to send
> > > command,
> > > > > due to Agent:38, com.cloud.exception.OperationTimedoutException:
> > > Commands
> > > > > 996939857 to Host 38 timed out after 7200" mean that the migration
> > > start
> > > > > command is being sent in some kind of synchronous mode and not
> > > returning
> > > > > control back to the job manager?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ________________________________________
> > > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > > Sent: Monday, October 12, 2015 8:46 PM
> > > > > To: users@cloudstack.apache.org
> > > > > Subject: [Questionable]  Re: Timeout with live migration
> > > > >
> > > > > I thought you using the command  “migrateVirtualMachineWithVolume”
> > but
> > > it
> > > > > seems that you are using “migrateVolume” command from ACS's API.
> > > > >
> > > > >
> > > > > For the code I debugged “migrateVirtualMachineWithVolume”, the
> > > parameter
> > > > > 3600, means 1 hour of timeout.
> > > > >
> > > > > For the “migrateVolume” is the same, they both end up in
> > > > >
> > > > >
> > > >
> > >
> >
> “com.cloud.hypervisor.xen.resource.XenServer610Resource.execute(MigrateVolumeCommand)”,
> > > > > and in that method the parameter is the same.
> > > > >
> > > > >
> > > > > If your parameter is set to 36000 (10 hours) I do not see why you
> are
> > > > > getting the exception after 2 hours.
> > > > >
> > > > > Did you restart the management servers after you changed the
> > parameter?
> > > > >
> > > > > On Mon, Oct 12, 2015 at 10:31 PM, Ryan Farrington <
> > > > > rfarrington@remitdata.com
> > > > > > wrote:
> > > > >
> > > > > > Here is the full log, including the stack for the exception,
that
> > we
> > > > get
> > > > > > at the 2 hour mark. as for the migratewait it is set to 36000
> which
> > > > > should
> > > > > > be 10 hours.
> > > > > >
> > > > > > 2015-10-12 18:41:20,137 DEBUG [c.c.a.m.DirectAgentAttache]
> > > > > > (DirectAgent-323:ctx-6d42edd7) Seq 31-1023875267: Executing
> request
> > > > > > 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
> > Waiting
> > > > > some
> > > > > > more time because this is the current command
> > > > > > 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Could not find
> > exception:
> > > > > > com.cloud.exception.OperationTimedoutException in error code
list
> > for
> > > > > > exceptions
> > > > > > 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
> Timed
> > > out
> > > > > on
> > > > > > Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via:
> > > > 38(xen-nc-bc2b7),
> > > > > > Ver: v1, Flags: 100111,
> > > > > >
> > > > >
> > > >
> > >
> >
> [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
> > > > > > }
> > > > > > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
> > > > Cancelling.
> > > > > > 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Seq 38-996939857:
No
> > more
> > > > > > commands found
> > > > > > 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Failed to send
> command,
> > > due
> > > > > to
> > > > > > Agent:38, com.cloud.exception.OperationTimedoutException:
> Commands
> > > > > > 996939857 to Host 38 timed out after 7200
> > > > > > 2015-10-12 18:41:20,471 DEBUG
> [o.a.c.s.m.AncientDataMotionStrategy]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) copy failed
> > > > > > com.cloud.utils.exception.CloudRuntimeException: Failed to send
> > > > command,
> > > > > > due to Agent:38, com.cloud.exception.OperationTimedoutException:
> > > > Commands
> > > > > > 996939857 to Host 38 timed out after 7200
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
> > > > > >         at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown
> > > > Source)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> > > > > >         at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
> > > > > >         at
> > > com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
> > > > > >         at
> > > > > >
> > > >
> > com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> > > > > >         at
> > > > > >
> > > >
> > com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> > > > > >         at
> > > > > >
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > >         at
> > > > > >
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > >         at
> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > > > >         at
> > > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > >         at java.lang.Thread.run(Thread.java:701)
> > > > > > 2015-10-12 18:41:20,479 WARN
> > > [o.a.c.s.d.ObjectInDataStoreManagerImpl]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) Unsupported data
> object
> > > > > > (VOLUME,
> > > > > >
> > org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f
> > > > ),
> > > > > > no need to delete from object in store ref table
> > > > > > 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl]
> > > > > > (Job-Executor-63:ctx-f7b6817d ctx-c6b92515) migrate volume
> > > > > > failed:com.cloud.utils.exception.CloudRuntimeException: Failed
to
> > > send
> > > > > > command, due to Agent:38,
> > > > com.cloud.exception.OperationTimedoutException:
> > > > > > Commands 996939857 to Host 38 timed out after 7200
> > > > > > 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > > > > > (Job-Executor-63:ctx-f7b6817d) Complete async job-5257,
> jobStatus:
> > > > > FAILED,
> > > > > > resultCode: 530, result:
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
> > > > > > to migrate volume"}
> > > > > > 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > > > > > (Job-Executor-63:ctx-f7b6817d) Done executing
> > > > > > org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd
> for
> > > > > job-5257
> > > > > > 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
> > > > > > (Job-Executor-63:ctx-f7b6817d) Remove job-5257 from job
> monitoring
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ________________________________________
> > > > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > > > Sent: Monday, October 12, 2015 8:24 PM
> > > > > > To: users@cloudstack.apache.org
> > > > > > Subject: [Questionable]  Re: [Questionable] Re: Timeout with
live
> > > > > migration
> > > > > >
> > > > > > Now I understand what you are doing, I am familiar with that
> > concept
> > > > > (live
> > > > > > migration of VM within a cluster, having the VHD being moved
from
> > one
> > > > SR
> > > > > to
> > > > > > another).
> > > > > >
> > > > > > I just got confused when I read live migration of volumes (a
> volume
> > > > does
> > > > > > not run by itself, so that why I asked a little for some more
> > > > > information).
> > > > > >
> > > > > > Looking at the source code this is the variable used to control
> the
> > > > > > timeout:
> > > > > > "long timeout = (_migratewait) * 1000L;"
> > > > > >
> > > > > > The value of "_migratewait" is taken from this parameter:
> > > > > > value = (String) params.get("migratewait");
> > > > > > _migratewait = NumbersUtil.parseInt(value, 3600);
> > > > > >
> > > > > > Therefore, the name of the parameter to be configured is
> > > "migratewait",
> > > > > the
> > > > > > default value is 3600.
> > > > > >
> > > > > >
> > > > > > BTW1: I think that is a terrible parameter name. We should
> refactor
> > > > that,
> > > > > > could you open a Jira ticket for that?
> > > > > >
> > > > > > BTW2: that error message you posted does not seem to be related
> to
> > > the
> > > > > > migration timeout; hence, in the code if the copy times out
the
> > > message
> > > > > > would be:
> > > > > > "Async " + timeout/1000 + " seconds timeout for task " +
> > > > task.toString()"
> > > > > >
> > > > > > Maybe because it throws a "Types.BadAsyncResult(msg)" and that
> > might
> > > be
> > > > > > translated into that message, or that might not be related to
the
> > > > problem
> > > > > > itself, and you just thought that it was.
> > > > > >
> > > > > >
> > > > > > Does it help you?
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 12, 2015 at 10:00 PM, Ryan Farrington <
> > > > > > rfarrington@remitdata.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hypervisor:  XenServer
> > > > > > >
> > > > > > > We are moving a data volume from one storage onto another
> without
> > > > > > shutting
> > > > > > > down the VM cause that would just be silly and a triplication
> of
> > > > effort
> > > > > > > with the whole copying to secondary storage and then back
off
> > > again.
> > > > > The
> > > > > > > volume is staying in the same cluster just moving to a
> different
> > > > > Primary
> > > > > > > storage (or SR in the XenServer vernacular)
> > > > > > >
> > > > > > > If you are familiar with ESX this is a "Storage VMotion"
where
> as
> > > in
> > > > > > > XenServer it is called "Storage XenMotion".
> > > > > > >
> > > > > > > ________________________________________
> > > > > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > > > > Sent: Monday, October 12, 2015 7:53 PM
> > > > > > > To: users@cloudstack.apache.org
> > > > > > > Subject: [Questionable]  Re: Timeout with live migration
> > > > > > >
> > > > > > > what do you mean with livre migrating data volume ?!
> > > > > > > I understand a live migration of a VM, but volumes...
> > > > > > >
> > > > > > > do you mean live migrating a VM that has a volume attached?
> > > > > > > are you migrating that volume to a different cluster? or
just a
> > > > > different
> > > > > > > storage in the same cluster?
> > > > > > > What hypervisor are you using ?
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 12, 2015 at 9:47 PM, Ryan Farrington <
> > > > > > > rfarrington@remitdata.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Live migrating a data volume. We are purely on shared
storage
> > so
> > > no
> > > > > > local
> > > > > > > > storage is involved.
> > > > > > > >
> > > > > > > > ________________________________________
> > > > > > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > > > > > Sent: Monday, October 12, 2015 7:37 PM
> > > > > > > > To: users@cloudstack.apache.org
> > > > > > > > Subject: [Questionable]  Re: Timeout with live migration
> > > > > > > >
> > > > > > > > Are you live migrating a VM, or migrating a volume
of a
> stopped
> > > VM
> > > > > to a
> > > > > > > > different primary storage?
> > > > > > > >
> > > > > > > > If it is a running VM, is the VM allocated in a shared
> storage
> > or
> > > > > local
> > > > > > > > storage?
> > > > > > > >
> > > > > > > > On Mon, Oct 12, 2015 at 9:17 PM, Ryan Farrington <
> > > > > > > > rfarrington@remitdata.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The slow transfer is related to the storage we
are trying
> to
> > > > > migrate
> > > > > > > off
> > > > > > > > > of.  We are capable of getting about 350mbps
off the disks
> > but
> > > > when
> > > > > > we
> > > > > > > > are
> > > > > > > > > moving volumes that are greater than about 500GB
we end up
> > > racing
> > > > > the
> > > > > > > > clock
> > > > > > > > > and hoping that the migration finishes before
the job times
> > > out.
> > > > >  It
> > > > > > > > would
> > > > > > > > > be awesome to be able to manage that timeout
and I know
> there
> > > > are a
> > > > > > ton
> > > > > > > > of
> > > > > > > > > settings I just don't know about and am hoping
someone
> might
> > be
> > > > > able
> > > > > > to
> > > > > > > > > point me in the right direction.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ________________________________________
> > > > > > > > > From: Rafael Weingärtner [rafaelweingartner@gmail.com]
> > > > > > > > > Sent: Monday, October 12, 2015 6:40 PM
> > > > > > > > > To: users@cloudstack.apache.org
> > > > > > > > > Subject: [Questionable]  Re: Timeout with live
migration
> > > > > > > > >
> > > > > > > > > I would first check your NICs' speed and load,
the amount
> of
> > > RAM
> > > > > > > > allocated
> > > > > > > > > for the migrating VM and than check the hypervisor
log
> files.
> > > > > > > > >
> > > > > > > > > On Mon, Oct 12, 2015 at 8:19 PM, Jan-Arve Nygård
<
> > > > > > > > > jan.arve.nygard@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > What version are you running? Check if the
> copy.volume.wait
> > > > > setting
> > > > > > > is
> > > > > > > > > set
> > > > > > > > > > to 7200 and increase it. If not you could
also check
> > > > > > > > > > job.cancel.threshold.minutes and job.expire.minutes.
> > > > > > > > > >
> > > > > > > > > > -Jan-Arve
> > > > > > > > > >
> > > > > > > > > > 2015-10-13 0:46 GMT+02:00 Ryan Farrington
<
> > > > > > rfarrington@remitdata.com
> > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > We are experiencing a failure in cloudstack
waiting for
> > an
> > > > > async
> > > > > > > job
> > > > > > > > > > > performing a live migration of a volume
to finish. I've
> > > > copied
> > > > > > the
> > > > > > > > > > relevant
> > > > > > > > > > > log entries below.We acknowledge that
the migration
> will
> > > > take a
> > > > > > few
> > > > > > > > > hours
> > > > > > > > > > > based on the volume of the data and
we are looking for
> a
> > > way
> > > > to
> > > > > > > > > increase
> > > > > > > > > > > the timeout of 7200 seconds into something
we know we
> can
> > > > work
> > > > > > > with.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2015-10-12 00:19:36,043 DEBUG
> > [o.a.c.s.RemoteHostEndPoint]
> > > > > > > > > > > (Job-Executor-62:ctx-802065a9 ctx-bb27a168)
Failed to
> > send
> > > > > > command,
> > > > > > > > due
> > > > > > > > > > to
> > > > > > > > > > > Agent:27,
> com.cloud.exception.OperationTimedoutException:
> > > > > > Commands
> > > > > > > > > > > 835325398 to Host 27 timed out after
7200
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Rafael Weingärtner
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Rafael Weingärtner
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Rafael Weingärtner
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Rafael Weingärtner
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Rafael Weingärtner
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Rafael Weingärtner
> > > >
> > >
> > >
> > >
> > > --
> > > Rafael Weingärtner
> > >
> >
> >
> >
> > --
> > Rafael Weingärtner
> >
>
>
>
> --
> Rafael Weingärtner
>



--
Rafael Weingärtner

Mime
View raw message