cloudstack-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Farrington (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CLOUDSTACK-8949) Job timeout caused by improper/inconsistent application of wait parameters
Date Thu, 15 Oct 2015 20:40:05 GMT

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ryan Farrington updated CLOUDSTACK-8949:
----------------------------------------
    Description: 
when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would not
use the default value of parameter "wait".  That timeout value should be the same as the one
used on CitrixResourceBase and its children to control the migration of volumes.

Environment:    XenServer 6.2, multiple pre-setup primary storage locations that are iscsi
SRs in XenServer.

Summary of actions:      Attempt to move a volume that is attached to a running instance to
another primary storage location.


What *I* assume should happen:    Cloudstack communicates with the host of the VM in question
and issues a xe vdi-pool-migrate.   The command is synchronous and doesn't return control
back until it has performed the action.   Cloudstack should know this is a synchronous call
and wait the configured amount of time for the migration to be performed.


What *I* think is happening(with some education from Rafael Weingärtner):    There are multiple
timers being set in the process (please reference  




{noformat}
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Sending  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1,
Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Executing:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver:
v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-494:ctx-5aac3f29)
Seq 38-996939857: Executing request
2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Waiting some more time because this is the current command
2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Waiting some more time because this is the current command
2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Could not find exception: com.cloud.exception.OperationTimedoutException in
error code list for exceptions
2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Timed out on Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Cancelling.
2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: No more commands found
2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
996939857 to Host 38 timed out after 7200
2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) copy failed
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38,
com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after
7200
        at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
        at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
        at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
        at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
        at com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
        at com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
        at com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
        at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
        at org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
        at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
        at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)
2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f),
no need to delete from object in store ref table
2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host
38 timed out after 7200
2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d)
Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
to migrate volume"}
2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d)
Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d)
Remove job-5257 from job monitoring
{noformat}


  was:
when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would not
use the default value of parameter "wait".  That timeout value should be the same as the one
used on CitrixResourceBase and its children to control the migration of volumes.



> Job timeout caused by improper/inconsistent application of wait parameters
> --------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-8949
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8949
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Hypervisor Controller
>    Affects Versions: 4.3.0
>         Environment: XenServer 6.2
>            Reporter: Ryan Farrington
>            Priority: Minor
>
> when creating a "MigrateVolumeCommand" a timeout should be set, this way the ACS would
not use the default value of parameter "wait".  That timeout value should be the same as the
one used on CitrixResourceBase and its children to control the migration of volumes.
> Environment:    XenServer 6.2, multiple pre-setup primary storage locations that are
iscsi SRs in XenServer.
> Summary of actions:      Attempt to move a volume that is attached to a running instance
to another primary storage location.
> What *I* assume should happen:    Cloudstack communicates with the host of the VM in
question and issues a xe vdi-pool-migrate.   The command is synchronous and doesn't return
control back until it has performed the action.   Cloudstack should know this is a synchronous
call and wait the configured amount of time for the migration to be performed.
> What *I* think is happening(with some education from Rafael Weingärtner):    There are
multiple timers being set in the process (please reference  
> {noformat}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Sending  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver: v1,
Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
> 2015-10-12 16:41:20,456 DEBUG [c.c.a.t.Request] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Executing:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7), Ver:
v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
> 2015-10-12 16:41:20,457 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-494:ctx-5aac3f29)
Seq 38-996939857: Executing request
> 2015-10-12 17:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Waiting some more time because this is the current command
> 2015-10-12 18:41:20,457 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Waiting some more time because this is the current command
> 2015-10-12 18:41:20,457 INFO  [c.c.u.e.CSExceptionErrorCode] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Could not find exception: com.cloud.exception.OperationTimedoutException in
error code list for exceptions
> 2015-10-12 18:41:20,465 WARN  [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Timed out on Seq 38-996939857:  { Cmd , MgmtId: 42756806312036, via: 38(xen-nc-bc2b7),
Ver: v1, Flags: 100111, [{"com.cloud.agent.api.storage.MigrateVolumeCommand":{"volumeId":808,"volumePath":"0cd3ec8c-9fa9-4caf-8380-1a85cdfd0958","pool":{"id":246,"uuid":"VNX_PR5_LUN2003","host":"localhost","path":"/VNX_PR5_LUN2003","port":0,"type":"PreSetup"},"attachedVmName":"i-34-311-VM","wait":0}}]
}
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: Cancelling.
> 2015-10-12 18:41:20,465 DEBUG [c.c.a.m.AgentAttache] (Job-Executor-63:ctx-f7b6817d ctx-c6b92515)
Seq 38-996939857: No more commands found
> 2015-10-12 18:41:20,465 DEBUG [o.a.c.s.RemoteHostEndPoint] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Failed to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException:
Commands 996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,471 DEBUG [o.a.c.s.m.AncientDataMotionStrategy] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) copy failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:38,
com.cloud.exception.OperationTimedoutException: Commands 996939857 to Host 38 timed out after
7200
>         at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:116)
>         at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.migrateVolumeToPool(AncientDataMotionStrategy.java:382)
>         at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:421)
>         at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:70)
>         at org.apache.cloudstack.storage.volume.VolumeServiceImpl.migrateVolume(VolumeServiceImpl.java:931)
>         at com.cloud.storage.VolumeApiServiceImpl.liveMigrateVolume(VolumeApiServiceImpl.java:1680)
>         at com.cloud.storage.VolumeApiServiceImpl.orchestrateMigrateVolume(VolumeApiServiceImpl.java:1666)
>         at com.cloud.storage.VolumeApiServiceImpl.migrateVolume(VolumeApiServiceImpl.java:1622)
>         at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>         at com.sun.proxy.$Proxy196.migrateVolume(Unknown Source)
>         at org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd.execute(MigrateVolumeCmd.java:103)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:161)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJobInContext(ApiAsyncJobDispatcher.java:109)
>         at com.cloud.api.ApiAsyncJobDispatcher$1.run(ApiAsyncJobDispatcher.java:66)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:63)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:509)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:701)
> 2015-10-12 18:41:20,479 WARN  [o.a.c.s.d.ObjectInDataStoreManagerImpl] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) Unsupported data object (VOLUME, org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@4fa7a45f),
no need to delete from object in store ref table
> 2015-10-12 18:41:20,479 DEBUG [c.c.s.VolumeApiServiceImpl] (Job-Executor-63:ctx-f7b6817d
ctx-c6b92515) migrate volume failed:com.cloud.utils.exception.CloudRuntimeException: Failed
to send command, due to Agent:38, com.cloud.exception.OperationTimedoutException: Commands
996939857 to Host 38 timed out after 7200
> 2015-10-12 18:41:20,480 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d)
Complete async job-5257, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530,"errortext":"Failed
to migrate volume"}
> 2015-10-12 18:41:20,486 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Job-Executor-63:ctx-f7b6817d)
Done executing org.apache.cloudstack.api.command.user.volume.MigrateVolumeCmd for job-5257
> 2015-10-12 18:41:20,489 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (Job-Executor-63:ctx-f7b6817d)
Remove job-5257 from job monitoring
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message