incubator-cloudstack-dev mailing list archives

From Alena Prokharchyk <Alena.Prokharc...@citrix.com>
Subject Re: FS on cloudStack createSnapshot synchronization improvement
Date Mon, 15 Oct 2012 03:53:32 GMT
Anthony, I implemented the threshold logic at the API layer, in the SyncQueueJob manager. In other
words, before submitting the job for execution, we need to know which host the job will go to first;
that host is the object we synchronize on.
For createSnapshot it is always the host where the vm 1) is running (for a Running vm) or 2) last
ran (for a Stopped vm). Only when the command fails on the initial host do we retry
on other hosts in the cluster. So it would work like this:

1) The API call is made.
2) Before submitting the async job to the queue, we figure out the host id (getHostIdForSnapshotOperation
method in SnapshotManagerImpl). Let's say the id of the host is 1.
3) The job is submitted with object to sync on = "host id=1".
4) Once the job is ready to execute, it goes to the snapshot manager, which sends the command to
host id=1 first. If it fails for some reason, it is resent to another host in the cluster
(if one exists). In this failure scenario we don't do any synchronization. We've decided not
to handle this error case because it is not expected to happen in most cases.
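
To make steps 2) to 4) concrete, here is a minimal sketch of a per-host threshold applied at submit time. It is not the actual CloudStack code: HostSyncQueue, submit, complete and runningOn are hypothetical names made up for illustration, and the host id is assumed to have been resolved already (step 2). Jobs over the threshold for a host wait in that host's queue; other hosts are not affected.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch only: none of these names are CloudStack classes.
final class HostSyncQueue {
    private final int maxConcurrentPerHost; // the configurable threshold
    private final Map<Long, Integer> running = new HashMap<>();
    private final Map<Long, Queue<String>> waiting = new HashMap<>();

    HostSyncQueue(int maxConcurrentPerHost) {
        if (maxConcurrentPerHost < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxConcurrentPerHost = maxConcurrentPerHost;
    }

    // Step 3: submit a job keyed by the host id resolved in step 2.
    // Returns true if the job may start now, false if it was queued.
    synchronized boolean submit(long hostId, String jobId) {
        int active = running.getOrDefault(hostId, 0);
        if (active < maxConcurrentPerHost) {
            running.put(hostId, active + 1);
            return true;
        }
        waiting.computeIfAbsent(hostId, k -> new ArrayDeque<>()).add(jobId);
        return false;
    }

    // Called when a job on hostId finishes. Returns the id of the next
    // queued job to start on that host, or null if nothing is waiting.
    synchronized String complete(long hostId) {
        Queue<String> queue = waiting.get(hostId);
        if (queue != null && !queue.isEmpty()) {
            // The freed slot passes to the next job, so the count is unchanged.
            return queue.poll();
        }
        int active = running.getOrDefault(hostId, 0);
        if (active <= 1) {
            running.remove(hostId);
        } else {
            running.put(hostId, active - 1);
        }
        return null;
    }

    synchronized int runningOn(long hostId) {
        return running.getOrDefault(hostId, 0);
    }
}
```

With a threshold of 2, a third job submitted for host id=1 is queued until one of the first two completes, while a job for host id=2 starts right away. Like the flow described above, the sketch does not cover the retry on another host after a failure.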

I've checked the code for the other commands you mentioned; there the host is always picked randomly
from the list of hosts in the cluster. So we can't apply the same logic unless we fix the code
to pick the same host in step 2) and step 4) without making callbacks from SnapshotManager
to the SyncQueueManager.
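
One possible direction, sketched below under my own assumptions, is to make the host choice once at submit time and carry it inside the job, so the executing side reads the same host instead of picking again. PinnedHostJob and its methods are hypothetical names for illustration, not existing CloudStack code, and the random choice is kept only to mirror how those commands pick a host today.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch only: not an existing CloudStack class.
final class PinnedHostJob {
    private final String command;
    private final long hostId; // chosen once, at submit time (step 2)

    private PinnedHostJob(String command, long hostId) {
        this.command = command;
        this.hostId = hostId;
    }

    // Step 2: pick the host once, here at random from the cluster's hosts,
    // and record the choice in the job itself.
    static PinnedHostJob create(String command, List<Long> clusterHostIds, Random rng) {
        if (clusterHostIds.isEmpty()) {
            throw new IllegalArgumentException("no hosts in cluster");
        }
        long chosen = clusterHostIds.get(rng.nextInt(clusterHostIds.size()));
        return new PinnedHostJob(command, chosen);
    }

    // Step 3: the object to sync on is derived from the recorded host.
    String syncKey() {
        return "host id=" + hostId;
    }

    // Step 4: the executing side reads the recorded host instead of
    // choosing again, so no callback to the queue manager is needed.
    long firstHostToTry() {
        return hostId;
    }

    String command() {
        return command;
    }
}
```

The sync key and the first host tried then always agree, because both come from the same stored value.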

I would appreciate any suggestions on how to implement it.

Thank you,
Alena.

From: Anthony Xu <Xuefei.Xu@citrix.com>
Reply-To: "cloudstack-dev@incubator.apache.org" <cloudstack-dev@incubator.apache.org>
To: "cloudstack-dev@incubator.apache.org" <cloudstack-dev@incubator.apache.org>
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

There are several commands that need this kind of threshold, e.g. move volume, create template
from snapshot.
So this is a common requirement, not only for createSnapshot.
Can we add a threshold mechanism in the host command queue to resolve this issue?


Anthony

-----Original Message-----
From: Edison Su [mailto:Edison.su@citrix.com]
Sent: Thursday, October 11, 2012 4:42 PM
To: cloudstack-dev@incubator.apache.org
Subject: RE: FS on cloudStack createSnapshot synchronization improvement

I only have one comment:
  Can we move this snapshot improvement code out of SnapshotManager?

-----Original Message-----
From: Alena Prokharchyk [mailto:Alena.Prokharchyk@citrix.com]
Sent: Tuesday, October 09, 2012 11:51 AM
To: cloudstack-dev@incubator.apache.org
Subject: FS on cloudStack createSnapshot synchronization improvement
Hi All,
I'm planning to introduce some changes to the create snapshot behavior for
the future cloudStack release (the changes will go to the asf/master
branch).
The fix addresses the problem described below:
"With  the current code for snapshots, cloudStack always creates
the snapshot on the host where the vm is running (for vms in Running state)
or on the host where the vm last ran (for vms in Stopped
state). As the createSnapshot commands are not synchronized on the
agent side, sending multiple commands to the backend
at the same time can lead to performance issues on the hypervisor
side. As a result, there is a high possibility that the createSnapshot
command will time out on the Xen side.
The solution is to limit the number of concurrent snapshots on a per-host
basis. The threshold should be configurable, as the customer
usually knows how many snapshots at a time the backend can handle.
While the concurrent snapshots are being processed by the backend,
all subsequent snapshot commands scheduled for execution on the same
host should wait in the queue."
Here is the feature FS, available for review:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Snapshot+improvements+FS
If you have any comments/suggestions/questions on the implementation,
please let me know.
-Alena.


