cloudstack-issues mailing list archives

From "Ivan Kozlov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CLOUDSTACK-4939) Failed to create snaphot (KVM, GFS2)
Date Fri, 25 Oct 2013 21:08:32 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804407#comment-13804407 ]

Ivan Kozlov edited comment on CLOUDSTACK-4939 at 10/25/13 9:06 PM:
-------------------------------------------------------------------

I have checked this with OCFS2 and with a build from ACS 4.2.1. Snapshots still fail, so 4.2.1
is affected. NFS seems to work fine. When one of the two hosts is in maintenance, snapshots
work fine. This makes me think it is not a KVM bug but an ACS bug.

To reproduce the issue:
1. Set up CloudStack with 2 or more hosts and shared storage (GFS2/OCFS2 over iSCSI).
2. Deploy 4-6 VMs and start snapshots of all of them simultaneously.
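Step 2 can be scripted so that the snapshots really start at the same moment. A minimal Python sketch, assuming a hypothetical `create_snapshot` callable that wraps the CloudStack `createSnapshot` API call for one volume (request signing and polling of the async job are left out); the helper only makes sure all requests are in flight concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def snapshot_all_at_once(
    volume_ids: Iterable[int],
    create_snapshot: Callable[[int], str],
) -> Dict[int, str]:
    """Issue one snapshot request per volume, all at the same time.

    `create_snapshot` is a hypothetical wrapper around the createSnapshot
    API call; it is injected so the sketch does not depend on a real client.
    """
    ids = list(volume_ids)
    # One worker per volume so every request is in flight simultaneously.
    with ThreadPoolExecutor(max_workers=max(1, len(ids))) as pool:
        futures = {vid: pool.submit(create_snapshot, vid) for vid in ids}
        # Collect results keyed by volume id; an exception in a call propagates.
        return {vid: fut.result() for vid, fut in futures.items()}
```

For example, `snapshot_all_at_once([14, 16, 17, 18], my_api_call)` would snapshot the four ROOT volumes that appear in the logs below.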

Setting concurrent.snapshots.threshold.perhost = 1 does not solve the issue.

The snapshot files are actually created on secondary storage. There is a record in the MySQL table cloud.snapshots,
but there is no record in cloud.snapshot_store_ref. Manually updating the snapshot status to "BackedUp"
and adding a record with the snapshot path to snapshot_store_ref makes the snapshot available. I was able to
create a template from this snapshot and deploy a VM from that template.
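The orphaned state described above (a row in cloud.snapshots with no matching row in cloud.snapshot_store_ref) can be found with a LEFT JOIN, and the manual fix is an UPDATE plus an INSERT. A minimal sketch using Python's sqlite3 against a simplified, assumed schema: the real CloudStack tables have more required columns (store id, store role, state and so on), so this only illustrates the shape of the query and the workaround, not a script to run against the cloud database.

```python
import sqlite3

# Simplified, ASSUMED stand-ins for cloud.snapshots and cloud.snapshot_store_ref.
# The real tables have more columns; only the ones needed here are modelled.
SCHEMA = """
CREATE TABLE snapshots (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE snapshot_store_ref (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL,
    install_path TEXT NOT NULL
);
"""


def find_orphaned_snapshots(conn):
    """Snapshots with a row in `snapshots` but none in `snapshot_store_ref`."""
    return conn.execute(
        """
        SELECT s.id, s.name, s.status
        FROM snapshots AS s
        LEFT JOIN snapshot_store_ref AS r ON r.snapshot_id = s.id
        WHERE r.id IS NULL
        ORDER BY s.id
        """
    ).fetchall()


def mark_backed_up(conn, snapshot_id, install_path):
    """The manual workaround: set status to BackedUp and add the store_ref row."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "UPDATE snapshots SET status = 'BackedUp' WHERE id = ?", (snapshot_id,)
        )
        conn.execute(
            "INSERT INTO snapshot_store_ref (snapshot_id, install_path) VALUES (?, ?)",
            (snapshot_id, install_path),
        )
```

The install path would be the secondary-storage path of the snapshot file (the logs below show paths of the form snapshots/2/16/<uuid>).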

ADD:
I continued researching and finally enabled logging on the agents (by the way, that is another bug: you need
to move /etc/cloudstack/agent/log4j-cloud.xml to /etc/cloudstack/agent/log4j.xml to make logging
work).
I created 4 instances on 2 hosts and backed up each VM twice. We are supposed to get
8 create snapshot commands, 8 backup snapshot commands and 8 delete snapshot commands. However,
on the first host:
managesnapshot.sh -c was run 2 times
managesnapshot.sh -b was run 5 times
managesnapshot.sh -d was run 3 times
on the second host:
managesnapshot.sh -c was run 2 times
managesnapshot.sh -b was run 3 times
managesnapshot.sh -d was run 2 times

So in total we get 4 create snapshot commands, 8 backup snapshot commands and 5 delete snapshot commands,
which seems very strange.
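The expected-versus-observed arithmetic above (4 VMs x 2 snapshots = 8 of each operation) can be checked mechanically. A Python sketch, assuming each managesnapshot.sh invocation appears as one agent log line containing the script name followed by its -c/-b/-d flag; the exact agent log format is an assumption.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumption: an invocation is logged as "... managesnapshot.sh -c ..." etc.
FLAG_RE = re.compile(r"managesnapshot\.sh\s+-([cbd])\b")
FLAG_NAMES = {"c": "create", "b": "backup", "d": "delete"}


def count_invocations(lines: Iterable[str]) -> Counter:
    """Count create/backup/delete invocations in one host's agent log."""
    counts = Counter()
    for line in lines:
        match = FLAG_RE.search(line)
        if match:
            counts[FLAG_NAMES[match.group(1)]] += 1
    return counts


def shortfall(per_host: Dict[str, Counter], vms: int, snaps_per_vm: int) -> Dict[str, int]:
    """Expected minus observed count per operation, summed over all hosts."""
    expected = vms * snaps_per_vm
    total = sum(per_host.values(), Counter())
    return {op: expected - total[op] for op in ("create", "backup", "delete")}
```

With the per-host counts reported above (first host 2/5/3, second host 2/3/2), `shortfall(..., vms=4, snaps_per_vm=2)` gives `{'create': 4, 'backup': 0, 'delete': 3}`: four creates and three deletes never ran, while the backup total happens to be 8.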

ADD2:
I think I know what is going on. The management server sends create/backup/delete snapshot commands
randomly to any host. Depending on whether the VM is running on that host, the host uses either libvirt or
qemu-img.
For example, host1 creates a snapshot using qemu-img. The backup snapshot command is then sent to host2,
where the VM is running. host2 tries to back up the snapshot using libvirt, but no snapshot is
visible for that domain (because it was created with qemu-img), so the backup fails. The more
hosts there are in the cluster, the more likely a snapshot failure becomes.
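The failure mode described in ADD2 can be modelled in a few lines. This is a simplified Python sketch of the analysis, not CloudStack code: `Host`, `SharedDisk` and the helper functions are hypothetical. It assumes, as the comment does, that a host uses libvirt when the VM runs locally and qemu-img otherwise; it further assumes that both paths write a qcow2 internal snapshot into the shared image, but only the libvirt path records the snapshot in that host's libvirt domain metadata.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Host:
    name: str
    running_vms: Set[str] = field(default_factory=set)
    # Domain snapshots known to this host's libvirtd (per-host metadata).
    libvirt_snapshots: Set[str] = field(default_factory=set)


@dataclass
class SharedDisk:
    # qcow2 internal snapshots in the image on shared storage; any host
    # can see these with qemu-img.
    internal_snapshots: Set[str] = field(default_factory=set)


def tool_for(host: Host, vm: str) -> str:
    """libvirt if the VM runs on this host, otherwise qemu-img."""
    return "libvirt" if vm in host.running_vms else "qemu-img"


def create_snapshot(host: Host, vm: str, disk: SharedDisk, snap: str) -> None:
    disk.internal_snapshots.add(snap)  # assumed: both tools write into the image
    if tool_for(host, vm) == "libvirt":
        host.libvirt_snapshots.add(snap)  # only libvirt records domain metadata


def backup_snapshot(host: Host, vm: str, disk: SharedDisk, snap: str) -> bool:
    if tool_for(host, vm) == "libvirt":
        # Lookup by name in this host's libvirt metadata: a snapshot made by
        # qemu-img is missing here ("no domain snapshot with matching name").
        return snap in host.libvirt_snapshots
    return snap in disk.internal_snapshots
```

In this model, creating on host1 (VM not running there) and backing up on host2 (VM running there) returns False, matching the error in the logs, while doing both steps on host2 returns True.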

So we need to:
1. Check whether the VM is running before taking a snapshot.
2. If the VM is running, send all commands (create/backup/delete snapshot) only to the host where
it is running.
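The proposed fix could look roughly like this: a hypothetical `pick_host` helper that pins every snapshot command for a running VM to the host it runs on, and only falls back to an arbitrary host when the VM is stopped. The `vm_location` mapping stands in for whatever the management server actually uses to know where a VM runs; none of these names come from CloudStack.

```python
import random
from typing import Dict, List, Optional


def pick_host(
    vm: str,
    cluster_hosts: List[str],
    vm_location: Dict[str, str],
    rng: Optional[random.Random] = None,
) -> str:
    """Choose the host for a create/backup/delete snapshot command.

    vm_location (hypothetical) maps a running VM to its host; stopped VMs
    have no entry.
    """
    host = vm_location.get(vm)
    if host is not None:
        # Running VM: send every step to the same host, so create, backup and
        # delete all go through the same libvirtd and its snapshot metadata.
        return host
    if not cluster_hosts:
        raise ValueError("no hosts available in the cluster")
    # Stopped VM: no libvirt domain exists anywhere, so any host's qemu-img will do.
    return (rng or random).choice(cluster_hosts)
```

Passing a seeded `random.Random` keeps the fallback choice reproducible in tests; the important property is that a running VM never has its snapshot steps split across hosts.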

I think this should solve the issue. Could someone implement it?


was (Author: eldorado): the same text as above, without the ADD2 section.

> Failed to create snaphot (KVM, GFS2)
> ------------------------------------
>
>                 Key: CLOUDSTACK-4939
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4939
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM, Snapshot
>    Affects Versions: 4.2.0, 4.2.1
>         Environment: CentOS 6.4, KVM, Shared mount point primary storage, GFS2, iSCSI
>            Reporter: Ivan Kozlov
>            Priority: Blocker
>              Labels: kvm, sharedstorage, snapshot
>             Fix For: 4.2.1
>
>
> With one host, snapshots are created OK. After adding a second host, some snapshots fail
> ("Failed to create snapshot due to an internal error creating snapshot for volume 14") and get
> stuck in the "CreatedOnPrimary" state, even when all VMs are running on the same host.
> The libvirt debug log shows:
> 2013-10-23 17:31:21.634+0000: 20007: debug : virStorageFileGetMetadataInternal:673 : path=/mnt/48a148f6-3373-3af2-8667-2f240988163d/snapshots, fd=31, format=2
> 2013-10-23 17:32:57.189+0000: 20015: debug : qemuSnapObjFromName:233 : Domain snapshot not found: no domain snapshot with matching name '909848a0-b3ec-4657-a53a-c449dc24365b'
> 2013-10-23 17:32:57.474+0000: 20009: debug : virStorageFileGetMetadataInternal:673 : path=/mnt/48a148f6-3373-3af2-8667-2f240988163d/snapshots, fd=31, format=2
> 2013-10-23 17:34:28.264+0000: 20008: debug : qemuSnapObjFromName:233 : Domain snapshot not found: no domain snapshot with matching name 'f4e51b11-ac79-4a6a-b887-8926ffbd5cca'
> Management server log:
> 2013-10-23 20:29:50,561 INFO  [user.snapshot.CreateSnapshotCmd] (Job-Executor-52:job-94 = [ 42f8d6e0-762e-4f01-a7d5-daff2e31be13 ]) VOLSS: createSnapshotCmd starts:1382549390561
> 2013-10-23 20:29:52,053 DEBUG [agent.transport.Request] (Job-Executor-52:job-94 = [ 42f8d6e0-762e-4f01-a7d5-daff2e31be13 ]) Seq 6-1170407437: Waiting for Seq 1170407434 Scheduling:  { Cmd , MgmtId: 161342718518, via: 6, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"/primary/d59c6574-8ff9-41e4-86e5-ce560f30d717/f4e51b11-ac79-4a6a-b887-8926ffbd5cca","volume":{"uuid":"02c07659-59d3-42f2-8928-1d899cef94e7","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"2c8e7b93-2d02-4c47-99ce-7bcd8670554a","id":2,"poolType":"SharedMountPoint","host":"localhost","path":"/primary","port":0}},"name":"ROOT-14","size":8589934592,"path":"d59c6574-8ff9-41e4-86e5-ce560f30d717","volumeId":14,"vmName":"i-2-14-VM","accountId":2,"format":"QCOW2","id":14,"hypervisorType":"KVM"},"parentSnapshotPath":"/primary/d59c6574-8ff9-41e4-86e5-ce560f30d717/ab317705-7368-4a40-9d1c-da2c8a7b1824","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"2c8e7b93-2d02-4c47-99ce-7bcd8670554a","id":2,"poolType":"SharedMountPoint","host":"localhost","path":"/primary","port":0}},"vmName":"i-2-14-VM","name":"t1_ROOT-14_20131023172950","hypervisorType":"KVM","id":33}},"destTO":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/2/14","volume":{"uuid":"02c07659-59d3-42f2-8928-1d899cef94e7","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"2c8e7b93-2d02-4c47-99ce-7bcd8670554a","id":2,"poolType":"SharedMountPoint","host":"localhost","path":"/primary","port":0}},"name":"ROOT-14","size":8589934592,"path":"d59c6574-8ff9-41e4-86e5-ce560f30d717","volumeId":14,"vmName":"i-2-14-VM","accountId":2,"format":"QCOW2","id":14,"hypervisorType":"KVM"},"parentSnapshotPath":"snapshots/2/14/ab317705-7368-4a40-9d1c-da2c8a7b1824","dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://192.168.10.31/export/secondary","_role":"Image"}},"vmName":"i-2-14-VM","name":"t1_ROOT-14_20131023172950","hypervisorType":"KVM","id":33}},"executeInSequence":true,"wait":21600}}] }
> 2013-10-23 20:31:21,560 DEBUG [agent.transport.Request] (AgentManager-Handler-8:null) Seq 6-1170407434: Processing:  { Ans: , MgmtId: 161342718518, via: 6, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '65113136-dfb5-4cea-8e65-1065462ca2fe'","wait":0}}] }
> 2013-10-23 20:31:21,832 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-49:job-91 = [ e2bf2454-4273-4a89-bc38-35add8297eb1 ]) Failed to create snapshot
> com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '65113136-dfb5-4cea-8e65-1065462ca2fe'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> 2013-10-23 20:31:21,999 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-49:job-91 = [ e2bf2454-4273-4a89-bc38-35add8297eb1 ]) Take snapshot: 18 failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to create snapshot
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1040)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> Caused by: com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '65113136-dfb5-4cea-8e65-1065462ca2fe'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)
> 2013-10-23 20:31:22,167 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-49:job-91 = [ e2bf2454-4273-4a89-bc38-35add8297eb1 ]) Complete async job-91 = [ e2bf2454-4273-4a89-bc38-35add8297eb1 ], jobStatus: 2, resultCode: 530, result: Error Code: 530 Error text: Failed to create snapshot due to an internal error creating snapshot for volume 18
> 2013-10-23 20:31:24,709 DEBUG [agent.transport.Request] (AgentManager-Handler-13:null) Seq 9-1437990929: Processing:  { Ans: , MgmtId: 161342718518, via: 9, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"newData":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/2/16/157016cb-5e57-428f-b747-5d9b628d2864","id":0}},"result":true,"wait":0}}] }
> 2013-10-23 20:31:25,760 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-51:job-93 = [ 25e157c0-f966-401e-9263-c42dac56e0c1 ]) Done executing org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for job-93 = [ 25e157c0-f966-401e-9263-c42dac56e0c1 ]
> 2013-10-23 20:32:57,416 DEBUG [agent.transport.Request] (AgentManager-Handler-8:null) Seq 6-1170407435: Processing:  { Ans: , MgmtId: 161342718518, via: 6, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '909848a0-b3ec-4657-a53a-c449dc24365b'","wait":0}}] }
> 2013-10-23 20:32:57,680 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-50:job-92 = [ b8bbb5be-54ba-43df-b429-5b5fb61416ad ]) Failed to create snapshot
> com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '909848a0-b3ec-4657-a53a-c449dc24365b'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> 2013-10-23 20:32:57,763 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-50:job-92 = [ b8bbb5be-54ba-43df-b429-5b5fb61416ad ]) Take snapshot: 17 failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to create snapshot
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1040)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> Caused by: com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name '909848a0-b3ec-4657-a53a-c449dc24365b'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)
> 2013-10-23 20:32:57,849 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-50:job-92 = [ b8bbb5be-54ba-43df-b429-5b5fb61416ad ]) Complete async job-92 = [ b8bbb5be-54ba-43df-b429-5b5fb61416ad ], jobStatus: 2, resultCode: 530, result: Error Code: 530 Error text: Failed to create snapshot due to an internal error creating snapshot for volume 17
> 2013-10-23 20:33:50,627 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Snapshot scheduler.poll is being called at 2013-10-23 17:33:50 GMT
> 2013-10-23 20:33:50,627 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Got 0 snapshots to be executed at 2013-10-23 17:33:50 GMT
> 2013-10-23 20:34:28,514 DEBUG [agent.transport.Request] (AgentManager-Handler-3:null) Seq 6-1170407437: Processing:  { Ans: , MgmtId: 161342718518, via: 6, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name 'f4e51b11-ac79-4a6a-b887-8926ffbd5cca'","wait":0}}] }
> 2013-10-23 20:34:28,779 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-52:job-94 = [ 42f8d6e0-762e-4f01-a7d5-daff2e31be13 ]) Failed to create snapshot
> com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name 'f4e51b11-ac79-4a6a-b887-8926ffbd5cca'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> 2013-10-23 20:34:28,870 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-52:job-94 = [ 42f8d6e0-762e-4f01-a7d5-daff2e31be13 ]) Take snapshot: 14 failed
> com.cloud.utils.exception.CloudRuntimeException: Failed to create snapshot
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1040)
>         at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:170)
> Caused by: com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: Domain snapshot not found: no domain snapshot with matching name 'f4e51b11-ac79-4a6a-b887-8926ffbd5cca'
>         at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:280)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.backupSnapshot(XenserverSnapshotStrategy.java:138)
>         at org.apache.cloudstack.storage.snapshot.XenserverSnapshotStrategy.takeSnapshot(XenserverSnapshotStrategy.java:264)
>         at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1013)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
