cloudstack-dev mailing list archives

From Andrei Mikhailovsky <and...@arhont.com.INVALID>
Subject Re: 4.13 rbd snapshot delete failed
Date Mon, 09 Sep 2019 12:40:55 GMT
Some quick feedback from my side. I've never had snapshot deletion working properly with ceph.
Every week or so I have to manually delete all ceph snapshots. However, the NFS secondary
storage snapshots are deleted just fine. I've been using CloudStack for 5+ years and this has
always been the case. I am currently running 4.11.2 with ceph 13.2.6-1xenial.
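
A minimal sketch of that manual cleanup, assuming the snapshot listing comes from `rbd snap ls <image> --format json` and that each entry carries a `name` field and possibly a `protected` field (both are assumptions about the JSON layout, so check them against your Ceph release). The helper name `build_snap_rm_commands` is hypothetical; it only builds `rbd snap rm` commands and does not run anything.

```python
import json


def build_snap_rm_commands(image_spec, snap_ls_json):
    """Return one `rbd snap rm` argv list per unprotected snapshot.

    image_spec   -- "pool/image", e.g. "rbd/<volume-uuid>"
    snap_ls_json -- text assumed to come from `rbd snap ls <image> --format json`
    """
    commands = []
    for snap in json.loads(snap_ls_json):
        # Assumption: protected snapshots are flagged as "true"; rbd refuses
        # to remove them until they are unprotected, so leave them alone.
        if str(snap.get("protected", "false")).lower() == "true":
            continue
        commands.append(["rbd", "snap", "rm", f"{image_spec}@{snap['name']}"])
    return commands
```

Printing the returned commands and reviewing them before passing each one to `subprocess.run(cmd, check=True)` keeps the cleanup auditable, which matters because snapshots that CloudStack still tracks would be removed as well.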

Andrei

----- Original Message -----
> From: "Andrija Panic" <andrija.panic@gmail.com>
> To: "Gabriel Beims Bräscher" <gabrascher@gmail.com>
> Cc: "users" <users@cloudstack.apache.org>, "dev" <dev@cloudstack.apache.org>
> Sent: Sunday, 8 September, 2019 19:17:59
> Subject: Re: 4.13 rbd snapshot delete failed

> Thx Gabriel for the extensive feedback.
> Actually, my ex-company added the code to really delete an RBD snap back in
> 2016 or so; it was part of 4.9, if I'm not mistaken. So I expect the code is
> there, but probably some exception is happening, or there is a regression...
> 
> Cheers
> 
> On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher <gabrascher@gmail.com>
> wrote:
> 
>> Thanks for the feedback, Andrija. It looks like delete was not totally
>> supported then (am I missing something?). I will look into this and
>> open a PR adding proper support for RBD snapshot deletion if necessary.
>>
>> Regarding the rollback, I have tested it several times and it worked;
>> however, I see a weak point in the Ceph rollback implementation.
>>
>> It looks like Li Jerry was able to execute the rollback without any
>> problem. Li, could you please post the log output here: "Attempting to
>> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
>> [snapshotid:%s]"? Andrija will not be able to see that log because the
>> exception happens before it; the only way to check those values in that
>> case is via remote debugging. If you are able to post those values, it
>> would also help in sorting out what is wrong.
>>
>> I am checking the code base, running a few tests, and evaluating the log
>> that you (Andrija) sent. What I can say for now is that the parameter
>> "snapshotRelPath = snapshot.getPath()" [1] looks like a critical piece of
>> code that can definitely break the rollback execution flow. My tests had
>> pointed to one pattern, but now I see other possibilities. I will probably
>> either add a few parameters to the rollback/revert command instead of using
>> the path, or review the path life cycle and the different execution flows,
>> to make it safer to use.
>> [1]
>> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
>>
>> A few details on the test environments and Ceph/RBD version:
>> CloudStack, KVM, and Ceph nodes are running with Ubuntu 18.04
>> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
>> (stable)
>> RADOS Block Device (RBD) has supported snapshot rollback since Ceph v10.0.2
>> [https://github.com/ceph/ceph/pull/6878]
>> Rados-java [https://github.com/ceph/rados-java] has supported snapshot
>> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack
>> 4.13.0.0
>>
>> I will be updating here soon.
>>
>> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander <wido@widodh.nl>
>> wrote:
>>
>>>
>>>
>>> On 9/8/19 5:26 AM, Andrija Panic wrote:
>>> > Many releases ago, deleting a Ceph volume snapshot also only deleted it
>>> > in the DB, so RBD performance became terrible with many tens of (i.e.
>>> > hourly) snapshots. I'll try to verify this on 4.13 myself, but Wido and
>>> > the guys will know better...
>>>
>>> I pinged Gabriel and he's looking into it. He'll get back to it.
>>>
>>> Wido
>>>
>>> >
>>> > I
>>> >
>>> > On Sat, Sep 7, 2019, 08:34 li jerry <div8cn@hotmail.com> wrote:
>>> >
>>> >> I found that it has nothing to do with storage.cleanup.delay and
>>> >> storage.cleanup.interval.
>>> >>
>>> >>
>>> >>
>>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>>> >> snapshot has not been copied to secondary storage, it only changes the
>>> >> database information and does not go to primary storage to delete the
>>> >> snapshot.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Log===========================
>>> >>
>>> >>
>>> >>
>>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet]
>>> >> (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START===
>>> >> 192.168.254.3 -- GET
>>> >> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>>> >>
>>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer]
>>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from
>>> >> which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is
>>> >> allowed to perform API calls: 0.0.0.0/0,::/0
>>> >>
>>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer]
>>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved
>>> >> cmdEventType from job info: SNAPSHOT.DELETE
>>> >>
>>> >> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
>>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add
>>> >> job-1378 into job monitoring
>>> >>
>>> >> 2019-09-07 23:27:00,219 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) submit
>>> >> async job-1378, details: AsyncJobVO {id:1378, userId: 2, accountId: 2,
>>> >> instanceType: Snapshot, instanceId: 13, cmd:
>>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>>> >> cmdInfo:
>>> >> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>>> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>>> >> null, lastPolled: null, created: null, removed: null}
>>> >>
>>> >> 2019-09-07 23:27:00,220 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> >> (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:1cee5097) Executing
>>> >> AsyncJobVO {id:1378, userId: 2, accountId: 2, instanceType: Snapshot,
>>> >> instanceId: 13, cmd:
>>> >> org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd,
>>> >> cmdInfo:
>>> >> {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"1237","id":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f\"}","ctxAccountId":"2","uuid":"0b50eb7e-4f42-4de7-96c2-1fae137c8c9f","cmdEventType":"SNAPSHOT.DELETE","_":"1567869534480"},
>>> >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>>> >> result: null, initMsid: 2200502468634, completeMsid: null, lastUpdated:
>>> >> null, lastPolled: null, created: null, removed: null}
>>> >>
>>> >> 2019-09-07 23:27:00,221 DEBUG [c.c.a.ApiServlet]
>>> >> (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) ===END===
>>> >> 192.168.254.3 -- GET
>>> >> command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>>> >>
>>> >> 2019-09-07 23:27:00,305 DEBUG [c.c.a.m.ClusteredAgentAttache]
>>> >> (AgentManager-Handler-12:null) (logid:) Seq 1-8660140608456756853:
>>> >> Routing from 2199066247173
>>> >>
>>> >> 2019-09-07 23:27:00,305 DEBUG [o.a.c.s.s.XenserverSnapshotStrategy]
>>> >> (API-Job-Executor-2:ctx-f0843047 job-1378 ctx-f50e25a4)
>>> >> (logid:1cee5097) Can't find snapshot on backup storage, delete it in db
>>> >>
>>> >>
>>> >>
>>> >> -Jerry
>>> >>
>>> >>
>>> >>
>>> >> ________________________________
>>> >> From: Andrija Panic <andrija.panic@gmail.com>
>>> >> Sent: Saturday, September 7, 2019 1:07:19 AM
>>> >> To: users <users@cloudstack.apache.org>
>>> >> Cc: dev@cloudstack.apache.org <dev@cloudstack.apache.org>
>>> >> Subject: Re: 4.13 rbd snapshot delete failed
>>> >>
>>> >> storage.cleanup.delay
>>> >> storage.cleanup.interval
>>> >>
>>> >> Set both to 60 (seconds) and wait for up to 2 min - it should be deleted
>>> >> just fine...
>>> >>
>>> >> cheers
>>> >>
>>> >> On Fri, 6 Sep 2019 at 18:52, li jerry <div8cn@hotmail.com> wrote:
>>> >>
>>> >>> Hello All
>>> >>>
>>> >>> When I tested ACS 4.13 with KVM + CEPH snapshots, I found that snapshots
>>> >>> could be created and rolled back (using the API alone), but deletion
>>> >>> could not be completed.
>>> >>>
>>> >>>
>>> >>>
>>> >>> After executing the deletion API, the snapshot disappears from the
>>> >>> snapshot list, but the snapshot on CEPH RBD is not deleted (rbd snap
>>> >>> list rbd/ac510428-5d09-4e86-9d34-9dfab3715b7c).
>>> >>>
>>> >>>
>>> >>>
>>> >>> Is there any way we can completely delete the snapshot?
>>> >>>
>>> >>> -Jerry
>>> >>>
>>> >>>
>>> >>
>>> >> --
>>> >>
>>> >> Andrija Panić
>>> >>
>>> >
>>>
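
The log Jerry posted ends with the message "Can't find snapshot on backup storage, delete it in db", which marks a delete that only touched the database. A minimal sketch for spotting such deletes in a management-server log, assuming each log entry sits on a single line (unlike the wrapped copy quoted in this thread) and that the async job id appears as `job-<number>`. The function name `find_db_only_deletes` is hypothetical.

```python
import re

# Message taken verbatim from the log quoted in this thread.
DB_ONLY_MARKER = "Can't find snapshot on backup storage, delete it in db"
# Matches "job-1378" but not "API-Job-Executor-2" (the match is case-sensitive).
JOB_ID = re.compile(r"\bjob-(\d+)\b")


def find_db_only_deletes(log_lines):
    """Return async job ids whose snapshot delete only removed the DB record."""
    job_ids = []
    for line in log_lines:
        if DB_ONLY_MARKER not in line:
            continue
        match = JOB_ID.search(line)
        if match:
            job_ids.append(int(match.group(1)))
    return job_ids
```

Each returned job id can then be matched against the corresponding "submit async job" entry to recover the snapshot UUID whose RBD snapshot may still exist on primary storage.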
