cloudstack-dev mailing list archives

From Wido den Hollander <w...@widodh.nl>
Subject Re: Orphaned libvirt storage pools
Date Wed, 12 Jun 2013 16:48:32 GMT
Hi Wei,

This was with both 0.9.8 and 1.0.2.

Haven't been able to dig into this deeper yet.

Wido

On 06/12/2013 06:26 PM, Wei ZHOU wrote:
> Wido,
>
> Could you tell me the libvirt version?
> For our platform with this issue, the libvirt version is 0.9.13
>
> -Wei
>
>
> 2013/6/7 Marcus Sorensen <shadowsor@gmail.com>
>
>> There is already quite a bit of logging around this stuff, for example:
>>
>>                 s_logger.error("deleteStoragePool removed pool from libvirt, but libvirt had trouble"
>>                                + "unmounting the pool. Trying umount location " + targetPath
>>                                + "again in a few seconds");
>>
>> And if it gets an error from libvirt during create stating that the
>> mountpoint is in use, the agent attempts to unmount before remounting.
>> Of course, that unmount will fail if the mountpoint is still in use.
>>
>>             // if error is that pool is mounted, try to handle it
>>             if (e.toString().contains("already mounted")) {
>>                 s_logger.error("Attempting to unmount old mount libvirt is unaware of at " + targetPath);
>>                 String result = Script.runSimpleBashScript("umount " + targetPath);
>>                 if (result == null) {
>>                     s_logger.error("Succeeded in unmounting " + targetPath);
>>                     try {
>>                         sp = conn.storagePoolCreateXML(spd.toString(), 0);
>>                         s_logger.error("Succeeded in redefining storage");
>>                         return sp;
>>                     } catch (LibvirtException l) {
>>                         s_logger.error("Target was already mounted, unmounted it but failed to redefine storage:" + l);
>>                     }
>>                 } else {
>>                     s_logger.error("Failed in unmounting and redefining storage");
>>                 }
>>             }
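A possible hardening, sketched here purely as an illustration (the isMounted() helper below is hypothetical and not part of the agent), would be to check /proc/mounts for the target path up front instead of matching on the "already mounted" error text:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class MountCheck {
        // Returns true if targetPath appears as a mountpoint in /proc/mounts.
        // Hypothetical helper for illustration; not existing agent code.
        static boolean isMounted(String targetPath) throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/mounts"),
                                                  StandardCharsets.US_ASCII)) {
                String[] fields = line.split(" ");
                if (fields.length > 1 && fields[1].equals(targetPath)) {
                    return true;
                }
            }
            return false;
        }
    }

With a check like this the agent could decide to umount (or reuse) the path before handing the pool definition to libvirt, rather than waiting for storagePoolCreateXML() to fail.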
>>
>>
>> Do you think it was related to the upgrade process itself (e.g. maybe
>> the storage pools didn't carry across the libvirt upgrade)? Can you
>> duplicate it outside of the upgrade?
>>
>> On Fri, Jun 7, 2013 at 8:43 AM, Wido den Hollander <wido@widodh.nl> wrote:
>>> Hi,
>>>
>>>
>>> On 06/07/2013 04:30 PM, Marcus Sorensen wrote:
>>>>
>>>> Does this only happen with isos?
>>>
>>>
>>> Yes, it does.
>>>
>>> My work-around for now was to locate all the Instances that had these
>>> ISOs attached and detach the ISOs from all of them (~100 instances...)
>>>
>>> Then I manually unmounted all the mountpoints under /mnt so that they
>>> could be re-used again.
>>>
>>> This cluster was upgraded to 4.1 from 4.0 with libvirt 1.0.2 (coming from
>>> 0.9.8).
>>>
>>> Somehow libvirt forgot about these storage pools.
>>>
>>> Wido
>>>
>>>> On Jun 7, 2013 8:15 AM, "Wido den Hollander" <wido@widodh.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> So, I just created CLOUDSTACK-2893, but Wei Zhou mentioned that there
>>>>> are some related issues:
>>>>> * CLOUDSTACK-2729
>>>>> * CLOUDSTACK-2780
>>>>>
>>>>> I restarted my Agent and the issue described in 2893 went away, but I'm
>>>>> wondering how that happened.
>>>>>
>>>>> Anyway, after looking further I found that I have some "orphaned"
>>>>> storage pools; by that I mean they are mounted and in use, but not
>>>>> defined nor active in libvirt:
>>>>>
>>>>> root@n02:~# lsof |grep "\.iso"|awk '{print $9}'|cut -d '/' -f 3|sort -n|uniq
>>>>> eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c
>>>>> f84e51ab-d203-3114-b581-247b81b7d2c1
>>>>> fd968b03-bd11-3179-a2b3-73def7c66c68
>>>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d
>>>>> 7dc0149e-0281-3353-91eb-4589ef2b1ec1
>>>>> 8e005344-6a65-3802-ab36-31befc95abf3
>>>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593
>>>>> 765e63d7-e9f9-3203-bf4f-e55f83fe9177
>>>>> 1287a27d-0383-3f5a-84aa-61211621d451
>>>>> 98622150-41b2-3ba3-9c9c-09e3b6a2da03
>>>>>
>>>>> root@n02:~#
>>>>>
>>>>> Looking at libvirt:
>>>>> root@n02:~# virsh pool-list
>>>>> Name                                  State      Autostart
>>>>> -------------------------------------------------------------
>>>>> 52801816-fe44-3a2b-a147-bb768eeea295  active     no
>>>>> 7ceb73e5-5ab1-3862-ad6e-52cb986aff0d  active     no
>>>>> 88ddd8f5-e6c7-3f3d-bef2-eea8f33aa593  active     no
>>>>> a83d1100-4ffa-432a-8467-4dc266c4b0c8  active     no
>>>>> fd968b03-bd11-3179-a2b3-73def7c66c68  active     no
>>>>>
>>>>>
>>>>> root@n02:~#
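For reference, the same comparison can be done programmatically. The sketch below is only an illustration (the class name and the qemu:///system URI are assumptions, and this is not agent code); it collects the mountpoints under /mnt from /proc/mounts and subtracts the pools libvirt reports, relying on the /mnt/<pool-uuid> convention visible in the output above:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashSet;
    import java.util.Set;

    import org.libvirt.Connect;

    public class OrphanedPoolCheck {
        public static void main(String[] args) throws Exception {
            // Mountpoints under /mnt, keyed by their last path component
            // (the pool UUID, as in the lsof output above).
            Set<String> mounted = new HashSet<String>();
            for (String line : Files.readAllLines(Paths.get("/proc/mounts"),
                                                  StandardCharsets.US_ASCII)) {
                String target = line.split(" ")[1];
                if (target.startsWith("/mnt/")) {
                    mounted.add(target.substring("/mnt/".length()));
                }
            }

            // Pools libvirt knows about: active ones plus any that are
            // defined but not running.
            Set<String> known = new HashSet<String>();
            Connect conn = new Connect("qemu:///system");
            try {
                for (String name : conn.listStoragePools()) {
                    known.add(name);
                }
                for (String name : conn.listDefinedStoragePools()) {
                    known.add(name);
                }
            } finally {
                conn.close();
            }

            // Whatever is mounted under /mnt but unknown to libvirt is an
            // "orphaned" pool in the sense described below.
            mounted.removeAll(known);
            for (String uuid : mounted) {
                System.out.println("mounted but not in libvirt: /mnt/" + uuid);
            }
        }
    }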
>>>>>
>>>>> What happens here is that the mountpoints are in use (ISO attached to
>>>>> Instance) but there is no storage pool in libvirt.
>>>>>
>>>>> This means that when you try to deploy a second VM with the same ISO,
>>>>> libvirt will error out, since the Agent will try to create and start a
>>>>> new storage pool, which will fail since the mountpoint is already in use.
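To make the failure mode concrete, the sketch below shows roughly what such a create looks like through the libvirt Java bindings; the pool name, NFS host and export path are made-up values, the real ones come from the secondary storage configuration. When the target path is already mounted but the pool is not known to libvirt, the create throws, which is the "already mounted" case handled in the agent code quoted earlier:

    import org.libvirt.Connect;
    import org.libvirt.LibvirtException;
    import org.libvirt.StoragePool;

    public class CreateNfsPool {
        public static void main(String[] args) throws LibvirtException {
            // Made-up values for illustration only.
            String uuid   = "eb3cd8fd-a462-35b9-882a-f4b9f2f4a84c";
            String host   = "nfs.example.com";
            String export = "/export/secondary";

            String xml =
                "<pool type='netfs'>" +
                "  <name>" + uuid + "</name>" +
                "  <source>" +
                "    <host name='" + host + "'/>" +
                "    <dir path='" + export + "'/>" +
                "  </source>" +
                "  <target><path>/mnt/" + uuid + "</path></target>" +
                "</pool>";

            Connect conn = new Connect("qemu:///system");
            try {
                // libvirt mounts the export as part of creating the pool; if
                // /mnt/<uuid> is already mounted but unknown to libvirt, this
                // fails instead of adopting the existing mount.
                StoragePool sp = conn.storagePoolCreateXML(xml, 0);
                System.out.println("created pool " + sp.getName());
            } catch (LibvirtException e) {
                System.err.println("pool create failed: " + e);
            } finally {
                conn.close();
            }
        }
    }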
>>>>>
>>>>> The remedy would be to take the hypervisor into maintenance, reboot it
>>>>> completely and migrate Instances to it again.
>>>>>
>>>>> In libvirt there is no way to start an NFS storage pool without libvirt
>>>>> mounting it.
>>>>>
>>>>> Any suggestions on how we can work around this code wise?
>>>>>
>>>>> For my issue I'm writing a patch which adds some more debug lines to
>>>>> show what the Agent is doing, but it's kind of weird that we got into
>>>>> this "disconnected" state.
>>>>>
>>>>> Wido
>>>>>
>>>>
>>>
>>
>
