incubator-cloudstack-dev mailing list archives

From "Brenn Oosterbaan" <boosterb...@schubergphilis.com>
Subject Review Request: reboot when storage has been unavailable for x amount of seconds
Date Mon, 25 Feb 2013 13:42:54 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9586/
-----------------------------------------------------------

Review request for cloudstack and Hugo Trippaers.


Description
-------

Previously, the timeout value was used as the sleep between checks. It is now the number of
seconds the storage must be unavailable before the host reboots.
The script defaults to one check every 10 seconds (until the timeout value has been reached).
This interval can also be supplied as a parameter if a different value is needed.
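
The behaviour described above can be sketched roughly as follows. This is an illustration
only, not the code in the diff: the function name monitor_heartbeat, its argument order, and
the use of echo instead of syslog are assumptions made for the example. The message wording
is taken from the test log in this request. The function returns 1 instead of rebooting, so
it can be tried safely.

```shell
#!/bin/bash
# Hypothetical sketch of the timeout logic; not the actual xenheartbeat.sh code.

# monitor_heartbeat FILE TIMEOUT [INTERVAL]
# Writes a timestamp to FILE every INTERVAL seconds. Returns 1 once the
# write has failed for TIMEOUT consecutive seconds; the caller then decides
# what to do (the script under review reboots the host at that point).
monitor_heartbeat() {
    local hb_file=$1
    local timeout=$2
    local interval=${3:-10}   # default: one check every 10 seconds
    local unreachable=0       # seconds covered by consecutive failed checks

    while true; do
        # The braces send the shell's own "cannot create file" error
        # to /dev/null as well.
        if { date +%s > "$hb_file"; } 2>/dev/null; then
            unreachable=0     # storage answered: start counting from zero
        else
            unreachable=$((unreachable + interval))
            echo "Potential problem with $hb_file: not reachable since $unreachable seconds"
            if [ "$unreachable" -ge "$timeout" ]; then
                echo "Problem with $hb_file: not reachable for $timeout seconds, rebooting system!"
                return 1
            fi
        fi
        sleep "$interval"
    done
}
```

Called as `monitor_heartbeat <heartbeat file> 120`, the sketch keeps looping while the file
is writable and returns only after twelve consecutive failed checks. A single successful
write resets the counter, which matches the 90-second outage in the test below that did not
lead to a reboot.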


Diffs
-----

  scripts/vm/hypervisor/xenserver/xenheartbeat.sh 9cf2afe 

Diff: https://reviews.apache.org/r/9586/diff/


Testing
-------

Testing was done on XenServer host hostxxx, with the timeout set to 120 seconds.

hostxxx messages log:
Feb 25 14:07:50 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 10 seconds
Feb 25 14:08:01 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 20 seconds
Feb 25 14:08:11 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 30 seconds
Feb 25 14:08:21 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 40 seconds
Feb 25 14:08:31 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 50 seconds
Feb 25 14:08:41 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 60 seconds
Feb 25 14:08:51 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 70 seconds
Feb 25 14:09:01 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 80 seconds
Feb 25 14:09:11 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 90 seconds
Restored write permissions and the heartbeat worked fine again. Storage was unavailable for
only 90 seconds, so the system did not reboot.

Removed write permissions again.
Feb 25 14:11:01 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 10 seconds
Feb 25 14:11:11 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 20 seconds
Feb 25 14:11:21 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 30 seconds
Feb 25 14:11:31 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 40 seconds
Feb 25 14:11:41 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 50 seconds
Feb 25 14:11:51 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 60 seconds
Feb 25 14:12:01 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 70 seconds
Feb 25 14:12:11 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 80 seconds
Feb 25 14:12:21 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 90 seconds
Feb 25 14:12:31 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 100 seconds
Feb 25 14:12:41 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 110 seconds
Feb 25 14:12:51 hostxxx heartbeat: Potential problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable since 120 seconds
Feb 25 14:12:51 hostxxx heartbeat: Problem with /var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-8457c010-7943-43c6-aa57-d7a8d3aaf457:
not reachable for 120 seconds, rebooting system!
Storage was unavailable for 120 seconds, so the system rebooted.


Thanks,

Brenn Oosterbaan

