brooklyn-dev mailing list archives

From Aled Sage <aled.s...@gmail.com>
Subject Re: Booklyn fails to start
Date Tue, 01 Aug 2017 08:19:42 GMT
Hi Taylor,

Glad to hear you have a workaround.

I'm surprised that snapshotting while the service is running would cause 
a problem, from the Brooklyn perspective. When persisting to the 
filesystem, we are careful to ensure files are always in a valid state 
(e.g. write a tmp file and then do an atomic move to overwrite the 
previous file).
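
As a rough illustration of that write-then-move pattern, here is a
minimal Python sketch. It is not Brooklyn's actual (Java) persistence
code, and the function name `atomic_write` is hypothetical:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem (a cross-filesystem move is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically overwrite the previous file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This only guards against a single half-written file; it says nothing
about whether a set of files captured by a disk snapshot is mutually
consistent.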

If you have more details of the problem you're seeing that would be 
useful. For example, what exception(s) are shown in the log on rebind? 
Can you share a copy of the persisted state for which rebind fails? (But 
be careful sharing that: if you're not using "externalised 
configuration" [1] for credentials then the persisted state could 
contain your ssh key, cloud credentials, etc.)

Aled

[1] https://brooklyn.apache.org/v/latest/ops/externalized-configuration.html


On 31/07/2017 22:21, Taylor wrote:
> Thanks for the input about the external object store Duncan and Robert.
>
> I will be reviewing the options and testing them soon.
>
> I was able to reproduce the state corruption several times. I am not sure what the issue is, but the workaround is to gracefully power down the VM hosting brooklyn and then take a snapshot.
>
> I don't think I will look any further into this. The workaround is acceptable and ultimately I think moving to an object store is the best move.
>
> Thanks,
>
> Taylor
>
> ________________________________
> From: Duncan Grant <duncan.grant@cloudsoftcorp.com>
> Sent: Monday, July 31, 2017 10:55 AM
> To: Taylor; dev@brooklyn.apache.org
> Subject: Re: Booklyn fails to start
>
> Taylor,
>
> I'd say that persistence is fairly robust in brooklyn as it's heavily used and well tested.
>
> We use file-system based backup [1] in many cases and I haven't heard of anyone having the problem you describe, which makes me think it has something to do with using snapshots. But that seems like it should be fine even in the case that brooklyn is writing to persistence when you take the snapshot (otherwise I'd expect a number of processes to be corrupted every time you took a snapshot). I'd be interested whether you see the same issue when you use the backup approach described here [2].
>
> Having said that, in most cases I recommend using an object store (e.g. S3) for persistence. An object store should make persistence more reliable; it also allows you to make use of versioning, is necessary for running brooklyn in HA, etc.
>
> For debugging issues with persistence I think that brooklyn itself is the best option. Using the rebind.failureMode.rebind=fail_at_end [3] option in brooklyn.properties and then examining the log output usually makes it clear where something has gone wrong. You can also fairly easily edit the persistence files with a text editor, as they are human-readable and are basically entities, state, and relationships.
>
> Regards
>
> Duncan
>
> [1] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#persisted-state-backup
> [2] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#file-system-backup
> [3] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors
>
>
>
> On Mon, 31 Jul 2017 at 15:04 Taylor <tschneider@live.com> wrote:
>
>
> Duncan,
>
> Thanks so much for the thorough response!
>
> I will be reviewing the links you sent today.
>
> With respect to the snapshot: I am running brooklyn on a CentOS VM hosted on XenServer. Since the original email I have been experimenting with snapshots to try and diagnose what the issue is. The only way I can take a snapshot and revert is if I stop the service and power off the VM before taking a snapshot (disk only, no memory). If I take the snapshot while the service is running or after the service is stopped, the persisted state will get corrupted.
>
> This has me worried for the case of a production outage.
>
> Are there any tools to aid in fixing the persisted state manually?
>
> What mechanism is safe for backing up the persisted state? Can I back up while the service is running?
>
> Thanks,
>
> Taylor
>
> ________________________________
> From: Duncan Grant <duncan.grant@cloudsoftcorp.com>
> Sent: Monday, July 31, 2017 3:31 AM
> To: dev@brooklyn.apache.org
> Cc: Taylor
> Subject: Re: Booklyn fails to start
>
> Taylor,
>
> The error you're seeing is with Brooklyn failing to rebind to persisted state [1]. Could you explain what you mean by taking a snapshot and then reverting to the snapshot? (Do you mean a snapshot of the VM image where you are running brooklyn?)
>
> There are a couple of ways to deal with problems with persisted state. You can either fix the persisted state manually [2][3] or you can have brooklyn ignore errors with persisted state when it starts [4]. Both of these run the risk of brooklyn becoming detached from existing applications, so back up your persistence directory (or object store) first.
>
> Let me know if this helps (or doesn't). I'm also on IRC just now if you'd like some answers in real time.
>
> Regards
>
> Duncan
>
> [1] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#rebinding-to-state
> [2] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#determine-underlying-cause
> [3] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#fix-up-the-state
> [4] https://brooklyn.apache.org/v/latest/ops/persistence/index.html#ignore-errors
>
>
>
>
>
> On Mon, 31 Jul 2017 at 07:57 Taylor <tschneider@live.com> wrote:
> I am having a problem with brooklyn. If I start/stop the service things are ok. If I snapshot and revert to the snapshot I see the following:
>
>
> [root@localhost ~]# systemctl status brooklyn
> brooklyn.service - Apache Brooklyn Service
>     Loaded: loaded (/etc/systemd/system/multi-user.target.wants/brooklyn.service)
>     Active: active (running) since Sun 2017-07-30 17:19:55 EDT; 44s ago
>       Docs: https://brooklyn.apache.org/documentation/index.html
>   Main PID: 651 (java)
>     CGroup: /system.slice/brooklyn.service
>             └─651 /usr/bin/java -Dbrooklyn.location.localhost.address=127.0.0.1 -XX:SoftRefLRUPolicyMSPerMB=1 -Dlogback.configurationFile=/etc/brooklyn/logback.xml -Xms256m -Xmx1g -XX:MaxP...
>
> Jul 30 17:20:11 localhost.localdomain java[651]: 2017-07-30 17:20:11,553 INFO  Started Brooklyn console at http://127.0.0.1:8081/, running classpath://brooklyn.war@/
> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,401 INFO  Geo info lookup for 127.0.0.1/127.0.0.1 returned: HostGeoInfo[RCN Corporation, Chicago (US): 127...4096374512)]
> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736 ERROR Subsystem for persistence had startup error (continuing with startup): java.lang.IllegalStateExc...was scanning
> Jul 30 17:20:13 localhost.localdomain java[651]: java.lang.IllegalStateException: Node record nodes/vmL5HEpG could not be read when upxGnvJq was scanning
> Jul 30 17:20:13 localhost.localdomain java[651]: at org.apache.brooklyn.core.mgmt.ha.ManagementPlaneSyncRecordPersisterToObjectStore.loadSyncRecord(ManagementPlaneSyncRecordPe....jar:0.11.0]
> Jul 30 17:20:13 localhost.localdomain java[651]: 2017-07-30 17:20:13,736 WARN  Loading catalog for INITIALIZING as part of launch sequence (it was not loaded as part of the rebind sequence)
> Jul 30 17:20:18 localhost.localdomain java[651]: 2017-07-30 17:20:18,851 INFO  Launched Brooklyn; will now block until shutdown command received via GUI/API (recommended) or p...s interrupt.
> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  Disallowing web request as server not in required HA hot state: http://192.168.1.14:8081/v1/catalog/applicat...
> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  Disallowing web request as server not in required HA hot state: http://192.168.1.14:8081/v1/loca...s' to force)
> Jul 30 17:20:28 localhost.localdomain java[651]: 2017-07-30 17:20:28,309 WARN  Disallowing web request as server not in required HA hot state: http://192.168.1.14:8081/v1/catalog/entities...
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
>

