brooklyn-dev mailing list archives

From Svetoslav Neykov <svetoslav.ney...@cloudsoftcorp.com>
Subject Re: Unsuccessful cassandra deployment using yaml from blueprint-library
Date Tue, 21 Apr 2015 09:20:10 GMT
Hi Jade,
In cases like this it's useful to check the logs on the machine. "Timeout waiting for SERVICE_UP"
means that Brooklyn doesn't see the process running.
The output of the cassandra process is kept in "cassandra-console.log" in the runtime directory
(see run.dir in the Sensors tab for any of the cluster nodes). You can also check whether the process
is still running on the machine. As Alex suggested, check if a single CassandraNode works.
You can stop by our IRC channel #brooklyncentral on FreeNode for more help troubleshooting
this.

Best,
Svet.

> On 19.04.2015 г., at 5:42, jade mackay <jademackay@gmail.com> wrote:
> 
> Hi Alex,
> 
> The cluster was shutting down because it catches fire and I was launching
> it from the command line rather than the web console.
> When launched from the web console the cluster persists and data propagates
> over the nodes, despite being on fire with all nodes quarantined.
> Incidentally, I can use the cluster effector to expand the cluster but not
> to shrink it.
> 
> The top level "cassandra-cluster-app" summary:
> 
> Required entity not healthy: CassandraClusterImpl{id=DUHV0IoT}
> *Failure running task invoking start[locations] on 1 node (FPi4GFmk)
> <http://localhost:8081/#/v1/applications/BiAPu1MO/entities/BiAPu1MO/activities/subtask/FPi4GFmk>:
> *Error
> invoking start at CassandraClusterImpl{id=DUHV0IoT}: Node in cluster
> CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
> failed, 2 errors including: Error invoking start at
> CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
> CassandraNodeImpl{id=Ewkzxyc8}
> 
> and the "Cassandra Cluster" summary:
> 
> start failed with error: java.lang.IllegalStateException: Node in cluster
> CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
> failed, 2 errors including: Error invoking start at
> CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
> CassandraNodeImpl{id=Ewkzxyc8}
> *Failure running task starting 2 nodes (parallel) (jmlBKmM8)
> <http://localhost:8081/#/v1/applications/BiAPu1MO/entities/DUHV0IoT/activities/subtask/jmlBKmM8>:
> *2
> of 2 parallel child tasks failed, 2 errors including: Error invoking start
> at CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
> CassandraNodeImpl{id=Ewkzxyc8}
> 
> And one of the (aptly named) burning nodes "CassandraNode:D0wG":
> 
> The software process for this entity does not appear to be running
> *Failure running task post-start (qBmoHFo2)
> <http://localhost:8081/#/v1/applications/BiAPu1MO/entities/D0wGMtHJ/activities/subtask/qBmoHFo2>:
> *
> 
> For reference the blueprint:
> name: cassandra-cluster-app
> services:
> - type: brooklyn.entity.nosql.cassandra.CassandraCluster
>   name: Cassandra Cluster
>   brooklyn.config:
>     cluster.initial.size: 2
>     cluster.initial.quorumSize: 1
>     provisioning.properties:
>       minCores: 1
>       minRam: 512
> location: aws-oregon
> 
> Again, any tips on hunting the issue down would be appreciated.
> 
> Cheers,
> Jade
> 
> 
> 
> 
> 
> 
> On 18 April 2015 at 23:30, Alex Heneveld <alex.heneveld@cloudsoftcorp.com>
> wrote:
> 
>> 
>> Hi Jade,
>> 
>> Yes, this is the right place for your question.  Getting the Cassandra
>> start-up sequence right took some work, especially in different clouds with
>> different notions of public and private networks, but this was hammered out
>> a while ago and it has been pretty reliable since then, including in AWS, I
>> thought.  Some questions and ideas...
>> 
>> Does a single CassandraNode work?
>> 
>> The other strange thing is that it is shutting down the application.  A
>> policy might shut down failed nodes -- though I think by default these are
>> "quarantined", ie kept around for investigation rather than outright
>> deleted -- but the *application* should only be shut down if that is
>> manually initiated.  Can you grep the logs for "DwHO5Z9Y" to see what
>> triggered its shutdown?
>> 
>> Finally, another thing to try is giving it a bit more RAM, maybe 100 (mb)
>> is just too low, and that's why the cluster is failing. Try "512m".
>> 
>> Best
>> Alex
>> 
>> 
>> 
>> 
>> On 18/04/2015 11:31, jade mackay wrote:
>> 
>>> Hi,
>>> 
>>> I am trying to start a cassandra cluster on amazon ec2 using
>>>  cassandra-blueprint.yaml (slightly modified) from
>>> https://github.com/brooklyncentral/blueprint-library.git:
>>> 
>>> 
>>> name: cassandra-cluster-app-defserv
>>> services:
>>> - type: brooklyn.entity.nosql.cassandra.CassandraCluster
>>>   name: Cassandra Cluster
>>>   brooklyn.config:
>>>     cluster.initial.size: 2
>>>     cluster.initial.quorumSize: 1
>>>     provisioning.properties:
>>>       minCores: 1
>>>       minRam: 100
>>> 
>>> Everything looks fine. I can ssh into the nodes and run nodetool status,
>>> which gives reasonable output:
>>> 
>>> Note: Ownership information does not include topology; for complete
>>> information, specify a keyspace
>>> Datacenter: datacenter1
>>> =======================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address         Load       Owns   Host ID                               Token                 Rack
>>> UN  10.232.138.106  10.77 KB   50.0%  22fda260-3cd5-4342-bcb3-b2d4b38facc5  -5997542197209433990  rack1
>>> UN  10.254.20.58    14.04 KB   50.0%  1a1ea5b4-5285-4f88-8e31-c9a4538f6a62  3225829839645341818   rack1
>>> 
>>> However, after a few minutes the instances are shut down:
>>> 
>>> 2015-04-18 08:51:57,255 INFO  Launching CassandraNodeImpl{id=PRYt1W19}:
>>> cluster BrooklynCluster, hostname (public)
>>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com, hostname (subnet)
>>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com, seeds
>>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com (from
>>> [CassandraNodeImpl{id=PRYt1W19}])
>>> 2015-04-18 08:51:59,626 INFO  Launching CassandraNodeImpl{id=zbsCBjGS}:
>>> delaying launch of non-first node by 59s 994ms to prevent schema
>>> disagreements
>>> 
>>> ... good ... and then:
>>> 
>>> 2015-04-18 08:57:11,547 WARN  Error invoking start at
>>> CassandraNodeImpl{id=PRYt1W19}: Timeout waiting for SERVICE_UP from
>>> CassandraNodeImpl{id=PRYt1W19}
>>> 2015-04-18 08:57:11,547 WARN  Cluster CassandraClusterImpl{id=Nz5UaPes}
>>> lost all its seeds while starting! Subsequent failure likely, but changing
>>> seeds during startup would risk split-brain:
>>> seeds=[CassandraNodeImpl{id=PRYt1W19}]
>>> 
>>> ... and now the shutdown cascade starts.
>>> 
>>> 2015-04-18 08:59:20,930 WARN
>>>  brooklyn.management.internal.LocalEntityManager@4dab43cb call to stop
>>> management of unknown entity (already unmanaged?)
>>> CassandraNodeImpl{id=PRYt1W19}; skipping, and all descendants
>>> 2015-04-18 08:59:20,934 INFO  Stopped application
>>> BasicApplicationImpl{id=DwHO5Z9Y}
>>> 
>>> 
>>> Any advice would be appreciated.
>>> 
>>> p.s. Is this the correct forum for this query?
>>> 
>>> Thanks,
>>> Jade
>>> 
>>> 
>> 
> 
> 
> -- 
> Jade Mackay
> e: jademackay@gmail.com
> m: +64-(0)22-319-0847

