brooklyn-dev mailing list archives

From jade mackay <jademac...@gmail.com>
Subject Re: Unsuccessful cassandra deployment using yaml from blueprint-libary
Date Sun, 19 Apr 2015 02:42:11 GMT
Hi Alex,

The cluster was shutting down because it was on fire and I was launching
it from the command line rather than the web console.
When launched from the web console the cluster persists and data propagates
across the nodes, despite the cluster being on fire with all nodes quarantined.
Incidentally, I can use the cluster effector to expand the cluster but not
to shrink it.

The top level "cassandra-cluster-app" summary:

Required entity not healthy: CassandraClusterImpl{id=DUHV0IoT}
Failure running task invoking start[locations] on 1 node (FPi4GFmk)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/BiAPu1MO/activities/subtask/FPi4GFmk>:
Error invoking start at CassandraClusterImpl{id=DUHV0IoT}: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}

and the "Cassandra Cluster" summary:

start failed with error: java.lang.IllegalStateException: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}
Failure running task starting 2 nodes (parallel) (jmlBKmM8)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/DUHV0IoT/activities/subtask/jmlBKmM8>:
2 of 2 parallel child tasks failed, 2 errors including: Error invoking start
at CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}

And one of the (aptly named) burning nodes "CassandraNode:D0wG":

The software process for this entity does not appear to be running
Failure running task post-start (qBmoHFo2)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/D0wGMtHJ/activities/subtask/qBmoHFo2>:

For reference, the blueprint:
name: cassandra-cluster-app
services:
- type: brooklyn.entity.nosql.cassandra.CassandraCluster
  name: Cassandra Cluster
  brooklyn.config:
    cluster.initial.size: 2
    cluster.initial.quorumSize: 1
    provisioning.properties:
      minCores: 1
      minRam: 512
location: aws-oregon

Again, any tips on hunting the issue down would be appreciated.
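
In case it helps anyone following Alex's suggestion of grepping the logs for an entity ID, here is a minimal Python sketch. It assumes a log file where each entry sits on one line in the "date time LEVEL message" shape of the excerpts quoted in this thread. The function name `find_entity_events` and the abbreviated sample lines are illustrative only and are not part of Brooklyn.

```python
import re

# Assumed entry shape, based on the log excerpts quoted in this thread:
#   2015-04-18 08:57:11,547 WARN  Error invoking start at ...
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(.*)$")


def find_entity_events(lines, entity_id, levels=("WARN", "ERROR")):
    """Return (timestamp, level, message) for entries that mention entity_id."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip stack traces and other continuation lines
        timestamp, level, message = match.groups()
        if level in levels and entity_id in message:
            events.append((timestamp, level, message))
    return events


# Sample entries abbreviated from the excerpts in this thread.
sample = [
    "2015-04-18 08:51:59,626 INFO  Launching CassandraNodeImpl{id=zbsCBjGS}: delaying launch",
    "2015-04-18 08:57:11,547 WARN  Error invoking start at CassandraNodeImpl{id=PRYt1W19}: Timeout waiting for SERVICE_UP",
    "2015-04-18 08:59:20,934 INFO  Stopped application BasicApplicationImpl{id=DwHO5Z9Y}",
]

for event in find_entity_events(sample, "PRYt1W19"):
    print(event)
```

To look for what stopped the application rather than for warnings, the same function can be called with the application ID and `levels=("INFO",)`. An open file object can be passed in place of the sample list.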

Cheers,
Jade

On 18 April 2015 at 23:30, Alex Heneveld <alex.heneveld@cloudsoftcorp.com>
wrote:

>
> Hi Jade,
>
> Yes, this is the right place for your question.  Getting the Cassandra
> start-up sequence right took some work, especially in different clouds with
> different notions of public and private networks, but this was hammered out
> a while ago and it has been pretty reliable since then, including in AWS, I
> thought.  Some questions and ideas...
>
> Does a single CassandraNode work?
>
> The other strange thing is that it is shutting down the application.  A
> policy might shut down failed nodes -- though I think by default these are
> "quarantined", ie kept around for investigation rather than outright
> deleted -- but the *application* should only be shut down if that is
> manually initiated.  Can you grep the logs for "DwHO5Z9Y" to see what
> triggered its shutdown?
>
> Finally, another thing to try is giving it a bit more RAM.  Maybe 100 (MB)
> is just too low and that's why the cluster is failing.  Try "512m".
>
> Best
> Alex
>
>
>
>
> On 18/04/2015 11:31, jade mackay wrote:
>
>> Hi,
>>
>> I am trying to start a cassandra cluster on amazon ec2 using
>> cassandra-blueprint.yaml (slightly modified) from
>> https://github.com/brooklyncentral/blueprint-library.git:
>>
>>
>> name: cassandra-cluster-app-defserv
>> services:
>> - type: brooklyn.entity.nosql.cassandra.CassandraCluster
>>   name: Cassandra Cluster
>>   brooklyn.config:
>>     cluster.initial.size: 2
>>     cluster.initial.quorumSize: 1
>>     provisioning.properties:
>>       minCores: 1
>>       minRam: 100
>>
>> Everything looks fine. I can ssh into the nodes and run nodetool status,
>> which gives reasonable output:
>>
>> Note: Ownership information does not include topology; for complete
>> information, specify a keyspace
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Owns   Host ID                               Token                 Rack
>> UN  10.232.138.106  10.77 KB   50.0%  22fda260-3cd5-4342-bcb3-b2d4b38facc5  -5997542197209433990  rack1
>> UN  10.254.20.58    14.04 KB   50.0%  1a1ea5b4-5285-4f88-8e31-c9a4538f6a62  3225829839645341818   rack1
>>
>> However, after a few minutes the instances are shut down:
>>
>> 2015-04-18 08:51:57,255 INFO  Launching CassandraNodeImpl{id=PRYt1W19}:
>> cluster BrooklynCluster, hostname (public)
>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com, hostname (subnet)
>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com, seeds
>> ec2-52-12-82-75.us-west-2.compute.amazonaws.com (from
>> [CassandraNodeImpl{id=PRYt1W19}])
>> 2015-04-18 08:51:59,626 INFO  Launching CassandraNodeImpl{id=zbsCBjGS}:
>> delaying launch of non-first node by 59s 994ms to prevent schema
>> disagreements
>>
>> ...good.. and then:
>>
>> 2015-04-18 08:57:11,547 WARN  Error invoking start at
>> CassandraNodeImpl{id=PRYt1W19}: Timeout waiting for SERVICE_UP from
>> CassandraNodeImpl{id=PRYt1W19}
>> 2015-04-18 08:57:11,547 WARN  Cluster CassandraClusterImpl{id=Nz5UaPes}
>> lost all its seeds while starting! Subsequent failure likely, but changing
>> seeds during startup would risk split-brain:
>> seeds=[CassandraNodeImpl{id=PRYt1W19}]
>>
>> ... and now the shutdown cascade starts.
>>
>> 2015-04-18 08:59:20,930 WARN
>>   brooklyn.management.internal.LocalEntityManager@4dab43cb call to stop
>> management of unknown entity (already unmanaged?)
>> CassandraNodeImpl{id=PRYt1W19}; skipping, and all descendants
>> 2015-04-18 08:59:20,934 INFO  Stopped application
>> BasicApplicationImpl{id=DwHO5Z9Y}
>>
>>
>> Any advice would be appreciated.
>>
>> p.s. Is this the correct forum for this query?
>>
>> Thanks,
>> Jade
>>
>>
>


-- 
Jade Mackay
e: jademackay@gmail.com
m: +64-(0)22-319-0847
