Robert,

Many thanks. Yes, it looks like a bug in 1.2.2. So far (6 runs), 1.2.9 is behaving as I expected.

(BTW, re schema: I'm not defining anything myself, so it was just the default/empty schema for which I was getting disagreeing versions.)

Can I confirm I'm following best practice? When starting my cluster, I pick 2 nodes as seed nodes, then set up and start all the nodes in parallel, all configured to use those 2 seed nodes.

Is any particular care recommended in the start-up order?

The docs suggest it doesn't matter much, so long as there is enough information to discover all the nodes, but I've seen some surprising behaviour. Besides the buggy 1.2.2, even with 1.2.9, if you try to be smart and set seeds@node1=node2 and seeds@node2=node1, it blows up rather spectacularly [1]!
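As a rough illustration, the two rules that come up in this thread can be checked before launch: every node should be given the same seed list (the advice in the exception text in [1]), and a node that appears in its own seed list starts up without bootstrapping. The sketch below is a hypothetical helper, not part of Cassandra or Brooklyn; it assumes each node is identified by its listen_address, and the rendered block assumes the layout of the default SimpleSeedProvider section of cassandra.yaml.

```python
from typing import Dict, List


def check_seed_plan(seeds_by_node: Dict[str, List[str]]) -> List[str]:
    """Return warnings for a per-node seed assignment.

    Hypothetical input format: seeds_by_node maps each node's
    listen_address to the seed list configured in that node's
    cassandra.yaml.
    """
    warnings: List[str] = []
    # Every node should get the same seed list (order does not matter).
    distinct_lists = {tuple(sorted(seeds)) for seeds in seeds_by_node.values()}
    if len(distinct_lists) > 1:
        warnings.append("nodes do not share the same seed list")
    # A node listed in its own seed list starts without bootstrapping.
    for node in sorted(seeds_by_node):
        if node in seeds_by_node[node]:
            warnings.append(f"{node} is its own seed and will not bootstrap")
    return warnings


def render_seed_provider(seeds: List[str]) -> str:
    """Render a seed_provider block that can be shared by every node."""
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f'          - seeds: "{",".join(seeds)}"\n'
    )
```

With the mutual-seed setup above (node1 seeding from node2 and node2 from node1), check_seed_plan reports only that the lists differ, which is what the exception message suggests fixing. With 2 shared seeds across a larger cluster, it reports just the two seeds themselves as nodes that will not bootstrap.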

Cheers
Alex

[1]  java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you intended to start a single-node cluster, you should make sure your broadcast_address (or listen_address) is listed as a seed.  Otherwise, you need to determine why the seed being contacted has no knowledge of the rest of the cluster.  Usually, this can be solved by giving all nodes the same seed list.


On 09/09/2013 17:51, Robert Coli wrote:
On Mon, Sep 9, 2013 at 8:07 AM, Alex Heneveld <alex.heneveld@cloudsoftcorp.com> wrote:
The problem occurs in about 1 in 4 launches when I start a 2-node cluster, where the two machines are configured identically, with both nodes as the seeds (apart from the listen_address being different). On the problematic launches, describing schema versions immediately after start shows that the two nodes have different schemas (reported at both nodes), and any attempt to work with the nodes returns a SchemaDisagreementException (SDE). This is before I attempt to do anything to the cluster. After ~60s the nodes reconcile their differences, report a single schema used at both nodes, and I can use the cluster without problems.

How are you defining the schema?
 
I have a workaround, which is to use just one node to seed this initial set. When the seed set has cardinality 1, the problem does not occur. However, the advice is to use 2 seeds and have them be the same across the cluster, so I'd like to get to the bottom of this!

Seed nodes "cannot" bootstrap [1], so if you have RF=N and all nodes as seeds, I'm not surprised that you are experiencing weird behavior. A node booting with itself as a seed typically just starts up as a cluster of one.
 
I am running Cassandra 1.2.2 in Amazon, using Brooklyn (brooklyn.io) to start and manage it. I can share test cases, cassandra.yaml, logs, etc., but am starting with the above summary in case anyone can point me in the right direction from that.

Cassandra 1.2.2 has significant bugs. You should launch with 1.2.9.

1.2.9 also has some bootstrap vs. seed fixes which might help your case.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5836