Hi guys,
It's interesting to see this thread. I recently hit a similar problem on my 3-node Cassandra 0.8.5 cluster. It was working fine until I took one node down to see how the cluster would behave. From that point on I could neither read nor write, because this exception was thrown:
Exception in thread "main" me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:60)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
        at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
        at ch.cern.pbeast.CassandraDBClient.executeBatchInsert(CassandraDBClient.java:958)
        at ch.cern.test.TimeBinTester.main(TimeBinTester.java:294)
Caused by: UnavailableException()
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19053)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
        ... 13 more

By the way, I'm using Hector 0.8.0-2, which has the following defaults:
    Default replication factor = 1
    Default replication strategy = SimpleStrategy
    Default consistency level policy = HconsistencyLevelPolicy.QUORUM
    Default failover policy = FailoverPolicy.ON_FAIL_TRY_ALL_AVAILABLE

When I first created the schema for my cluster I used these defaults. I then changed the consistency level to ONE for reads and ANY for writes, expecting everything to keep working when a node goes down, but apparently it does not.

One more thing: I'm using DataStax OpsCenter to monitor and manage my cluster. Apart from the System and OpsCenter keyspaces, which I did not create, I have 2 other keyspaces. In total my cluster has 116 CFs. When I view the replication settings on any node, I see a replication factor of 2 for the OpsCenter keyspace and 1 for the two keyspaces I created, so everything seems fine. Note that while the node was down I could read from the OpsCenter keyspace without a problem, but I couldn't read or write to my own keyspaces.
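
To make the replica arithmetic explicit, here is a rough Python sketch. It is not Hector or Cassandra code, and the function names are made up for illustration. It assumes the usual rules: QUORUM needs floor(RF/2) + 1 live replicas, ONE needs one, ALL needs every replica, and a write at ANY can be satisfied by a hint on any live node (assuming hinted handoff is enabled).

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Live replicas needed for a request at the given level (illustrative only)."""
    if consistency_level == "ANY":
        # Writes only: assumed to succeed via a hint even with no live replica.
        return 0
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def is_available(consistency_level: str, replication_factor: int,
                 live_replicas: int) -> bool:
    """True if enough replicas of a row are alive to serve the request."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


# RF = 1 and the single replica of a row is on the node that is down:
print(is_available("ONE", 1, 0))     # read at ONE
print(is_available("QUORUM", 1, 0))  # Hector's default level
print(is_available("ANY", 1, 0))     # write at ANY, relying on a hint

# RF = 2 (as for the OpsCenter keyspace) with one of the two replicas down:
print(is_available("ONE", 2, 1))
```

If these assumptions hold, a keyspace with a replication factor of 1 keeps each row on exactly one node, so a read at ONE for any row owned by the downed node has no live replica and is rejected, while the OpsCenter keyspace with a replication factor of 2 can still be read at ONE. By the same arithmetic a write at ANY should still be accepted, so the failing writes may mean my consistency level change is not actually being applied.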

Any idea where to look to investigate this further?


On Thu, Oct 27, 2011 at 10:27 PM, R. Verlangen <robin@us2.nl> wrote:
That's correct. It was a read consistency problem, not so smart of me ;-)

Thank you anyway.

2011/10/27 Jonathan Ellis <jbellis@gmail.com>
(I see that you did start a new thread and solved it with Jake's help.)

On Thu, Oct 27, 2011 at 11:23 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> Ha.  On the one hand, good on you for searching the list archives for
> similar problems.  On the other hand, after over a year it's probably
> worth starting a new thread. :)
> Standard questions:
> - What Cassandra version are you running?
> - Are there exceptions in the log for the machine still running?
> - What does "not responding anymore" mean?  Reporting timeouts,
> reporting unavailable, refusing client connections, ... ?
> On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 <robin@us2.nl> wrote:
>> I'm currently having a similar problem with a 2-node cluster. When I shut down
>> one of the nodes, the other stops responding.
>> Did you find a solution for your problem?
>> /I'm new to mailing lists, if it's inappropriate to reply here, please let
>> me know../
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html
>> --
>> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/UnavailableException-with-1-node-down-and-RF-2-tp5242055p6936767.html
>> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support