Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
References: <1319728960375-6936767.post@n2.nabble.com>
Date: Fri, 28 Oct 2011 09:59:53 +0200
Subject: Re: UnavailableException with 1 node down and RF=2?
From: Alexandru Dan Sicoe
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=00163642687334f80304b0574607

--00163642687334f80304b0574607
Content-Type: text/plain; charset=ISO-8859-1

Hi guys,

It's interesting to see this thread. I recently ran into a similar problem on my 3-node Cassandra 0.8.5 cluster. It was working fine until I took one node down to see how the cluster behaves. Immediately I could no longer read or write, because this exception was thrown:

Exception in thread "main" me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:60)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
        at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
        at ch.cern.pbeast.CassandraDBClient.executeBatchInsert(CassandraDBClient.java:958)
        at ch.cern.test.TimeBinTester.main(TimeBinTester.java:294)
Caused by: UnavailableException()
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19053)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
        ... 13 more

By the way, I'm using Hector 0.8.0-2, which has the following defaults:

    Default replication factor = 1
    Default replication strategy = SimpleStrategy
    Default consistency level policy = HconsistencyLevelPolicy.QUORUM
    Default failover policy = FailoverPolicy.ON_FAIL_TRY_ALL_AVAILABLE

When I first created the schema for my cluster I used these defaults. Then I changed the ConsistencyLevel to ONE for reads and ANY for writes, and I thought everything would keep working if a node went down, but apparently not.

One more thing: I'm using DataStax OpsCenter to monitor and manage my cluster. Apart from the System and OpsCenter keyspaces, which I did not create, I have two other keyspaces. In total my cluster has 116 CFs. If I click to view the replication on any node, I get 2 for the OpsCenter keyspace and 1 for the two keyspaces I created, so everything seems fine. I should mention that while the node was down I could read from the OpsCenter keyspace without a problem, but I couldn't read or write to my own keyspaces.

Any idea where to look to investigate this further?

Cheers,
Alex

On Thu, Oct 27, 2011 at 10:27 PM, R. Verlangen <robin@us2.nl> wrote:

> That's correct. It was a read consistency problem, not so smart of me ;-)
>
> Thank you anyway.
>
>
> 2011/10/27 Jonathan Ellis <jbellis@gmail.com>
>
>> (I see that you did start a new thread and solved it with Jake's help.)
>>
>> On Thu, Oct 27, 2011 at 11:23 AM, Jonathan Ellis <jbellis@gmail.com>
>> wrote:
>> > Ha.  On the one hand, good on you for searching the list archives for
>> > similar problems.  On the other hand, after over a year it's probably
>> > worth starting a new thread. :)
>> >
>> > Standard questions:
>> >
>> > - What Cassandra version are you running?
>> > - Are there exceptions in the log for the machine still running?
>> > - What does "not responding anymore" mean?  Reporting timeouts,
>> > reporting unavailable, refusing client connections, ... ?
>> >
>> > On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 <robin@us2.nl> wrote:
>> >> I'm currently having a similar problem with a 2-node cluster. When I
>> >> shut down one of the nodes, the other isn't responding any more.
>> >>
>> >> Did you find a solution for your problem?
>> >>
>> >> /I'm new to mailing lists, if it's inappropriate to reply here, please
>> >> let me know../
>> >>
>> >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html
>> >>
>> >> --
>> >> View this message in context:
>> >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/UnavailableException-with-1-node-down-and-RF-2-tp5242055p6936767.html
>> >> Sent from the cassandra-user@incubator.apache.org mailing list archive
>> >> at Nabble.com.
>> >>
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of DataStax, the source for professional Cassandra support
>> > http://www.datastax.com
>> >
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
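
A note on the replica arithmetic behind the symptoms described in this message (replication factor 1 for the user keyspaces, 2 for OpsCenter, one of three nodes down). The following is only a minimal sketch, not Cassandra or Hector code: the function names are hypothetical, the placement rule is a simplified stand-in for SimpleStrategy (the owning node plus the next RF-1 nodes on the ring), and the only fact taken as given is that QUORUM needs a strict majority of the replicas, i.e. RF / 2 + 1 with integer division. ANY is left out on purpose because its behaviour depends on hinted handoff.

```python
"""Replica arithmetic for a small ring (illustrative sketch only)."""


def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority of RF."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """Live replicas a request needs at the given consistency level."""
    needed = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    if level not in needed:
        raise ValueError(f"unsupported consistency level: {level}")
    return needed[level]


def live_replicas(primary: int, replication_factor: int,
                  ring_size: int, down: set) -> list:
    """Assumed SimpleStrategy-like placement: the node owning the key
    plus the next RF-1 nodes clockwise, minus any nodes that are down."""
    replicas = [(primary + i) % ring_size for i in range(replication_factor)]
    return [node for node in replicas if node not in down]


def can_serve(level: str, primary: int, replication_factor: int,
              ring_size: int, down: set) -> bool:
    """True if enough replicas of this key range are alive for the level."""
    alive = live_replicas(primary, replication_factor, ring_size, down)
    return len(alive) >= required_replicas(level, replication_factor)


if __name__ == "__main__":
    ring_size, down = 3, {0}  # three nodes, node 0 taken down
    for rf in (1, 2):
        for level in ("ONE", "QUORUM"):
            ok = sum(can_serve(level, p, rf, ring_size, down)
                     for p in range(ring_size))
            print(f"RF={rf} {level}: {ok}/{ring_size} key ranges available")
```

Under these assumptions the script reports 2/3 key ranges available for RF=1 at both ONE and QUORUM, 3/3 for RF=2 at ONE, and 1/3 for RF=2 at QUORUM. In other words, with RF=1 the keys owned by the downed node have no live replica at any of these levels, whereas a keyspace with RF=2 stays fully readable at ONE. That would be consistent with the OpsCenter keyspace remaining readable while the RF=1 keyspaces did not, although it does not by itself explain failures for writes sent at ANY.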

--00163642687334f80304b0574607--