From: Traian Fratean
Date: Thu, 14 Feb 2013 11:57:15 +0200
To: user@cassandra.apache.org
Subject: Re: Cluster not accepting insert while one node is down

You're right regarding data availability on that node. And my config, being
the default one, is not suited for a cluster.

What I don't get is that my .67 node was down while I was trying to insert
into the .66 node, as can be seen from the stack trace. Long story short:
when node .67 was down I could not insert into any machine in the cluster.
Not what I was expecting.

Thank you for the reply!

Traian.

2013/2/14 Alain RODRIGUEZ

> Hi Traian,
>
> There is your problem. You are using RF=1, meaning that each node is
> responsible for its range and nothing more. So when a node goes down, do
> the math: you simply cannot read 1/5 of your data.
>
> This is good for performance, since each node owns its own part of the
> data and any read or write needs to reach only one node. But it also means
> every node is a single point of failure for its share of the data, and
> avoiding a SPOF is a main point of using C*. So you get poor availability
> and poor consistency.
>
> A usual configuration with 5 nodes would be RF=3 and both CL (R&W) =
> QUORUM.
>
> This replicates your data to the natural endpoint plus 2 other nodes (a
> total of 3 of the 5 nodes own any given piece of data), and any read or
> write must reach at least 2 of them before being considered successful,
> ensuring strong consistency.
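The quorum arithmetic above (QUORUM = floor(RF/2) + 1, and strong consistency when R + W > RF) can be sketched in a few lines of Java; the class and method names here are illustrative, not part of any Cassandra or Astyanax API:

```java
// Illustrative sketch of Cassandra's consistency arithmetic.
public class QuorumMath {

    // QUORUM for a given replication factor: floor(RF/2) + 1
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // A read/write CL pair is strongly consistent when the replica
    // sets contacted on read and write must overlap: R + W > RF.
    static boolean stronglyConsistent(int r, int w, int rf) {
        return r + w > rf;
    }

    public static void main(String[] args) {
        // RF=3 with QUORUM reads and writes: 2 + 2 > 3, so every read
        // sees at least one replica that accepted the latest write.
        System.out.println(quorum(3));                                    // 2
        System.out.println(stronglyConsistent(quorum(3), quorum(3), 3));  // true

        // RF=1 (the default keyspace in this thread): the single replica
        // for a token range *is* the quorum, so losing that node makes
        // its whole range unavailable at any consistency level.
        System.out.println(quorum(1));                                    // 1
    }
}
```

This also matches the behaviour reported in the thread: with RF=1 the coordinator (.66 here) routes each write to the single replica that owns the key's token, so writes for keys owned by the downed .67 fail with UnavailableException no matter which live node receives the request.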
> This configuration allows you to shut down a node (crash, or a
> configuration update / rolling restart) without degrading the service (at
> least you can still reach all of your data), at the cost of storing more
> data on each node.
>
> Alain
>
> 2013/2/14 Traian Fratean
>
>> I am using the defaults for both RF and CL. As the keyspace was created
>> using cassandra-cli, the default RF should be 1, as I get it from below:
>>
>> [default@TestSpace] describe;
>> Keyspace: TestSpace:
>>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>>   Durable Writes: true
>>     Options: [datacenter1:1]
>>
>> As for the CL, it is the Astyanax default, which is ONE for both reads
>> and writes.
>>
>> Traian.
>>
>> 2013/2/13 Alain RODRIGUEZ
>>
>>> We probably need more info, like the RF of your cluster and the CL of
>>> your reads and writes. Maybe you could also tell us whether you use
>>> vnodes or not.
>>>
>>> I heard that Astyanax was not running very smoothly on 1.2.0, but a bit
>>> better on 1.2.1. Still, Netflix hasn't released a version of Astyanax
>>> for C* 1.2.
>>>
>>> Alain
>>>
>>> 2013/2/13 Traian Fratean
>>>
>>>> Hi,
>>>>
>>>> I have a cluster of 5 nodes running Cassandra 1.2.0, and a Java client
>>>> with Astyanax 1.56.21.
>>>> When a node (10.60.15.67 - *different* from the one in the stack trace
>>>> below) went down, I got TokenRangeOfflineException and no other data
>>>> got inserted into *any other* node of the cluster.
>>>>
>>>> Am I having a configuration issue, or is this supposed to happen?
>>>>
>>>> com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) -
>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>> latency=2057(2057), attempts=1] UnavailableException()
>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>> latency=2057(2057), attempts=1] UnavailableException()
>>>>   at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>>>   at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>>>   at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
>>>>   at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
>>>>   at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>>>   at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)
>>>>
>>>> Thank you,
>>>> Traian.
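For reference, raising the RF of the keyspace shown earlier could look roughly like this in cassandra-cli (sketched against Cassandra 1.2 syntax; verify against your version), followed by a repair on each node so existing data is streamed to the new replicas:

```
[default@TestSpace] update keyspace TestSpace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = {datacenter1:3};

$ nodetool repair TestSpace    # run on every node after changing RF
```

On the client side, Astyanax can be moved off its CL=ONE default via `AstyanaxConfigurationImpl`'s `setDefaultReadConsistencyLevel(...)` / `setDefaultWriteConsistencyLevel(...)` with `ConsistencyLevel.CL_QUORUM`.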