incubator-cassandra-user mailing list archives

From Traian Fratean <traian.frat...@gmail.com>
Subject Re: Cluster not accepting insert while one node is down
Date Thu, 14 Feb 2013 09:57:15 GMT
You're right about data availability on that node. And my config, being
the default one, is not suited for a cluster.
What I don't get is that node 67 was down and I was trying to insert into
node 66, as can be seen from the stack trace. Long story short: when node 67
was down I could not insert into any machine in the cluster. Not what I was
expecting.

Thank you for the reply!
Traian.

2013/2/14 Alain RODRIGUEZ <arodrime@gmail.com>

> Hi Traian,
>
> There is your problem. You are using RF=1, meaning that each node is
> responsible for its own range and nothing more. So when a node goes down,
> do the math: you just can't read 1/5 of your data.
>
> This is very good for performance, since each node owns its own part of
> the data and any write or read needs to reach only one node, but it makes
> every node a SPOF, and removing SPOFs is a main point of using C*. So you
> have poor availability and poor consistency.
>
> A usual configuration with 5 nodes would be RF=3 and both CL (R&W) =
> QUORUM.
>
> This will replicate your data to 2 nodes + the natural endpoint (a total
> of 3/5 nodes owning any given piece of data), and any read or write would
> need to reach at least 2 nodes before being considered successful, which
> ensures strong consistency.
>
> This configuration allows you to shut down a node (crash or configuration
> update/rolling restart) without degrading the service (at least allowing
> you to reach any data), but at the cost of more data on each node.
>
> Alain
>
>
> 2013/2/14 Traian Fratean <traian.fratean@gmail.com>
>
>> I am using the defaults for both RF and CL. Since the keyspace was created
>> using cassandra-cli, the RF should be 1, as I read it from the output below:
>>
>> [default@TestSpace] describe;
>> Keyspace: TestSpace:
>>   Replication Strategy:
>> org.apache.cassandra.locator.NetworkTopologyStrategy
>>   Durable Writes: true
>>     Options: [datacenter1:1]
>>
>> As for the CL, it is the Astyanax default, which is 1 for both reads and
>> writes.
>>
>> Traian.
>>
>>
>> 2013/2/13 Alain RODRIGUEZ <arodrime@gmail.com>
>>
>>> We probably need more info, like the RF of your cluster and the CL of your
>>> reads and writes. Could you also tell us whether you use vnodes or not?
>>>
>>> I heard that Astyanax was not running very smoothly on 1.2.0, but a bit
>>> better on 1.2.1. Still, Netflix hasn't released a version of Astyanax for
>>> C* 1.2 yet.
>>>
>>> Alain
>>>
>>>
>>> 2013/2/13 Traian Fratean <traian.fratean@gmail.com>
>>>
>>>> Hi,
>>>>
>>>> I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java
>>>> client with Astyanax 1.56.21.
>>>> When a node (10.60.15.67 - *different* from the one in the stack trace
>>>> below) went down, I got a TokenRangeOfflineException and no other data
>>>> gets inserted into *any other* node in the cluster.
>>>>
>>>> Do I have a configuration issue, or is this supposed to happen?
>>>>
>>>>
>>>> com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81)
>>>> -
>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>> latency=2057(2057), attempts=1]UnavailableException()
>>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>>> TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160,
>>>> latency=2057(2057), attempts=1]UnavailableException()
>>>> at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>>> at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>>> at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
>>>> at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
>>>> at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>>> at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)
>>>>
>>>>
>>>>
>>>> Thank you,
>>>> Traian.
>>>>
>>>
>>>
>>
>
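
The replication arithmetic discussed in the thread (RF=1 losing 1/5 of the data with one node down, RF=3 with QUORUM needing 2 replicas) can be illustrated with a small, self-contained Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, replica placement is modeled as "range i lives on node i and the next RF-1 nodes on the ring", and none of this is Cassandra's or Astyanax's actual code. It also does not explain why inserts through node 66 failed for every key, which the thread leaves open.

```java
import java.util.Collections;
import java.util.Set;

/** Illustrative sketch only; names and ring placement are assumptions, not Cassandra code. */
final class QuorumSketch {

    /** QUORUM requires a majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * Counts how many token ranges can still satisfy a request that needs
     * `required` live replicas. Assumption: range i is stored on node i and
     * on the next (rf - 1) nodes clockwise on a ring of `nodes` nodes.
     */
    static int availableRanges(int nodes, int rf, int required, Set<Integer> down) {
        int available = 0;
        for (int range = 0; range < nodes; range++) {
            int alive = 0;
            for (int k = 0; k < rf; k++) {
                int replica = (range + k) % nodes;
                if (!down.contains(replica)) {
                    alive++;
                }
            }
            if (alive >= required) {
                available++;
            }
        }
        return available;
    }

    public static void main(String[] args) {
        Set<Integer> oneDown = Collections.singleton(1); // one of five nodes is down

        // RF=1, CL=ONE: the range owned by the dead node cannot be served.
        System.out.println("RF=1, CL=ONE:    "
                + availableRanges(5, 1, 1, oneDown) + "/5 ranges available");

        // RF=3, CL=QUORUM: every range still has at least 2 live replicas.
        System.out.println("RF=3, CL=QUORUM: "
                + availableRanges(5, 3, quorum(3), oneDown) + "/5 ranges available");
    }
}
```

Under these assumptions, a single node failure makes one of the five ranges unavailable at RF=1, while RF=3 with QUORUM keeps every range reachable, which matches the trade-off described above: better availability at the cost of storing more data on each node.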
