cassandra-user mailing list archives

From Shalom Sagges <shal...@liveperson.com>
Subject Re: A Single Dropped Node Fails Entire Read Queries
Date Fri, 10 Mar 2017 09:48:59 GMT
@Ryan, my keyspace replication settings are as follows:
CREATE KEYSPACE mykeyspace WITH replication = {'class':
'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'}  AND
durable_writes = true;

CREATE TABLE mykeyspace.test (
    column1 text,
    column2 text,
    column3 text,
    PRIMARY KEY (column1, column2)
);

The query is *select * from mykeyspace.test where column1='xxxxx';*

@Daniel, the replication factor is 3. That's why I don't understand why I
get these timeouts when only one node drops.
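
To make the arithmetic behind that expectation explicit, here is a minimal Python sketch of the quorum rule (quorum = floor(RF / 2) + 1, applied to the local DC's replication factor). The function names are made up for illustration and are not part of any Cassandra tooling.

```python
def local_quorum(replication_factor: int) -> int:
    """Replicas that must respond in the local DC: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while LOCAL_QUORUM can still be met."""
    return replication_factor - local_quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # per-DC replication factor from the keyspace definition above
    print("required replicas:", local_quorum(rf))
    print("replicas that may be down:", tolerated_failures(rf))
```

With RF 3 this gives 2 required replicas and 1 tolerated failure, which is why a single dropped node should not, by itself, break LOCAL_QUORUM.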

Also, when I enabled tracing, I got the following error:
*Unable to fetch query trace: ('Unable to complete the operation against
any hosts', {<Host: 127.0.0.1 DC1>: Unavailable('Error from server:
code=1000 [Unavailable exception] message="Cannot achieve consistency level
LOCAL_QUORUM" info={\'required_replicas\': 2, \'alive_replicas\': 1,
\'consistency\': \'LOCAL_QUORUM\'}',)})*

But nodetool status shows that only 1 replica was down:
--  Address    Load       Tokens  Owns  Host ID                               Rack
DN  x.x.x.235  134.32 MB  256     ?     c0920d11-08da-4f18-a7f3-dbfb8c155b19  RAC1
UN  x.x.x.236  134.02 MB  256     ?     2cc0a27b-b1e4-461f-a3d2-186d3d82ff3d  RAC1
UN  x.x.x.237  134.34 MB  256     ?     5b2162aa-8803-4b54-88a9-ff2e70b3d830  RAC1
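
The mismatch can be checked mechanically. The sketch below is only an illustration: it counts up and down nodes from the three status rows above (copied into a string, one node per line) and compares the result with the required_replicas value of 2 from the error. The helper name is hypothetical, and the comparison assumes that with RF 3 in a 3-node DC every node holds a replica of every partition.

```python
# Rows copied from the nodetool status output above, one node per line.
STATUS_OUTPUT = """\
--  Address    Load       Tokens  Owns  Host ID                               Rack
DN  x.x.x.235  134.32 MB  256     ?     c0920d11-08da-4f18-a7f3-dbfb8c155b19  RAC1
UN  x.x.x.236  134.02 MB  256     ?     2cc0a27b-b1e4-461f-a3d2-186d3d82ff3d  RAC1
UN  x.x.x.237  134.34 MB  256     ?     5b2162aa-8803-4b54-88a9-ff2e70b3d830  RAC1
"""


def count_node_states(status_text: str) -> dict:
    """Count nodes whose two-letter status code starts with U (up) or D (down)."""
    counts = {"up": 0, "down": 0}
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        code = fields[0]
        # First letter is the status (Up/Down); second is the state
        # (Normal/Leaving/Joining/Moving). Header and other lines are skipped.
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            counts["up" if code[0] == "U" else "down"] += 1
    return counts


if __name__ == "__main__":
    required_replicas = 2  # taken from the Unavailable error above
    counts = count_node_states(STATUS_OUTPUT)
    print(counts)
    print("enough nodes up for LOCAL_QUORUM:", counts["up"] >= required_replicas)
```

By this count two nodes are up, which would be enough for the two required replicas, yet the coordinator reported only one alive replica.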


I tried to run the same scenario on all 3 nodes, and only the 3rd node
didn't fail the query when I dropped it. The nodes were installed and
configured with Puppet so the configuration is the same on all 3 nodes.


Thanks!



On Fri, Mar 10, 2017 at 10:25 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-inzko@bitmovin.com> wrote:

> LOCAL_QUORUM is calculated from the replication factor in the local DC,
> not from the number of nodes. So if your replication factor is 2 and you
> have 10 nodes, you still cannot lose either replica of a partition (a
> quorum of 2 is 2). With a replication factor of 3 you can lose one node
> and still satisfy the query.
> Ryan Svihla <rs@foundev.pro> wrote on Thu, 9 Mar 2017 at 18:09:
>
>> What are your keyspace replication settings, and what's your query?
>>
>> On Thu, Mar 9, 2017 at 9:32 AM, Shalom Sagges <shaloms@liveperson.com>
>> wrote:
>>
>> Hi Cassandra Users,
>>
>> I hope someone can help me understand the following scenario:
>>
>> Version: 3.0.9
>> 3 nodes per DC
>> 3 DCs in the cluster.
>> Consistency Local_Quorum.
>>
>> I did a small resiliency test and dropped a node to check the
>> availability of the data.
>> What I assumed would happen is nothing at all. If a node is down in a
>> 3-node DC, Local_Quorum should still be satisfied.
>> However, during the first ~10 seconds after stopping the service, I got
>> timeout errors (tried it both from the client and from cqlsh).
>>
>> This is the error I get:
>> *ServerError:
>> com.google.common.util.concurrent.UncheckedExecutionException:
>> com.google.common.util.concurrent.UncheckedExecutionException:
>> java.lang.RuntimeException:
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
>> received only 4 responses.*
>>
>>
>> After ~10 seconds, the same query is successful with no timeout errors.
>> The dropped node is still down.
>>
>> Any idea what could cause this and how to fix it?
>>
>> Thanks!
>>
>>
>> This message may contain confidential and/or privileged information.
>> If you are not the addressee or authorized to receive this on behalf of
>> the addressee you must not use, copy, disclose or take action based on this
>> message or any information herein.
>> If you have received this message in error, please advise the sender
>> immediately by reply email and delete this message. Thank you.
>>
>>
>>
>>
>> --
>>
>> Thanks,
>> Ryan Svihla
>>
>>

