incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Read-repair working, repair not working?
Date Mon, 11 Feb 2013 10:00:12 GMT
> CL.ONE : this is primarily for performance reasons …
This makes reasoning about "correct" behaviour a little harder. 

If there is any way you can run some tests with R + W > N strong consistency, I would encourage you to do so. You will then have a baseline of what works.
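
For a rough sense of the numbers: with RF 2 in each of 5 DCs there are 10 replicas in total, so a cluster-wide QUORUM is 10/2 + 1 = 6, and QUORUM writes plus QUORUM reads give R + W = 12 > 10. A minimal sketch of that kind of test using pycassa (the host, keyspace and column family names below are placeholders, not your schema):

    # Write a throwaway row at QUORUM, then read it straight back at QUORUM.
    # Placeholder names: point the pool at one of the HK nodes and at a test
    # column family you can safely write to.
    import uuid

    import pycassa
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('TestKeyspace', server_list=['hk-node-1:9160'])
    cf = ColumnFamily(pool, 'test_cf')

    key = str(uuid.uuid4())
    cf.insert(key, {'col': 'value'},
              write_consistency_level=pycassa.ConsistencyLevel.QUORUM)

    # With R + W > N this read must see the row just written; if it ever
    # misses, the problem is not eventual consistency but failed or
    # timed-out replicas.
    print(cf.get(key, read_consistency_level=pycassa.ConsistencyLevel.QUORUM))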

>  (say I make 100 requests : all 100 initially fail and subsequently all 100 succeed), so not sure it'll help?
The high number of inconsistencies seems to match the massive number of dropped MUTATION messages. At CL.ONE a write is acknowledged as soon as a single replica accepts it, so every MUTATION the HK node drops silently leaves its local replica out of date. Even if Anti-Entropy is running, a node dropping that many messages will keep creating inconsistencies.
  

It looks like the HK node is overloaded. I would check the logs for GC messages, check for VM steal if it is running in a virtualised environment, check that it has sufficient CPU and memory, and check for IO stress.
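
If it helps, a quick way to eyeball the GC side of that on the HK node is to pull the GCInspector pause reports (and the dropped-message lines while you are at it) out of the Cassandra system log. A rough sketch, assuming the default log location (adjust the path for your install):

    # Scan system.log for GC pause reports (GCInspector) and dropped-message
    # warnings. The path is an assumption; point it at wherever the log
    # actually lives on the HK node.
    log_path = '/var/log/cassandra/system.log'

    with open(log_path) as log:
        for line in log:
            if 'GCInspector' in line or 'dropped' in line.lower():
                print(line.rstrip())

iostat and vmstat on the box should cover the IO and CPU-steal side.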


>>> 20 node cluster running v1.0.7 split between 5 data centres,
>>> I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I
Do all DCs have the same number of nodes?

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 9:13 PM, Brian Fleming <bigbrianfleming@gmail.com> wrote:

> 
> Hi Aaron,
>  
> Many thanks for your reply - answers below.
>  
> Cheers,
>  
> Brian
>  
>  
> > What CL are you using for reads and writes?
> > I would first build a test case to ensure correct operation when using strong consistency, i.e. QUORUM write and read.  Because you are using RF 2 per DC I assume you are not using LOCAL_QUORUM because that is 2 and you would not have any redundancy in the DC.
> CL.ONE : this is primarily for performance reasons but also because there are only three local nodes as you suggest and we need at least some resiliency.
> In the context of this issue, I considered increasing this to CL.LOCAL_QUORUM but the behaviour suggests that none of the 3 local nodes have the data (say I make 100 requests : all 100 initially fail and subsequently all 100 succeed), so not sure it'll help?
>  
>  
> > Dropped mutations in a multi DC setup may be a sign of network congestion or overloaded nodes.
> This DC is remote in terms of network topology - it's in Asia (Hong Kong) while the rest of the cluster is in Europe/North America, so network latency rather than congestion could be a cause?  However I see some pretty aggressive data transfer speeds during the initial repairs & the data footprint approximately matches the nodes elsewhere in the ring, so something doesn't add up?
>  
> Here are the tpstats for one of these nodes :
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0        4919185         0                 0
> RequestResponseStage              0         0       16869994         0                 0
> MutationStage                     0         0       16764910         0                 0
> ReadRepairStage                   0         0           3703         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0         845225         0                 0
> AntiEntropyStage                  0         0          52441         0                 0
> MigrationStage                    0         0           4362         0                 0
> MemtablePostFlusher               0         0            952         0                 0
> StreamStage                       0         0             24         0                 0
> FlushWriter                       0         0            960         0                 5
> MiscStage                         0         0           3592         0                 0
> AntiEntropySessions               4         4            121         0                 0
> InternalResponseStage             0         0              0         0                 0
> HintedHandoff                     1         2             55         0                 0
>  
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR             150597
> BINARY                       0
> READ                    781490
> MUTATION                853846
> REQUEST_RESPONSE             0
> The numbers of dropped READ_REPAIR, READ & MUTATION operations are non-negligible.  The nodes in Europe/North America have effectively zero dropped messages.  This suggests network latency is probably a significant factor?
> [the network ping from Europe to a HK node is ~250ms, so I wouldn’t have expected it to be such a problem?]
>  
> 
> > It would, but the INFO logging for the AES is pretty good. I would hold off for now.
> Ok.
>  
> > [AES session logging]
> Yes, I see the expected start/end logs, so that's another thing off the list.
>  
>  
> 
> 
> On 10 Feb 2013, at 20:12, aaron morton <aaron@thelastpickle.com> wrote:
> 
>>> I’d request data, nothing would be returned, I would then re-request the data and it would correctly be returned:
>>> 
>> What CL are you using for reads and writes?
>> 
>>> I see a number of dropped ‘MUTATION’ operations : just under 5% of the total ‘MutationStage’ count.
>>> 
>> Dropped mutations in a multi DC setup may be a sign of network congestion or overloaded nodes.
>> 
>> 
>>> -          Could anybody suggest anything specific to look at to see why the repair operations aren’t having the desired effect?
>>> 
>> I would first build a test case to ensure correct operation when using strong consistency, i.e. QUORUM write and read.  Because you are using RF 2 per DC I assume you are not using LOCAL_QUORUM because that is 2 and you would not have any redundancy in the DC.
>> 
>> 
>>> 
>>> -          Would increasing logging level to ‘DEBUG’ show read-repair activity (to confirm that this is happening, when & for what proportion of total requests)?
>> It would, but the INFO logging for the AES is pretty good. I would hold off for now.

>> 
>>> 
>>> -          Is there something obvious that I could be missing here?
>> When a new AES session starts it logs this
>> 
>>            logger.info(String.format("[repair #%s] new session: will sync %s on range %s for %s.%s", getName(), repairedNodes(), range, tablename, Arrays.toString(cfnames)));
>> 
>> When it completes it logs this
>> 
>> logger.info(String.format("[repair #%s] session completed successfully", getName()));
>> 
>> Or this on failure 
>> 
>> logger.error(String.format("[repair #%s] session completed with the following error", getName()), exception);
>> 
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 10/02/2013, at 9:56 PM, Brian Fleming <bigbrianfleming@gmail.com> wrote:
>> 
>>> 
>>> 
>>> 
>>> Hi,
>>> 
>>> 
>>> 
>>> I have a 20 node cluster running v1.0.7 split between 5 data centres, each with an RF of 2, containing a ~1TB unique dataset/~10TB of total data.
>>> 
>>> 
>>> 
>>> I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I brought online late last year with data consistency & availability: I’d request data, nothing would be returned, I would then re-request the data and it would correctly be returned, i.e. read-repair appeared to be occurring.  However running repairs on the nodes didn’t resolve this: I tried general ‘repair’ commands as well as targeted keyspace commands, and neither altered the behaviour.
>>> 
>>> 
>>> 
>>> After a lot of fruitless investigation, I decided to wipe & re-install/re-populate the nodes.  The re-install & repair operations are now complete: I see the expected amount of data on the nodes; however, I am still seeing the same behaviour, i.e. I only get data after one failed attempt.
>>> 
>>> 
>>> 
>>> When I run repair commands, I don’t see any errors in the logs. 
>>> 
>>> I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during repair sessions.
>>> 
>>> I see a number of dropped ‘MUTATION’ operations : just under 5% of the total ‘MutationStage’ count.
>>> 
>>> 
>>> 
>>> Questions :
>>> 
>>> -          Could anybody suggest anything specific to look at to see why the repair operations aren’t having the desired effect?
>>> 
>>> -          Would increasing logging level to ‘DEBUG’ show read-repair activity (to confirm that this is happening, when & for what proportion of total requests)?
>>> 
>>> -          Is there something obvious that I could be missing here?
>>> 
>>> 
>>> 
>>> Many thanks,
>>> 
>>> Brian
>>> 
>>> 
>>> 
>> 

