incubator-cassandra-user mailing list archives

From rubbish me <rubbish...@googlemail.com>
Subject Re: Commit log periodic sync?
Date Fri, 24 Aug 2012 23:00:10 GMT
Thanks, Aaron, for your reply. Please see my comments inline.


On 24 Aug 2012, at 11:04, aaron morton wrote:

>> - we are running on production linux VMs (not ideal but this is out of our hands)
> Is the VM doing anything wacky with the IO ?

Could be, but I thought we would ask here first.  This is difficult to prove because we
don't have control over these VMs.

>  
> 
>> As part of a DR exercise, we killed all 6 nodes in DC1,
> Nice disaster. Out of interest, what was the shutdown process ?

Brutally. kill -9.


> 
>> We noticed that data that was written an hour before the exercise, around the last
>> memtables being flushed, was not found in DC1.
> To confirm, data was written to DC 1 at CL LOCAL_QUORUM before the DR exercise. 
> 
> Was the missing data written before or after the memtable flush? I'm trying to understand
> if the data should have been in the commit log or the memtables.

The missing data was written after the last flush.  It was retrievable before the
DR exercise.

> 
> Can you provide some more info on how you are detecting it is not found in DC 1?
> 

We tried Hector with consistency level LOCAL_QUORUM.  Either individual columns or whole rows were missing.


We tried cassandra-cli on DC1 nodes, same.

However, once we ran the same query on DC2, C* must have done a read repair, because that particular
piece of data then appeared in DC1 again.


>> If we understand correctly, commit logs are being written first and then to disk
>> every 10s.
> Writes are put into a bounded queue and processed as fast as the IO can keep up. Every
> 10s a sync message is added to the queue. Note that the commit log segment may rotate at any
> time, which requires a sync.
> 
> A loss of data across all nodes in a DC seems odd. If you can provide some more information
> we may be able to help.
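
To make the periodic mode discussed above concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's implementation: it assumes a sync completes instantly at every fixed multiple of the period counted from time zero, and it ignores both the extra syncs caused by segment rotation and anything the OS page cache might preserve. The function name writes_at_risk and its parameters are made up for illustration.

```python
def writes_at_risk(write_times_ms, sync_period_ms, crash_time_ms):
    """Return the writes accepted but not yet covered by a sync at crash time.

    Toy model (assumption): a sync completes instantly at every multiple of
    sync_period_ms, starting from time zero.
    """
    # Time of the most recent sync at or before the crash.
    last_sync_ms = (crash_time_ms // sync_period_ms) * sync_period_ms
    # Anything written after that sync, up to the crash, is exposed.
    return [t for t in write_times_ms if last_sync_ms < t <= crash_time_ms]


# Writes at 1s, 5s, 12s, 19s and 25s; sync every 10s; crash at 27s.
lost = writes_at_risk([1000, 5000, 12000, 19000, 25000], 10000, 27000)
print(lost)  # [25000]
```

Under this simplified model the exposure never exceeds one sync period, which is why missing data written well before the shutdown looks surprising.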


We are wondering whether the fsync of the commit log was working, but we saw no errors or warnings
in the logs.  Is there a way to verify this?
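
One rough probe, sketched below in Python, is to time a series of small write-plus-fsync calls in a scratch directory on the same volume as the commit log. This is only a heuristic and rests on an assumption: latencies that look implausibly low for the underlying storage might hint that some layer (for example the hypervisor) acknowledges fsync before the data is durable, but it proves nothing on its own. The function name time_fsyncs and the idea of using a scratch directory are illustrative, not part of Cassandra.

```python
import os
import tempfile
import time


def time_fsyncs(directory, rounds=20, payload=b"x" * 4096):
    """Return the latency in seconds of each write+fsync pair.

    A temporary file is created in `directory` and removed afterwards.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(rounds):
            start = time.perf_counter()
            os.write(fd, payload)   # append a small block
            os.fsync(fd)            # ask the OS to push it to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

It could be called as time_fsyncs("/path/to/scratch/dir/on/commitlog/volume"), where the path is a placeholder, and the resulting latencies compared with what the storage is expected to deliver.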


> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 24/08/2012, at 6:01 AM, rubbish me <rubbish.me@googlemail.com> wrote:
> 
>> Hi all
>> 
>> First off, let's introduce the setup. 
>> 
>> - 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2)
>> - keyspace's RF=3 in each DC
>> - Hector as client.
>> - client talks only to DC1 unless DC1 can't serve the request, in which case it talks
>> only to DC2
>> - commit log was periodically synced with the default setting of 10s. 
>> - consistency policy = LOCAL QUORUM for both read and write. 
>> - we are running on production linux VMs (not ideal but this is out of our hands)
>> -----
>> As part of a DR exercise, we killed all 6 nodes in DC1. Hector started talking to
>> DC2, all the data was still there, and everything continued to work perfectly.
>> 
>> Then we brought all nodes in DC1 up, one by one. We saw a message saying all the
>> commit logs were replayed. No errors reported.  We didn't run repair at this time.
>> 
>> We noticed that data that was written an hour before the exercise, around the last
>> memtables being flushed, was not found in DC1.
>> 
>> If we understand correctly, commit logs are being written first and then to disk
>> every 10s. At worst we lost the last 10s of data. What could be the cause of this behaviour?
>> 
>> With the blessing of C* we could recover all of this data from DC2. But we would
>> like to understand why.
>> 
>> Many thanks in advance. 
>> 
>> Amy
>> 
>> 
> 

