incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
Date Sun, 24 Mar 2013 17:03:53 GMT
> I could imagine a scenario where a hint was replayed to a replica after all replicas had purged their tombstones
Scratch that, the hints are TTL'd with the lowest gc_grace. 
Ticket closed https://issues.apache.org/jira/browse/CASSANDRA-5379
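
The reasoning behind "scratch that" can be sketched with a toy model. This is not Cassandra's code: `Hint`, `tombstone_purgeable`, and `can_resurrect` are hypothetical names, time is plain seconds, and a single column family is assumed so that "lowest gc_grace" is just that CF's gc_grace. A hinted write can only bring deleted data back if it is replayed after the tombstone was eligible for purging, and a hint whose TTL equals gc_grace expires before that point.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class Hint:
    """Toy model of a stored write waiting to be replayed to a replica."""
    written_at: int  # seconds since an arbitrary reference point
    ttl: int         # seconds during which the hint can still be replayed

    def is_live(self, now: int) -> bool:
        return now < self.written_at + self.ttl


def tombstone_purgeable(deleted_at: int, gc_grace: int, now: int) -> bool:
    """A tombstone becomes eligible for purging once gc_grace has elapsed."""
    return now >= deleted_at + gc_grace


def can_resurrect(hint: Hint, deleted_at: int, gc_grace: int, now: int) -> bool:
    """True if the hint is still replayable after the tombstone may be gone."""
    return hint.is_live(now) and tombstone_purgeable(deleted_at, gc_grace, now)


gc_grace = 10 * DAY    # the value reported for the App CF in this thread
deleted_at = 1 * DAY   # assumed: delete issued one day after the hinted write

capped = Hint(written_at=0, ttl=gc_grace)      # hint TTL equal to gc_grace
uncapped = Hint(written_at=0, ttl=365 * DAY)   # hypothetical hint with no real max age

checkpoints = range(0, 30 * DAY, DAY // 2)
print(any(can_resurrect(capped, deleted_at, gc_grace, t) for t in checkpoints))    # False
print(any(can_resurrect(uncapped, deleted_at, gc_grace, t) for t in checkpoints))  # True
```

Because the delete cannot precede the write it removes, the capped hint always expires no later than the earliest moment the tombstone could be purged.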

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 6:24 AM, aaron morton <aaron@thelastpickle.com> wrote:

>> Beside the joke, would hinted handoff really have any role in this issue?
> I could imagine a scenario where a hint was replayed to a replica after all replicas had purged their tombstones. That seems like a long shot: it would need one node to be down for the write, all nodes up for the delete, and all of them to have purged the tombstone. But maybe we should have a max age on hints so it cannot happen. 
> 
> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
> 
> Ensuring no hints are in place during an upgrade would work around it. I tend to make sure hints and commit log are clear during an upgrade. 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 22/03/2013, at 7:54 AM, Arya Goudarzi <goudarzi@gmail.com> wrote:
> 
>> Beside the joke, would hinted handoff really have any role in this issue? I have been struggling to reproduce this issue using the snapshot data taken from our cluster and following the same upgrade process from 1.1.6 to 1.1.10. I know snapshots only link to active SSTables. What if these returned rows belong to some inactive SSTables and some bug exposed itself and marked them as active? What are the possibilities that could lead to this? I am eager to find out as this is blocking our upgrade.
>> 
>> On Tue, Mar 19, 2013 at 2:11 AM, <moshe.kranc@barclays.com> wrote:
>> This obscure feature of Cassandra is called “haunted handoff”.
>> 
>>  
>> 
>> Happy (early) April Fools :)
>> 
>>  
>> 
>> From: aaron morton [mailto:aaron@thelastpickle.com] 
>> Sent: Monday, March 18, 2013 7:45 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
>> 
>>  
>> 
>> As you see, this node thinks lots of ranges are out of sync which shouldn't be the case as successful repairs were done every night prior to the upgrade. 
>> 
>> Could this be explained by writes occurring during the upgrade process ? 
>> 
>>  
>> 
>> I found this bug which touches timestamp and tombstones which was fixed in 1.1.10 but am not 100% sure if it could be related to this issue: https://issues.apache.org/jira/browse/CASSANDRA-5153
>> 
>> Me neither, but the issue was fixed in 1.1.0
>> 
>>  
>> 
>>  It appears that the repair task that I executed after upgrade brought lots of deleted rows back to life.
>> 
>> Was it entire rows or columns in a row?
>> 
>> Do you know if row level or column level deletes were used ? 
>> 
>>  
>> 
>> Can you look at the data in cassandra-cli and confirm the timestamps on the columns make sense ?  
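
One way to run that sanity check is to test which unit a raw column timestamp is in. The sketch below is a hypothetical helper, not part of cassandra-cli. It assumes timestamps count from the Unix epoch, and it assumes, purely for illustration, that some other writer used the conventional microsecond precision alongside the nanosecond clients; the thread does not say this happened. Since the highest timestamp wins, a later microsecond-stamped delete would lose to an earlier nanosecond-stamped write.

```python
from datetime import datetime, timezone

# Divisors that turn each candidate unit into seconds since the Unix epoch.
UNITS = {"seconds": 1, "milliseconds": 10**3, "microseconds": 10**6, "nanoseconds": 10**9}


def guess_timestamp_unit(ts: int, reference: datetime) -> str:
    """Hypothetical helper: pick the unit under which ts lands closest to reference."""
    ref_s = reference.timestamp()
    return min(UNITS, key=lambda unit: abs(ts / UNITS[unit] - ref_s))


upgrade_day = datetime(2013, 3, 16, tzinfo=timezone.utc)
epoch_s = int(upgrade_day.timestamp())

write_ns = epoch_s * 10**9             # a write stamped in nanoseconds
delete_us = (epoch_s + 3600) * 10**6   # a delete one hour later, stamped in microseconds

print(guess_timestamp_unit(write_ns, upgrade_day))   # nanoseconds
print(guess_timestamp_unit(delete_us, upgrade_day))  # microseconds
print(delete_us > write_ns)                          # False: the later delete loses
```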
>> 
>>  
>> 
>> Cheers
>> 
>>  
>> 
>> -----------------
>> 
>> Aaron Morton
>> 
>> Freelance Cassandra Consultant
>> 
>> New Zealand
>> 
>>  
>> 
>> @aaronmorton
>> 
>> http://www.thelastpickle.com
>> 
>>  
>> 
>> On 16/03/2013, at 2:31 PM, Arya Goudarzi <goudarzi@gmail.com> wrote:
>> 
>> 
>> 
>> 
>> Hi,
>> 
>>  
>> 
>> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running repairs. It appears that the repair task that I executed after upgrade brought lots of deleted rows back to life. Here are some logistics:
>> 
>>  
>> 
>> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6 
>> 
>> - Old cluster: 4 nodes, C* 1.1.6 with RF3 using NetworkTopology;
>> 
>> - Upgrade to : 1.1.10 with all other settings the same;
>> 
>> - Successful repairs were being done on this cluster every night;
>> 
>> - Our clients use nanosecond precision timestamp for cassandra calls;
>> 
>> - After upgrade, while running repair I saw some log messages like this in one node:
>> 
>>  
>> 
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.207.56 have 2223 range(s) out of sync for App
>> 
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /23.20.207.56 have 161 range(s) out of sync for App
>> 
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.250.43 have 2294 range(s) out of sync for App
>> 
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 AntiEntropyService.java (line 789) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column family to sync for this session)
>> 
>>  
>> 
>> As you see, this node thinks lots of ranges are out of sync which shouldn't be the case as successful repairs were done every night prior to the upgrade. 
>> 
>>  
>> 
>> The App CF uses SizeTiered with gc_grace of 10 days. It has caching = 'ALL', and it is fairly small (11 MB on each node).
>> 
>>  
>> 
>> I found this bug which touches timestamp and tombstones which was fixed in 1.1.10 but am not 100% sure if it could be related to this issue: https://issues.apache.org/jira/browse/CASSANDRA-5153
>> 
>>  
>> 
>> Any advice on how to dig deeper into this would be appreciated.
>> 
>>  
>> 
>> Thanks,
>> 
>> -Arya
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> 
> 

