cassandra-user mailing list archives

From Igor <i...@4friends.od.ua>
Subject Re: repair strange behavior
Date Mon, 23 Apr 2012 09:51:33 GMT
Hi, Aaron

Just the sum of the total volume of all streams between the nodes.
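
For reference, one way to watch that volume, assuming plain nodetool on 
1.0.x, is netstats, which lists every file currently streaming to or 
from a node together with its size; summing those across nodes gives 
the figure I mean:

    nodetool -h 10.254.180.2 netstats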

But it seems I understand what happened: after the repair my column 
family went through several minor compactions, and during these 
compactions it created new tombstones (my CF contains data with a TTL, 
so each minor compaction can discover newly expired data and mark it). 
Because these tombstones are created and arranged differently on each 
node (the sstables have different sizes and so on, so size-tiered 
compaction works slightly differently), each subsequent repair 
discovers new ranges to sync.
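
To make the TTL point concrete, here is a minimal cassandra-cli sketch 
(the key and column names are made up):

    use meter;
    set ids['some_key']['some_col'] = 'value' with ttl = 3600;

When the hour is up the column is not removed immediately; the next 
compaction that happens to rewrite it turns it into a tombstone, and on 
each node that happens at a different moment.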

When I tried to run a *major* compaction and then a repair, it went 
through in minutes (versus hours). As far as I understand, this is 
because after a major compaction the tombstones on all nodes are almost 
the same.
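
In other words, something like this per node:

    nodetool -h 10.254.180.2 compact meter ids
    nodetool -h 10.254.180.2 repair -pr meter ids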

Does that sound reasonable?

I'll try to find the best strategy to minimize repair streams, as I'm 
afraid of running major compactions on the other, possibly large, CFs.

On 04/23/2012 12:34 PM, aaron morton wrote:
>> What is strange: when the streams for the second repair start, they 
>> have the same or an even bigger total volume,
> What measure are you using ?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/04/2012, at 10:16 PM, Igor wrote:
>
>> But after a repair all nodes should be in sync, regardless of 
>> whether the new files were compacted or not.
>> Do you suggest a major compaction after repair? I'd like to avoid it.
>>
>> On 04/22/2012 11:52 AM, Philippe wrote:
>>>
>>> Repairs generate new files that then need to be compacted.
>>> Maybe that's where the temporary extra volume comes from?
>>>
>>> On 21 Apr 2012, at 20:43, "Igor" <igor@4friends.od.ua> wrote:
>>>
>>>     Hi
>>>
>>>     I can't understand the repair behavior in my case. I have a
>>>     12-node ring (all 1.0.7):
>>>
>>>     Address         DC  Rack        Status  State   Load      Owns    Token
>>>     10.254.237.2    LA  ADS-LA-1    Up      Normal  50.92 GB  0.00%   0
>>>     10.254.238.2    TX  TX-24-RACK  Up      Normal  33.29 GB  0.00%   1
>>>     10.254.236.2    VA  ADS-VA-1    Up      Normal  50.07 GB  0.00%   2
>>>     10.254.93.2     IL  R1          Up      Normal  49.29 GB  0.00%   3
>>>     10.253.4.2      AZ  R1          Up      Normal  37.83 GB  0.00%   5
>>>     10.254.180.2    GB  GB-1        Up      Normal  42.86 GB  50.00%  85070591730234615865843651857942052863
>>>     10.254.191.2    LA  ADS-LA-1    Up      Normal  47.64 GB  0.00%   85070591730234615865843651857942052864
>>>     10.254.221.2    TX  TX-24-RACK  Up      Normal  43.42 GB  0.00%   85070591730234615865843651857942052865
>>>     10.254.217.2    VA  ADS-VA-1    Up      Normal  38.44 GB  0.00%   85070591730234615865843651857942052866
>>>     10.254.94.2     IL  R1          Up      Normal  49.31 GB  0.00%   85070591730234615865843651857942052867
>>>     10.253.5.2      AZ  R1          Up      Normal  49.01 GB  0.00%   85070591730234615865843651857942052869
>>>     10.254.179.2    GB  GB-1        Up      Normal  27.08 GB  50.00%  170141183460469231731687303715884105727
>>>
>>>     I have a single keyspace 'meter' with two column families (one,
>>>     'ids', is small; the second is bigger). The strange thing
>>>     happened today when I tried to run
>>>     "nodetool -h 10.254.180.2 repair -pr meter ids"
>>>     twice, one run right after the other. The first repair finished
>>>     successfully:
>>>
>>>      INFO 16:33:02,492 [repair
>>>     #db582370-8bba-11e1-0000-5b777f708bff] ids is fully synced
>>>      INFO 16:33:02,526 [repair
>>>     #db582370-8bba-11e1-0000-5b777f708bff] session completed
>>>     successfully
>>>
>>>     after moving nearly 50 GB of data, and I started the second
>>>     session one hour later:
>>>
>>>     INFO 17:44:37,842 [repair #aa415d00-8bd9-11e1-0000-5b777f708bff]
>>>     new session: will sync localhost/10.254.180.2, /10.254.221.2,
>>>     /10.254.191.2, /10.254.217.2, /10.253.5.2, /10.254.94.2 on range
>>>     (5,85070591730234615865843651857942052863] for meter.[ids]
>>>
>>>     What is strange: when the streams for the second repair start,
>>>     they have the same or an even bigger total volume, and I
>>>     expected that the second run would move less data (or even no
>>>     data at all).
>>>
>>>     Is it OK? Or should I fix something?
>>>
>>>     Thanks!
>>>
>>
>

