incubator-cassandra-user mailing list archives

From "B. Todd Burruss" <bto...@gmail.com>
Subject Re: entire range of node out of sync -- out of the blue
Date Tue, 18 Dec 2012 21:09:01 GMT
In your data directory, for each keyspace there is a solr.json file; Cassandra
stores the SSTables it knows about there when using leveled compaction.  Take a
look at that file and see if it looks accurate.  If not, this is a bug in
Cassandra that we are looking into as well.
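To sanity-check that manifest, the sketch below compares the sstable generations listed in the JSON against the `-Data.db` files actually on disk. The manifest layout (`{"generations": [{"generation": N, "members": [...]}]}`) and the `KS-CF-hd-<gen>-Data.db` filename pattern are assumptions based on 1.1-era leveled compaction; adjust paths and names for your install.

```python
import glob
import json
import os

def check_leveled_manifest(manifest_path, data_dir):
    """Compare sstable generations in a leveled-compaction manifest
    against the -Data.db files actually present in the data directory.
    Returns (listed_but_missing_on_disk, on_disk_but_not_listed)."""
    with open(manifest_path) as f:
        manifest = json.load(f)

    # Generation numbers the manifest claims to know about, per level.
    in_manifest = set()
    for level in manifest.get("generations", []):
        in_manifest.update(level.get("members", []))

    # Generation numbers of the sstables on disk, parsed from filenames
    # shaped like KS-CF-hd-<generation>-Data.db (an assumption).
    on_disk = set()
    for path in glob.glob(os.path.join(data_dir, "*-Data.db")):
        gen = int(os.path.basename(path).split("-")[-2])
        on_disk.add(gen)

    return sorted(in_manifest - on_disk), sorted(on_disk - in_manifest)
```

Non-empty output on either side would point at the kind of manifest/disk disagreement described above.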


On Thu, Dec 6, 2012 at 7:38 PM, aaron morton <aaron@thelastpickle.com> wrote:

> The log message matches what I would expect to see for nodetool repair -pr.
>
> Not using -pr means repairing all the ranges the node is a replica for. If you
> have RF == number of nodes, then it will repair all the data.
>
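Aaron's point can be sketched with a toy ring. The placement below is SimpleStrategy-style (the replicas for a range are the node owning its end token plus the next rf - 1 nodes clockwise); the tokens and node count here are made up for illustration:

```python
def replica_ranges(tokens, node_index, rf):
    """Ranges (start, end] that a node is a replica for, assuming
    SimpleStrategy-style placement: the range ending at tokens[i] is
    stored on nodes i, i+1, ..., i+rf-1 around the ring."""
    n = len(tokens)
    owned = []
    for i in range(n):
        owners = {(i + k) % n for k in range(rf)}
        if node_index in owners:
            owned.append((tokens[i - 1], tokens[i]))
    return owned

tokens = [0, 25, 50, 75]
# RF == number of nodes: a full (non -pr) repair on any one node
# touches every range, i.e. all of the data.
print(len(replica_ranges(tokens, 0, rf=4)))
# With -pr, only the node's single primary range is repaired.
```

With RF equal to the node count, running a plain repair on every node repeats the same work N times over, which is why scheduled repairs normally use -pr.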
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/12/2012, at 9:42 PM, Andras Szerdahelyi <
> andras.szerdahelyi@ignitionone.com> wrote:
>
>  Thanks!
>
>  I'm also thinking a repair run without -pr could have caused this, maybe?
>
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
>
>
>
>
>  On 06 Dec 2012, at 04:05, aaron morton <aaron@thelastpickle.com> wrote:
>
>   - how do i stop repair before i run out of storage? ( can't let this
> finish )
>
>
>   To stop the validation part of the repair…
>
>  nodetool -h localhost stop VALIDATION
>
>
>  The only way I know to stop streaming is to restart the node; there may be
> a better way though.
>
>
>   INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301
> AntiEntropyService.java (line 666) [repair
> #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113,
> /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>
> I am assuming this was run on the first node in DC west with -pr, as you said.
>  The log message is saying this is going to repair the primary range for
> the node. The repair is then actually performed one CF at a
> time.
>
>  You should also see log messages ending with "range(s) out of sync"
> which will say how out of sync the data is.
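To pull those summaries out of the log, a grep like the one below works. The sample line is approximated from the session line above, not copied from a real 1.1 log, so treat the exact wording as an assumption and point the grep at your actual system.log:

```shell
# Write a sample log line (assumption: real 1.1 repair lines look similar).
cat > /tmp/system.log.sample <<'EOF'
 INFO [AntiEntropyStage:1] 2012-12-05 02:20:11,101 AntiEntropyService.java [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] Endpoints /X.X.1.113 and /X.X.0.71 have 128 range(s) out of sync for MyCF
EOF

# Count the out-of-sync summaries; drop -c to see the lines themselves.
grep -c "range(s) out of sync" /tmp/system.log.sample
```

A large range count per CF would match the heavy streaming you are seeing.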
>
>
>  - how do i clean up my sstables ( grew from 6k to 20k since this started,
> while i shut writes off completely )
>
>  Sounds like repair is streaming a lot of differences.
>  If you have the space, I would give levelled compaction time to take
> care of it.
>
>  Hope that helps.
>
>       -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
>  @aaronmorton
> http://www.thelastpickle.com
>
>  On 6/12/2012, at 1:32 AM, Andras Szerdahelyi <
> andras.szerdahelyi@ignitionone.com> wrote:
>
>  hi list,
>
>  AntiEntropyService started syncing ranges of entire nodes ( ?! ) across
> my data centers and i'd like to understand why.
>
>  I see log lines like this on all my nodes in my two ( east/west ) data
> centres...
>
>  INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301
> AntiEntropyService.java (line 666) [repair
> #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113,
> /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
>
>  ( this is around 80-100 GB of data for a single node. )
>
>  - i did not observe any network failures or nodes falling off the ring
> - good distribution of data ( load is equal on all nodes )
> - hinted handoff is on
> - read repair chance is 0.1 on the CF
> - 2 replicas in each data centre ( which is also the number of nodes in
> each ) with NetworkTopologyStrategy
> - repair -pr is scheduled to run off-peak hours, daily
> - leveled compaction with sstable max size 256mb ( i have found this to
> trigger compaction in acceptable intervals while still keeping the sstable
> count down )
> - i am on 1.1.6
> - java heap 10G
> - max memtables 2G
> - 1G row cache
> - 256M key cache
>
>  my nodes'  ranges are:
>
>  DC west
> 0
> 85070591730234615865843651857942052864
>
>  DC east
> 100
> 85070591730234615865843651857942052964
>
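Those tokens match the usual balanced-token recipe for RandomPartitioner, with the second DC offset by a small constant so no two nodes share a token; a quick check:

```python
def balanced_tokens(n, offset=0):
    """Evenly space n tokens over RandomPartitioner's [0, 2**127)
    range, shifted by a per-datacenter offset."""
    return [i * 2**127 // n + offset for i in range(n)]

west = balanced_tokens(2)             # DC west: 0 and 2**127 // 2
east = balanced_tokens(2, offset=100) # DC east: same tokens + 100
```

So the ring itself looks correctly balanced; the sstable explosion is not a token-assignment problem.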
>  symptoms are:
> - logs show sstables being streamed over to other nodes
> - 140k files in data dir of CF on all nodes
> - cfstats reports 20k sstables, up from 6 on all nodes
> - compaction continuously running with no results whatsoever ( number of
> sstables growing )
>
>  i tried the following:
> - offline scrub ( has gone OOM, i noticed the script in the debian package
> specifies 256MB heap? )
> - online scrub ( no effect )
> - repair ( no effect )
> - cleanup ( no effect )
>
>  my questions are:
> - how do i stop repair before i run out of storage? ( can't let this
> finish )
> - how do i clean up my sstables ( grew from 6k to 20k since this started,
> while i shut writes off completely )
>
>  thanks,
> Andras
>
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
>
>
>
>
>
>
>
>
