From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: entire range of node out of sync -- out of the blue
Date: Thu, 6 Dec 2012 16:05:52 +1300

> - how do i stop repair before i run out of storage? ( can't let this finish )
To stop the validation part of the repair…

nodetool -h localhost stop VALIDATION

The only way I know to stop streaming is to restart the node; there may be a better way, though.
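If you want to see what is actually still running before bouncing the node, something like this should show it (plain nodetool against the local node, nothing exotic):

nodetool -h localhost compactionstats   # pending/active compactions, including validation
nodetool -h localhost netstats          # streams currently in flight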

> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
I'm assuming this was run on the first node in DC west with -pr, as you said.
The log message is saying this is going to repair the primary range for the node. The repair is then actually performed one CF at a time.
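If you end up re-running it later, you can also scope the repair down to a single keyspace / CF to keep the work bounded, roughly like this (the keyspace and CF names below are just placeholders):

nodetool -h localhost repair -pr MyKeyspace MyColumnFamily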

You should also see log messages ending with "range(s) out of sync", which will say how out of sync the data is.
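A rough way to pull those out is to grep the log for them, e.g. (the path below assumes the Debian package default):

grep "out of sync" /var/log/cassandra/system.log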
> - how do i clean up my sstables ( grew from 6k to 20k since this started, while i shut writes off completely )
Sounds like repair is streaming a lot of differences.
If you have the space, I would give levelled compaction time to take care of it.
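To keep an eye on it you can watch the pending task count, and if the box has the IO headroom you could also lift the compaction throughput cap while it catches up (0 disables the throttle):

nodetool -h localhost compactionstats            # watch "pending tasks" trend down
nodetool -h localhost setcompactionthroughput 0  # optional: remove the MB/s cap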
Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 1:32 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:

hi list,

AntiEntropyService started syncing ranges of entire nodes ( ?! ) across my data centers and i'd like to understand why.

I see log lines like this on all my nodes in my two ( east/west ) data centres...

INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )

( this is around 80-100 GB of data for a single node. )

- i did not observe any network failures or nodes falling off the ring
- good distribution of data ( load is equal on all nodes )
- hinted handoff is on
- read repair chance is 0.1 on the CF
- 2 replicas in each data centre ( which is also the number of nodes in each ) with NetworkTopologyStrategy
- repair -pr is scheduled to run off-peak hours, daily
- leveled compaction with sstable max size 256mb ( i have found this to trigger compaction in acceptable intervals while still keeping the sstable count down )
- i am on 1.1.6
- java heap 10G
- max memtables 2G
- 1G row cache
- 256M key cache

my nodes' ranges are:

DC west
0
85070591730234615865843651857942052864

DC east
100
85070591730234615865843651857942052964

symptoms are:
- logs show sstables being streamed over to other nodes
- 140k files in data dir of CF on all nodes
- cfstats reports 20k sstables, up from 6k, on all nodes
- compaction continuously running with no results whatsoever ( number of sstables growing )

i tried the following:
- offline scrub ( has gone OOM, i noticed the script in the debian package specifies 256MB heap? )
- online scrub ( no effect )
- repair ( no effect )
- cleanup ( no effect )

my questions are:
- how do i stop repair before i run out of storage? ( can't let this finish )
- how do i clean up my sstables ( grew from 6k to 20k since this started, while i shut writes off completely )

thanks,
Andras

Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: = sandrew84




