Subject: Re: Restart cassandra every X days?
From: "R. Verlangen"
To: user@cassandra.apache.org
Date: Fri, 3 Feb 2012 08:39:22 +0100

Well, it seems it's balancing itself; 24 hours later the ring looks like this:

***.89     datacenter1  rack1  Up  Normal  7.36 GB  50.00%  0
***.135    datacenter1  rack1  Up  Normal  8.84 GB  50.00%  85070591730234615865843651857942052864

Looks pretty normal, right?

2012/2/2 aaron morton <aaron@thelastpickle.com>

> Speaking technically, that ain't right.
>
> I would:
> * Check if node .135 is holding a lot of hints.
> * Take a look on disk and see what is there.
> * Go through a repair and compact on each node.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/02/2012, at 9:55 PM, R. Verlangen wrote:
>
> Yes, I already did a repair and cleanup. Currently my ring looks like this:
>
> Address    DC           Rack   Status  State   Load     Owns    Token
> ***.89     datacenter1  rack1  Up      Normal  2.44 GB  50.00%  0
> ***.135    datacenter1  rack1  Up      Normal  6.99 GB  50.00%  85070591730234615865843651857942052864
>
> It's not really a problem, but I'm still wondering why this happens.
>
> 2012/2/1 aaron morton <aaron@thelastpickle.com>
>
>> Do you mean the load in nodetool ring is not even, despite the tokens
>> being evenly distributed?
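As a rough companion to Aaron's checklist above, the checks might look something like this on each node. This is only a sketch: the host, the data path and the grep window are assumptions, and in this era pending hinted handoffs sit in the system keyspace's HintsColumnFamily.

    # 1. pending hints show up under the system keyspace
    nodetool -h localhost cfstats | grep -A 25 "Keyspace: system"

    # 2. see what is actually on disk, per keyspace
    du -sh /var/lib/cassandra/data/*

    # 3. repair, then compact, one node at a time
    nodetool -h localhost repair
    nodetool -h localhost compact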
>>
>> I would assume this is not the case given the difference, but it may be
>> hints, given you have just done an upgrade. Check the system keyspace using
>> nodetool cfstats to see. They will eventually be delivered and deleted.
>>
>> More likely you will want to:
>> 1) nodetool repair to make sure all data is distributed, then
>> 2) nodetool cleanup if you have changed the tokens at any point.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 31/01/2012, at 11:56 PM, R. Verlangen wrote:
>>
>> After running 3 days on Cassandra 1.0.7 it seems the problem has been
>> solved. One weird thing remains: on our 2 nodes (both 50% of the ring), the
>> first's usage is just over 25% of the second.
>>
>> Anyone got an explanation for that?
>>
>> 2012/1/29 aaron morton <aaron@thelastpickle.com>
>>
>>> Yes, but...
>>>
>>> For every upgrade read NEWS.txt; it goes through the upgrade
>>> procedure in detail. If you want to feel extra smart, scan through
>>> CHANGES.txt to get an idea of what's going on.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote:
>>>
>>> Sorry if this has been covered, I was concentrating solely on 0.8.x --
>>> can I just d/l 1.0.x and continue using the same data on the same cluster?
>>>
>>> Maxim
>>>
>>> On 1/28/2012 7:53 AM, R. Verlangen wrote:
>>>
>>> Ok, seems that it's clear what I should do next ;-)
>>>
>>> 2012/1/28 aaron morton <aaron@thelastpickle.com>
>>>
>>>> There are no blockers to upgrading to 1.0.X.
>>>>
>>>> A
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 28/01/2012, at 7:48 AM, R. Verlangen wrote:
>>>>
>>>> Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
>>>> stable enough to upgrade to, or should we wait for a couple of weeks?
>>>>
>>>> 2012/1/27 Edward Capriolo <edlinuxguru@gmail.com>
>>>>
>>>>> I would not say that issuing a restart every X days is a good idea. You
>>>>> are mostly developing a superstition. You should find the source of the
>>>>> problem. It could be JMX or Thrift clients not closing connections. We
>>>>> don't restart nodes on a regimen; they work fine.
>>>>>
>>>>> On Thursday, January 26, 2012, Mike Panchenko <m@mihasya.com> wrote:
>>>>> > There are two relevant bugs (that I know of), both resolved in
>>>>> > somewhat recent versions, which make somewhat regular restarts beneficial:
>>>>> > https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak
>>>>> > in GCInspector, fixed in 0.7.9/0.8.5)
>>>>> > https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
>>>>> > fragmentation due to the way memtables used to be allocated, refactored in
>>>>> > 1.0.0)
>>>>> > Restarting daily is probably too frequent for either one of those
>>>>> > problems. We usually notice degraded performance in our ancient cluster
>>>>> > after ~2 weeks w/o a restart.
>>>>> > As Aaron mentioned, if you have plenty of disk space, there's no
>>>>> > reason to worry about "cruft" sstables. The size of your active set is what
>>>>> > matters, and you can determine if that's getting too big by watching for
>>>>> > iowait (due to reads from the data partition) and/or paging activity of the
>>>>> > java process. When you hit that problem, the solution is to 1. try to tune
>>>>> > your caches and 2. add more nodes to spread the load. I'll reiterate -
>>>>> > looking at raw disk space usage should not be your guide for that.
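A rough sketch of the monitoring Mike describes, assuming the sysstat tools are installed and CassandraDaemon is the only matching JVM on the box (both assumptions for illustration):

    # iowait and per-disk utilisation, sampled every 10 seconds
    iostat -x 10

    # minor/major page faults of the Cassandra JVM (majflt/s should stay near zero)
    pidstat -r -p $(pgrep -f CassandraDaemon) 10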
>>>>> > "Forcing" a gc generally works, but should not be relied upon (note
>>>>> > "suggests" in
>>>>> > http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()).
>>>>> > It's great news that 1.0 uses a better mechanism for releasing unused
>>>>> > sstables.
>>>>> > nodetool compact triggers a "major" compaction and is no longer
>>>>> > recommended by DataStax (details here:
>>>>> > http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction,
>>>>> > bottom of the page).
>>>>> > Hope this helps.
>>>>> > Mike.
>>>>> > On Wed, Jan 25, 2012 at 5:14 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>>>> >
>>>>> > That disk usage pattern is to be expected in pre-1.0 versions. Disk
>>>>> > usage is far less interesting than disk free space: if it's using 60 GB and
>>>>> > there is 200 GB free, that's ok. If it's using 60 GB and there is 6 MB free,
>>>>> > that's a problem.
>>>>> > In pre-1.0, compacted files are deleted on disk by waiting for the JVM
>>>>> > to decide to GC all remaining references. If there is not enough space on
>>>>> > disk (to store the total size of the files it is about to write or compact),
>>>>> > GC is forced and the files are deleted. Otherwise they will get deleted at
>>>>> > some point in the future.
>>>>> > In 1.0 files are reference counted and space is freed much sooner.
>>>>> > With regard to regular maintenance, nodetool cleanup removes data
>>>>> > from a node that it is no longer a replica for. This is only of use when
>>>>> > you have done a token move.
>>>>> > I would not recommend a daily restart of the cassandra process. You
>>>>> > will lose all the runtime optimizations the JVM has made (I think the
>>>>> > mapped file pages will stay resident), and you will add entropy to the
>>>>> > system, which must be repaired via HH, RR or nodetool repair.
>>>>> > If you want to see compacted files purged faster, the best approach
>>>>> > would be to upgrade to 1.0.
>>>>> > Hope that helps.
>>>>> > -----------------
>>>>> > Aaron Morton
>>>>> > Freelance Developer
>>>>> > @aaronmorton
>>>>> > http://www.thelastpickle.com
>>>>> > On 26/01/2012, at 9:51 AM, R. Verlangen wrote:
>>>>> >
>>>>> > In his message he explains that it's for "Forcing a GC". GC stands
>>>>> > for garbage collection. For some more background see:
>>>>> > http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
>>>>> > Cheers!
>>>>> >
>>>>> > 2012/1/25 <mike.li@thomsonreuters.com>
>>>>> >
>>>>> > Karl,
>>>>> >
>>>>> > Can you give a little more detail on these 2 lines, what do they do?
>>>>> >
>>>>> > java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
>>>>> > java.lang:type=Memory gc
>>>>> >
>>>>> > Thank you,
>>>>> > Mike
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: Karl Hiramoto [mailto:karl@hiramoto.org]
>>>>> > Sent: Wednesday, January 25, 2012 12:26 PM
>>>>> > To: user@cassandra.apache.org
>>>>> > Subject: Re: Restart cassandra every X days?
>>>>> >
>>>>> > On 01/25/12 19:18, R. Verlangen wrote:
>>>>> >> Ok, thank you for your feedback. I'll add these tasks to our daily
>>>>> >> cassandra maintenance cronjob. Hopefully this will keep things under
>>>>> >> control.
>>>>> >
>>>>> > I forgot to mention that we found that forcing a GC also cleans up
>>>>> > some space.
>>>>> >
>>>>> > in a cronjob you can do this with
>>>>> > http://crawler.archive.org/cmdline-jmxclient/
>>>>> >
>>>>> > my cron
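The crontab entry itself is cut off above. Purely as an illustration, a nightly job built from the jmxclient command quoted earlier in the thread might look like this; the jar path, the cassandra user, the schedule and the 8080 JMX port are all assumptions:

    # /etc/cron.d/cassandra-gc -- hypothetical example only
    # ask the local Cassandra JVM to run a full GC every night at 03:00
    0 3 * * *  cassandra  java -jar /opt/jmxclient/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc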