incubator-cassandra-user mailing list archives

From "R. Verlangen" <ro...@us2.nl>
Subject Re: Restart cassandra every X days?
Date Fri, 03 Feb 2012 07:39:22 GMT
Well, it seems it's balancing itself. 24 hours later the ring looks like
this:

***.89     datacenter1 rack1       Up     Normal  7.36 GB         50.00%  0
***.135    datacenter1 rack1       Up     Normal  8.84 GB         50.00%  85070591730234615865843651857942052864

Looks pretty normal, right?
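
For anyone checking the same numbers: a minimal Python sketch, assuming the
cluster uses RandomPartitioner (token space 0 to 2**127), that computes evenly
spaced tokens and the ratio between two reported loads. The function names
balanced_tokens and load_ratio are made up for illustration and are not part
of nodetool.

```python
# Assumption: RandomPartitioner, whose tokens range from 0 to 2**127.
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def load_ratio(load_a_gb, load_b_gb):
    """Return smaller load divided by larger load (1.0 means identical)."""
    smaller, larger = sorted((load_a_gb, load_b_gb))
    if larger <= 0:
        raise ValueError("loads must be positive")
    return smaller / larger


if __name__ == "__main__":
    # Tokens for a two-node ring, as in the output above.
    print(balanced_tokens(2))
    # Load figures copied from the two ring listings in this thread.
    print(round(load_ratio(2.44, 6.99), 2))
    print(round(load_ratio(7.36, 8.84), 2))
```

With two nodes the second token comes out as
85070591730234615865843651857942052864, which matches the ring above, so the
token assignment itself is even; only the on-disk load differs.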

2012/2/2 aaron morton <aaron@thelastpickle.com>

> Speaking technically, that ain't right.
>
> I would:
> * Check if node .135 is holding a lot of hints.
> * Take a look on disk and see what is there.
> * Go through a repair and compact on each node.
>
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/02/2012, at 9:55 PM, R. Verlangen wrote:
>
> Yes, I already did a repair and cleanup. Currently my ring looks like this:
>
> Address    DC          Rack        Status State   Load            Owns    Token
> ***.89     datacenter1 rack1       Up     Normal  2.44 GB         50.00%  0
> ***.135    datacenter1 rack1       Up     Normal  6.99 GB         50.00%  85070591730234615865843651857942052864
>
> It's not really a problem, but I'm still wondering why this happens.
>
> 2012/2/1 aaron morton <aaron@thelastpickle.com>
>
>> Do you mean the load in nodetool ring is not even, despite the tokens
>> being evenly distributed?
>>
>> I would assume this is not the case given the difference, but it may be
>> hints given you have just done an upgrade. Check the system using nodetool
>> cfstats to see. They will eventually be delivered and deleted.
>>
>> More likely you will want to:
>> 1) nodetool repair to make sure all data is distributed then
>> 2) nodetool cleanup if you have changed the tokens at any point finally
>>
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 31/01/2012, at 11:56 PM, R. Verlangen wrote:
>>
>> After running 3 days on Cassandra 1.0.7 it seems the problem has been
>> solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the
>> first's usage is just over 25% of the second.
>>
>> Anyone got an explanation for that?
>>
>> 2012/1/29 aaron morton <aaron@thelastpickle.com>
>>
>>> Yes but…
>>>
>>> For every upgrade, read NEWS.txt; it goes through the upgrade
>>> procedure in detail. If you want to feel extra smart, scan through
>>> CHANGES.txt to get an idea of what's going on.
>>>
>>> Cheers
>>>
>>>   -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote:
>>>
>>>  Sorry if this has been covered, I was concentrating solely on 0.8x --
>>> can I just d/l 1.0.x and continue using same data on same cluster?
>>>
>>> Maxim
>>>
>>>
>>> On 1/28/2012 7:53 AM, R. Verlangen wrote:
>>>
>>> Ok, seems that it's clear what I should do next ;-)
>>>
>>> 2012/1/28 aaron morton <aaron@thelastpickle.com>
>>>
>>>> There are no blockers to upgrading to 1.0.X.
>>>>
>>>>  A
>>>>      -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>>   On 28/01/2012, at 7:48 AM, R. Verlangen wrote:
>>>>
>>>> Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
>>>> stable enough to upgrade to, or should we wait for a couple of weeks?
>>>>
>>>> 2012/1/27 Edward Capriolo <edlinuxguru@gmail.com>
>>>>
>>>>> I would not say that issuing a restart after x days is a good idea. You
>>>>> are mostly developing a superstition. You should find the source of the
>>>>> problem. It could be jmx or thrift clients not closing connections. We
>>>>> don't restart nodes on a regimen; they work fine.
>>>>>
>>>>>
>>>>> On Thursday, January 26, 2012, Mike Panchenko <m@mihasya.com> wrote:
>>>>> > There are two relevant bugs (that I know of), both resolved in
>>>>> > somewhat recent versions, which make somewhat regular restarts
>>>>> > beneficial:
>>>>> > https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak
>>>>> > in GCInspector, fixed in 0.7.9/0.8.5)
>>>>> > https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
>>>>> > fragmentation due to the way memtables used to be allocated,
>>>>> > refactored in 1.0.0)
>>>>> > Restarting daily is probably too frequent for either one of those
>>>>> > problems. We usually notice degraded performance in our ancient
>>>>> > cluster after ~2 weeks w/o a restart.
>>>>> > As Aaron mentioned, if you have plenty of disk space, there's no
>>>>> > reason to worry about "cruft" sstables. The size of your active set
>>>>> > is what matters, and you can determine if that's getting too big by
>>>>> > watching for iowait (due to reads from the data partition) and/or
>>>>> > paging activity of the java process. When you hit that problem, the
>>>>> > solution is to 1. try to tune your caches and 2. add more nodes to
>>>>> > spread the load. I'll reiterate - looking at raw disk space usage
>>>>> > should not be your guide for that.
>>>>> > "Forcing" a gc generally works, but should not be relied upon (note
>>>>> > "suggest" in
>>>>> > http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()).
>>>>> > It's great news that 1.0 uses a better mechanism for releasing
>>>>> > unused sstables.
>>>>> > nodetool compact triggers a "major" compaction and is no longer
>>>>> > recommended by datastax (details here
>>>>> > http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction
>>>>> > bottom of the page).
>>>>> > Hope this helps.
>>>>> > Mike.
>>>>> > On Wed, Jan 25, 2012 at 5:14 PM, aaron morton <
>>>>> aaron@thelastpickle.com> wrote:
>>>>> >
>>>>> > That disk usage pattern is to be expected in pre 1.0 versions. Disk
>>>>> > usage is far less interesting than disk free space: if it's using
>>>>> > 60 GB and there is 200 GB free, that's ok. If it's using 60 GB and
>>>>> > there is 6 MB free, that's a problem.
>>>>> > In pre 1.0 the compacted files are deleted on disk by waiting for
>>>>> > the JVM to decide to GC all remaining references. If there is not
>>>>> > enough space (to store the total size of the files it is about to
>>>>> > write or compact) on disk, GC is forced and the files are deleted.
>>>>> > Otherwise they will get deleted at some point in the future.
>>>>> > In 1.0 files are reference counted and space is freed much sooner.
>>>>> > With regard to regular maintenance, nodetool cleanup removes data
>>>>> > from a node that it is no longer a replica for. This is only of use
>>>>> > when you have done a token move.
>>>>> > I would not recommend a daily restart of the cassandra process. You
>>>>> > will lose all the run time optimizations the JVM has made (I think
>>>>> > the mapped file pages will stay resident), as well as adding
>>>>> > additional entropy to the system which must be repaired via HH, RR
>>>>> > or nodetool repair.
>>>>> > If you want to see compacted files purged faster the best approach
>>>>> > would be to upgrade to 1.0.
>>>>> > Hope that helps.
>>>>> > -----------------
>>>>> > Aaron Morton
>>>>> > Freelance Developer
>>>>> > @aaronmorton
>>>>> > http://www.thelastpickle.com
>>>>> > On 26/01/2012, at 9:51 AM, R. Verlangen wrote:
>>>>> >
>>>>> > In his message he explains that it's for "Forcing a GC". GC stands
>>>>> for garbage collection. For some more background see:
>>>>> http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
>>>>> > Cheers!
>>>>> >
>>>>> > 2012/1/25 <mike.li@thomsonreuters.com>
>>>>> >
>>>>> > Karl,
>>>>> >
>>>>> > Can you give a little more detail on these 2 lines? What do they do?
>>>>> >
>>>>> > java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
>>>>> > java.lang:type=Memory gc
>>>>> >
>>>>> > Thank you,
>>>>> > Mike
>>>>> >
>>>>> > -----Original Message-----
>>>>> > From: Karl Hiramoto [mailto:karl@hiramoto.org]
>>>>> > Sent: Wednesday, January 25, 2012 12:26 PM
>>>>> > To: user@cassandra.apache.org
>>>>> > Subject: Re: Restart cassandra every X days?
>>>>> >
>>>>> >
>>>>> > On 01/25/12 19:18, R. Verlangen wrote:
>>>>> >> Ok thank you for your feedback. I'll add these tasks to our daily
>>>>> >> cassandra maintenance cronjob. Hopefully this will keep things
>>>>> >> under control.
>>>>> >
>>>>> > I forgot to mention that we found that forcing a GC also cleans up
>>>>> > some space.
>>>>> >
>>>>> >
>>>>> > in a cronjob you can do this with
>>>>> > http://crawler.archive.org/cmdline-jmxclient/
>>>>> >
>>>>> >
>>>>> > my cron
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
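
The advice quoted earlier in this thread is to watch free disk space rather
than used space. A minimal Python sketch of such a check follows; it is only
an illustration, not anything Cassandra ships. The function names and the
example data directory path are assumptions.

```python
import shutil


def free_fraction(data_dir):
    """Return the fraction of the filesystem holding data_dir that is free."""
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free
    return usage.free / usage.total


def has_headroom(data_dir, bytes_to_write):
    """Return True if the filesystem holding data_dir can fit bytes_to_write."""
    return shutil.disk_usage(data_dir).free >= bytes_to_write


if __name__ == "__main__":
    # Hypothetical path; substitute the node's actual data directory.
    data_dir = "/var/lib/cassandra/data"
    sixty_gb = 60 * 1024 ** 3
    try:
        print("free fraction:", round(free_fraction(data_dir), 3))
        print("room for another 60 GB:", has_headroom(data_dir, sixty_gb))
    except FileNotFoundError:
        print("data directory not found:", data_dir)
```

Run from cron, a check like this flags the "60 GB used and 6 MB free" case
without treating raw usage as a problem in itself.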
