cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak
Date Tue, 09 Aug 2011 18:45:28 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834 ]

Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
---------------------------------------------------------------------

bq. Wouldn't it be worth indicating how many collections have been done since the last log
message if it's > 1, since it can be > 1?

The only reason I added count tracking was to prevent it from firing when there were no GCs
(the API is flaky). I've never actually seen > 1 happen, but we can add it to the logging.
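A minimal sketch of the count-tracking idea (illustrative names, not Cassandra's actual GCInspector code): remember the last `getCollectionCount()` reading per collector and only report when the delta is positive, so a flaky sample with no new collections never fires.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: track per-collector GC counts so we only log when a
// collection actually happened since the last check.
public class GcCountTracker {
    private final Map<String, Long> lastCounts = new HashMap<>();

    // currentCount comes from GarbageCollectorMXBean.getCollectionCount(),
    // which returns -1 when the count is undefined for a collector.
    public long newCollectionsSince(String collectorName, long currentCount) {
        if (currentCount < 0) return 0;            // undefined count: treat as no new GCs
        long previous = lastCounts.getOrDefault(collectorName, currentCount);
        lastCounts.put(collectorName, currentCount);
        return currentCount - previous;            // 0 on the first (baseline) observation
    }
}
```

The first observation establishes a baseline and returns 0, so startup never produces a spurious log line.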

bq. IMO the duration-based thresholds are hard to reason about here, where we're dealing w/
summaries and not individual GC results.

In practice we are dealing with individual GCs at least 99% of the time. The worst case is
that >1 GC inflates the GC time enough that we log errantly when it isn't needed, but I
imagine that to trigger that you would already have to be under GC pressure.

bq. I think I'd rather have something like the dropped messages logger, where every N seconds
we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.
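For context, the alternative being discussed would look roughly like the sketch below (hypothetical names, not the actual dropped-messages logger): read the cumulative totals from each `GarbageCollectorMXBean` on a fixed schedule and log them unconditionally, which is exactly where the noise concern comes from.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch of a periodic GC summary: dump cumulative per-collector totals
// from the GC MXBeans, whether or not anything changed since last time.
public class PeriodicGcSummary {
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("GC %s: count=%d time=%dms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // A real implementation would run this every N seconds via a
        // ScheduledExecutorService; here we just emit one summary.
        summarize().forEach(System.out::println);
    }
}
```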

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap still 80% full?")
and it seems to work as well as it always has.
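The "heap still 80% full after a CMS" condition boils down to a simple ratio check against `MemoryMXBean`'s heap usage. A minimal sketch, assuming an 80% threshold (the method and class names here are illustrative, not Cassandra's):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: after a CMS collection, is the heap still mostly full?
// If so, Cassandra's emergency valves (flushLargestMemtables /
// reduceCacheSizes) would kick in.
public class HeapPressure {
    static final double THRESHOLD = 0.80;

    // Pure check so the condition is easy to test in isolation.
    // MemoryUsage.getMax() can be -1 (undefined), in which case we
    // cannot claim pressure.
    static boolean heapStillFull(long usedBytes, long maxBytes) {
        return maxBytes > 0 && (double) usedBytes / maxBytes > THRESHOLD;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (heapStillFull(heap.getUsed(), heap.getMax())) {
            System.out.println("Heap > 80% full after GC; would flush largest memtables");
        }
    }
}
```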



> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.4
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>
> We have memory issues with long-running servers. These have been confirmed by several users on the user list, which is why I'm reporting it.
> The memory consumption of the Cassandra java process increases steadily until it is killed by the OS due to OOM (with no swap).
> Our server is started with -Xmx3000M and running for around 23 days.
> pmap -x shows
> Total SST: 1961616 (mem mapped data and index files)
> Anon  RSS: 6499640
> Total RSS: 8478376
> This shows that > 3G are 'overallocated'.
> We will use BRAF on one of our less important nodes to check whether it is related to mmap and report back.
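For reference, the arithmetic behind the ">3G overallocated" claim (a sketch; pmap -x reports sizes in KiB):

```java
// Sketch: how far the anonymous resident memory exceeds the configured heap.
public class Overallocation {
    // Both inputs from the report above: Anon RSS in KiB, heap size in MiB.
    static long beyondHeapMb(long anonRssKb, long heapMb) {
        return anonRssKb / 1024 - heapMb;
    }

    public static void main(String[] args) {
        long anonRssKb = 6_499_640;  // "Anon RSS" from pmap -x
        long heapMb    = 3_000;      // -Xmx3000M
        // ~6347 MiB resident anonymous memory minus a 3000 MiB heap
        // leaves roughly 3.3 GiB unaccounted for: the claimed leak.
        System.out.println(beyondHeapMb(anonRssKb, heapMb) + " MiB beyond the heap");
    }
}
```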

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
