cassandra-commits mailing list archives

From "graham sanderson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6275) 2.0.x leaks file handles
Date Wed, 20 Nov 2013 00:27:23 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827173#comment-13827173 ]

graham sanderson commented on CASSANDRA-6275:
---------------------------------------------

Note that this would tend to imply that I was wrong (at least about that particular code path),
and that the change in leak rate may be attributable to lower throughput without the file cache.
As mentioned before, the leak rate does seem closely related to how hard we are hitting the server,
so a threading bug elsewhere might be the cause.

Note that, nominally, buffer in RAR should be volatile, but then any code path through close() where
buffer's latest value is stale would end up calling deallocate anyway (at least in the case where
file_cache_size_in_mb is off; I didn't think through the other case).
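
To make the point about the close path concrete, here is a minimal sketch of the pattern as I read it. SketchReader, isOpen() and the deallocations counter are hypothetical names for illustration only; this is not the actual RAR code. The idea is that close() only skips deallocation when it sees a null buffer, so a reader that sees an older, still non-null value would err towards deallocating rather than leaking.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a buffered reader; NOT Cassandra's RandomAccessReader.
final class SketchReader implements AutoCloseable {
    // Counts deallocations so the behaviour can be observed from outside.
    static final AtomicInteger deallocations = new AtomicInteger();

    // volatile so that a close() on one thread is visible to other threads.
    private volatile ByteBuffer buffer = ByteBuffer.allocate(64);

    boolean isOpen() {
        return buffer != null;
    }

    @Override
    public void close() {
        ByteBuffer b = buffer;
        if (b == null) {
            return; // already closed, nothing to release
        }
        buffer = null;
        deallocate(b);
    }

    // Assumption: the real resource release is replaced by a counter here.
    private static void deallocate(ByteBuffer b) {
        deallocations.incrementAndGet();
    }
}
```

Calling close() twice from one thread releases the buffer once. Two threads racing through close() could both see a non-null buffer and both release it, which is a separate problem from a leak and is not addressed by this sketch.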

So the finalizer fix - which we can try to build here to test out (unless someone has
it pre-built) - seems to imply that someone is simply failing to call close() under load
conditions.
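
As an illustration of how a close() can be missed, here is a small sketch of one possible way (an exception between open and close). This is an assumption for illustration, not a diagnosis of this issue; TrackedHandle, CloseSketch and the open counter are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical handle that tracks how many instances are currently open.
final class TrackedHandle implements AutoCloseable {
    static final AtomicInteger open = new AtomicInteger();
    private boolean closed;

    TrackedHandle() {
        open.incrementAndGet();
    }

    // Simulates a read that may fail, e.g. under load.
    void read(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated read failure");
        }
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            open.decrementAndGet();
        }
    }
}

final class CloseSketch {
    // Leaky: an exception thrown by read() skips the close() call.
    static void leaky(boolean fail) {
        TrackedHandle h = new TrackedHandle();
        h.read(fail);
        h.close();
    }

    // Safe: try-with-resources closes the handle on every exit path.
    static void safe(boolean fail) {
        try (TrackedHandle h = new TrackedHandle()) {
            h.read(fail);
        }
    }
}
```

With a failing read, safe(true) leaves the open count unchanged, while leaky(true) leaves one handle open for each call. A finalizer would only release such a handle later, whenever the object is collected.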

> 2.0.x leaks file handles
> ------------------------
>
>                 Key: CASSANDRA-6275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: java version "1.7.0_25"
> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
> Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT 2012 x86_64
> x86_64 x86_64 GNU/Linux
>            Reporter: Mikhail Mazursky
>            Assignee: Michael Shuler
>         Attachments: c_file-descriptors_strace.tbz, cassandra_jstack.txt, leak.log, position_hints.tgz,
> slog.gz
>
>
> Looks like C* is leaking file descriptors when doing lots of CAS operations.
> {noformat}
> $ sudo cat /proc/15455/limits
> Limit                     Soft Limit           Hard Limit           Units    
> Max cpu time              unlimited            unlimited            seconds  
> Max file size             unlimited            unlimited            bytes    
> Max data size             unlimited            unlimited            bytes    
> Max stack size            10485760             unlimited            bytes    
> Max core file size        0                    0                    bytes    
> Max resident set          unlimited            unlimited            bytes    
> Max processes             1024                 unlimited            processes
> Max open files            4096                 4096                 files    
> Max locked memory         unlimited            unlimited            bytes    
> Max address space         unlimited            unlimited            bytes    
> Max file locks            unlimited            unlimited            locks    
> Max pending signals       14633                14633                signals  
> Max msgqueue size         819200               819200               bytes    
> Max nice priority         0                    0                   
> Max realtime priority     0                    0                   
> Max realtime timeout      unlimited            unlimited            us 
> {noformat}
> Looks like the problem is not in limits.
> Before load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 166
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 164
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 180
> {noformat}
> After load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 967
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 1766
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 2578
> {noformat}
> Most opened files have names like:
> {noformat}
> java      16890 cassandra 1636r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1637r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1638r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1639r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1640r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1641r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1642r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1643r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1644r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1645r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1646r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1647r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1648r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1649r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1650r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1651r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1652r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1653r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1654r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1655r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1656r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> {noformat}
> Also, when that happens it's not always possible to shutdown server process via SIGTERM.
> Have to use SIGKILL.
> p.s. See mailing thread for more context information https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)
