cassandra-commits mailing list archives

From "graham sanderson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6275) 2.0.x leaks file handles
Date Mon, 18 Nov 2013 22:55:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825892#comment-13825892 ]

graham sanderson edited comment on CASSANDRA-6275 at 11/18/13 10:53 PM:
------------------------------------------------------------------------

Note also, that most if not all of the deleted files are of the form

{code}
java    14018 cassandra  586r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java    14018 cassandra  587r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java    14018 cassandra  588r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java    14018 cassandra  589r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java    14018 cassandra  590r   REG               8,33  10507214        936 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4657-Data.db (deleted)
{code}
We have 7 data disks per node (I don't know whether this contributes to the problem), and the
open-but-deleted files are very unevenly distributed: 93% of them are on two of the 7 disks
(on this particular node). The live data for OpsCenter/rollups60 is also a little unevenly
distributed, and the mounts with more deleted files do hold more live data, but the deleted
file counts per mount point vary by several orders of magnitude whereas the data size itself
does not.
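
For anyone who wants to reproduce the per-mount comparison, here is a minimal sketch in Python. It assumes the lsof output has been captured as text with one open file per line, and that the data disks are mounted at paths such as /data/1 and /data/2. The function name, the mount list, and the third sample line are hypothetical and only there for illustration.

```python
from collections import Counter


def count_deleted_by_mount(lsof_lines, mounts):
    """Count open-but-deleted files per mount point.

    lsof_lines: lsof output, one open file per line.
    mounts: mount point paths such as "/data/1" (an assumed layout).
    """
    counts = Counter()
    # Try longer mount paths first so nested mounts resolve to the most specific one.
    ordered = sorted(mounts, key=len, reverse=True)
    for line in lsof_lines:
        # In the listing above, unlinked files carry a trailing "(deleted)".
        if not line.rstrip().endswith("(deleted)"):
            continue
        # Treat the first field that looks like an absolute path as the NAME column.
        path = next((f for f in line.split() if f.startswith("/")), None)
        if path is None:
            continue
        for mount in ordered:
            if path.startswith(mount.rstrip("/") + "/"):
                counts[mount] += 1
                break
    return counts


sample = [
    "java 14018 cassandra 586r REG 8,33 8792499 1251 "
    "/data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)",
    "java 14018 cassandra 587r REG 8,33 27303760 1254 "
    "/data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)",
    # Hypothetical live (not deleted) file on a second mount, for contrast.
    "java 14018 cassandra 600r REG 8,49 1000 2000 "
    "/data/2/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-1-Data.db",
]

print(count_deleted_by_mount(sample, ["/data/1", "/data/2"]))
```

The path detection is deliberately naive (it would break on paths containing spaces), which seems acceptable for Cassandra data file names but is worth keeping in mind.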


was (Author: graham sanderson):
Note also, that most if not all of the deleted files are of the form

{code}
java    14018 cassandra  586r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java    14018 cassandra  587r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java    14018 cassandra  588r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java    14018 cassandra  589r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java    14018 cassandra  590r   REG               8,33  10507214        936 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4657-Data.db (deleted)
{code}
We have 7 data disks (don't know if this contributes to the problem), and the number of such
deleted files is very ill balanced with 93% on two of the 7 disks (on this particular node)...
the distribution of live data file size for OpsCenter/rollups60 is a little uneven with the
same data mounts that have more deleted (but open) files having more actual live data, but
the deleted file counts per mount point vary by several order of magnitudes whereas the data
itself does not.

> 2.0.x leaks file handles
> ------------------------
>
>                 Key: CASSANDRA-6275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: java version "1.7.0_25"
> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
> Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Mikhail Mazursky
>         Attachments: c_file-descriptors_strace.tbz, cassandra_jstack.txt, leak.log, position_hints.tgz, slog.gz
>
>
> Looks like C* is leaking file descriptors when doing lots of CAS operations.
> {noformat}
> $ sudo cat /proc/15455/limits
> Limit                     Soft Limit           Hard Limit           Units    
> Max cpu time              unlimited            unlimited            seconds  
> Max file size             unlimited            unlimited            bytes    
> Max data size             unlimited            unlimited            bytes    
> Max stack size            10485760             unlimited            bytes    
> Max core file size        0                    0                    bytes    
> Max resident set          unlimited            unlimited            bytes    
> Max processes             1024                 unlimited            processes
> Max open files            4096                 4096                 files    
> Max locked memory         unlimited            unlimited            bytes    
> Max address space         unlimited            unlimited            bytes    
> Max file locks            unlimited            unlimited            locks    
> Max pending signals       14633                14633                signals  
> Max msgqueue size         819200               819200               bytes    
> Max nice priority         0                    0                   
> Max realtime priority     0                    0                   
> Max realtime timeout      unlimited            unlimited            us 
> {noformat}
> Looks like the problem is not in limits.
> Before load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 166
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 164
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 180
> {noformat}
> After load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 967
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 1766
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 2578
> {noformat}
> Most opened files have names like:
> {noformat}
> java      16890 cassandra 1636r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1637r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1638r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1639r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1640r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1641r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1642r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1643r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1644r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1645r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1646r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1647r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1648r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1649r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1650r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1651r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1652r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1653r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1654r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1655r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1656r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> {noformat}
> Also, when that happens it's not always possible to shut down the server process via SIGTERM; we have to use SIGKILL.
> P.S. See the mailing list thread for more context: https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html
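
The listing in the description shows the same two paxos data files held open by many descriptors. Note that `lsof -n | grep java | wc -l` counts every lsof row for the process, including rows such as cwd and mem that are not numbered descriptors. A rough sketch in Python for counting numbered descriptors per file might look like the following. It assumes default lsof column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME), and the function name is hypothetical.

```python
from collections import Counter


def descriptors_per_file(lsof_lines):
    """Map each open file path to the number of descriptors that reference it."""
    counts = Counter()
    for line in lsof_lines:
        fields = line.split()
        # The 4th field is the FD column, e.g. "1636r"; skip rows such as
        # "cwd" or "mem" whose FD column does not start with a digit.
        if len(fields) < 9 or not fields[3][:1].isdigit():
            continue
        # Treat the first field that looks like an absolute path as the NAME column.
        path = next((f for f in fields if f.startswith("/")), None)
        if path is not None:
            counts[path] += 1
    return counts


PAXOS_DIR = "/var/lib/cassandra/data/system/paxos/"
sample = [
    "java 16890 cassandra 1636r REG 202,17 88724987 655520 " + PAXOS_DIR + "system-paxos-jb-644-Data.db",
    "java 16890 cassandra 1637r REG 202,17 161158485 655420 " + PAXOS_DIR + "system-paxos-jb-255-Data.db",
    "java 16890 cassandra 1638r REG 202,17 88724987 655520 " + PAXOS_DIR + "system-paxos-jb-644-Data.db",
    "java 16890 cassandra 1639r REG 202,17 161158485 655420 " + PAXOS_DIR + "system-paxos-jb-255-Data.db",
    "java 16890 cassandra 1640r REG 202,17 88724987 655520 " + PAXOS_DIR + "system-paxos-jb-644-Data.db",
]

# Files referenced by the most descriptors come first.
for path, n in descriptors_per_file(sample).most_common():
    print(n, path)
```

A file that shows up with a steadily growing count between runs is a likely candidate for the leak.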



--
This message was sent by Atlassian JIRA
(v6.1#6144)
