incubator-cassandra-user mailing list archives

From Pieter Callewaert <pieter.callewa...@be-mobile.be>
Subject RE: Too many open files (Cassandra 2.0.1)
Date Tue, 29 Oct 2013 15:30:18 GMT
I investigated a bit more:


-        I can reproduce it: it has already happened on several nodes during stress testing
(50,000 SELECTs spread over multiple threads).

-        The "Unexpected exception in the selector loop" warning seems unrelated to the
"Too many open files" error; it just happens.

-        It is not socket-related.

-        Using Oracle Java(TM) SE Runtime Environment (build 1.7.0_40-b43)

-        Using multiple data directories (maybe related?)

I'm stuck at the moment. I don't know whether I should enable DEBUG logging, since it will
probably produce too much information.

Kind regards,
Pieter Callewaert


   Pieter Callewaert
   Web & IT engineer

   Web:   www.be-mobile.be
   Email: pieter.callewaert@be-mobile.be
   Tel:   +32 9 330 51 80



From: Pieter Callewaert [mailto:pieter.callewaert@be-mobile.be]
Sent: Tuesday, 29 October 2013 13:40
To: user@cassandra.apache.org
Subject: Too many open files (Cassandra 2.0.1)

Hi,

I've noticed that some nodes in our cluster die after running for some time.

WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

There are also other exceptions with the same cause.
Since we use the Cassandra package, the nofile limit is raised to 100000.
To double-check that this is correct:

root@de-cass09 ~ # cat /proc/18332/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max open files            100000               100000               files
...

Next, I check how many files are open:
root@de-cass09 ~ # lsof -n -p 18332 | wc -l
100038

This seems an awful lot for size-tiered compaction...?
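A note on the count above: `lsof -p` lists more than file descriptors. It also prints a header line and entries such as the working directory, the program text and memory-mapped files, so its line count can exceed the nofile limit (100038 versus 100000 here). A minimal sketch of a more direct check, assuming a Linux /proc filesystem and root access to the Cassandra process, counts the entries in /proc/PID/fd and reads the soft limit from /proc/PID/limits. The function name `fd_usage` is hypothetical, not a Cassandra or system tool.

```shell
# Hypothetical helper (not a Cassandra or system tool): compare the number of
# open file descriptors of a process with its "Max open files" soft limit.
# Assumes Linux /proc; inspecting another user's process usually needs root.
fd_usage() {
  pid="$1"
  # Each entry in /proc/PID/fd is one open descriptor counted against nofile.
  open=$(ls /proc/"$pid"/fd | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
  soft=$(awk '/^Max open files/ { print $4 }' /proc/"$pid"/limits)
  echo "pid $pid: $open open descriptors, soft limit $soft"
}
```

For example, `fd_usage 18332` would report the descriptor count of the process examined above.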
When I checked the list, I noticed that one (deleted) file appears many times:

...
java    18332 cassandra 4704r   REG                8,1  10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java    18332 cassandra 4705r   REG                8,1  10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
...

If I count the entries for this specific file:
root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
52707
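To check whether other deleted SSTables are held open in the same way, the per-file count can be generalized. The sketch below reads lsof output on standard input and counts the entries for each file that lsof marks as (deleted), most-held first. It assumes lsof prints "(deleted)" as the last field of the line and that paths contain no spaces; `count_deleted` is a hypothetical helper name.

```shell
# Hypothetical helper: read lsof output on stdin and count how many entries
# refer to each deleted file, sorted with the most-held file first.
count_deleted() {
  # lsof appends "(deleted)" after NAME for unlinked files, so the path is
  # the second-to-last whitespace-separated field (assumes no spaces in paths).
  awk '$NF == "(deleted)" { print $(NF-1) }' | sort | uniq -c | sort -rn
}
```

For example: `lsof -n -p 18332 | count_deleted | head`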

Other nodes have around 350 files open in total. Any idea why the number of open files is so high?

The first exception I see is this one:
WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.NullPointerException
        at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
        at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
        at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Several minutes later, I get "Too many open files".

Specs:
12-node cluster running Ubuntu 12.04 LTS and Cassandra 2.0.1 (DataStax packages), using a
JBOD of 2 disks.
JNA enabled.

Any suggestions?

Kind regards,
Pieter Callewaert




