cassandra-commits mailing list archives

From "Simon Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13049) Too many open files during bootstrapping
Date Fri, 12 May 2017 22:18:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008810#comment-16008810
] 

Simon Zhou edited comment on CASSANDRA-13049 at 5/12/17 10:17 PM:
------------------------------------------------------------------

I wrote some [micro benchmark code | https://github.com/szhou1234/jmh-samples/blob/master/src/main/java/com/cassandra/MmapPerf.java].
To my surprise, memory mapping is very efficient even on small files. Below are the results
on an idle server; for each file size (1k, 10k, 100k, 1m, 10m) there are 2000 files. I should
have dropped the page cache first, but these numbers are from the first run of the test on
that server. That said, we can stick with mmap for efficient IO while pursuing configuration
tuning to reduce the number of sstables being streamed.

Benchmark             (bufferSize)             (filePath)  (useDirectBuffer)  Mode  Cnt   Score   Error  Units
MmapPerf.readChannel         65536    /home/szhou/1kfiles              false  avgt    4   0.044 ± 0.051   s/op
MmapPerf.readChannel         65536    /home/szhou/1kfiles               true  avgt    4   0.064 ± 0.015   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles              false  avgt    4   0.050 ± 0.060   s/op
MmapPerf.readChannel         65536   /home/szhou/10kfiles               true  avgt    4   0.072 ± 0.019   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles              false  avgt    4   0.143 ± 0.060   s/op
MmapPerf.readChannel         65536  /home/szhou/100kfiles               true  avgt    4   0.166 ± 0.021   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles              false  avgt    4   1.051 ± 0.801   s/op
MmapPerf.readChannel         65536    /home/szhou/1mfiles               true  avgt    4   1.287 ± 0.220   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles              false  avgt    4   9.696 ± 2.207   s/op
MmapPerf.readChannel         65536   /home/szhou/10mfiles               true  avgt    4  13.754 ± 1.379   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles              false  avgt    4   0.017 ± 0.007   s/op
MmapPerf.readMapping         65536    /home/szhou/1kfiles               true  avgt    4   0.017 ± 0.005   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles              false  avgt    4   0.016 ± 0.004   s/op
MmapPerf.readMapping         65536   /home/szhou/10kfiles               true  avgt    4   0.017 ± 0.006   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles              false  avgt    4   0.023 ± 0.004   s/op
MmapPerf.readMapping         65536  /home/szhou/100kfiles               true  avgt    4   0.026 ± 0.006   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles              false  avgt    4   0.129 ± 0.017   s/op
MmapPerf.readMapping         65536    /home/szhou/1mfiles               true  avgt    4   0.132 ± 0.068   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles              false  avgt    4   1.313 ± 0.262   s/op
MmapPerf.readMapping         65536   /home/szhou/10mfiles               true  avgt    4   1.274 ± 0.482   s/op
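For readers unfamiliar with the two code paths being compared above, here is a minimal, self-contained sketch of what a `readChannel`-style read versus a `readMapping`-style read looks like in Java NIO. This is not the linked benchmark itself (which uses JMH); the class and method names here are illustrative only:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Random;

public class MmapVsChannel {
    // Channel path: repeated read() syscalls copying into a heap buffer.
    static byte[] readChannel(Path p, int bufferSize) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(bufferSize);
            ByteBuffer out = ByteBuffer.allocate((int) ch.size());
            while (ch.read(buf) != -1) {
                buf.flip();
                out.put(buf);
                buf.clear();
            }
            return out.array();
        }
    }

    // Mapping path: one mmap of the whole file, then plain memory access.
    // Note each mapping holds kernel resources until it is garbage collected.
    static byte[] readMapping(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[map.remaining()];
            map.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Sanity check: both paths must return identical bytes.
        Path p = Files.createTempFile("mmap-demo", ".bin");
        byte[] data = new byte[128 * 1024];
        new Random(42).nextBytes(data);
        Files.write(p, data);
        if (!Arrays.equals(readChannel(p, 65536), data))
            throw new AssertionError("channel read mismatch");
        if (!Arrays.equals(readMapping(p), data))
            throw new AssertionError("mapped read mismatch");
        Files.delete(p);
    }
}
```

The benchmark numbers suggest the mapping path avoids the per-read syscall and copy overhead of the channel path, which is consistent with it winning at every file size here.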



> Too many open files during bootstrapping
> ----------------------------------------
>
>                 Key: CASSANDRA-13049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13049
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>
> We just upgraded from 2.2.5 to 3.0.10 and hit this issue during bootstrapping, so it was
> likely made worse by the IO performance improvements in Cassandra 3.
> On our side, the problem is that we have lots of small sstables, so when bootstrapping a
> new node it receives lots of files during streaming, and Cassandra keeps all of them open
> for an unpredictable amount of time. Eventually we hit a "Too many open files" error;
> around that time I could see ~1M open files through lsof, almost all of them *-Data.db and
> *-Index.db. We should certainly use a better compaction strategy to reduce the number of
> sstables, but I also see a few possible improvements in Cassandra:
> 1. We use memory mapping when reading data from sstables, and every new memory map means
> one more open file descriptor. Memory mapping improves IO performance when dealing with
> large files; do we want to set a file size threshold below which we skip it?
> 2. Whenever we finish receiving a file from a peer, we create a SSTableReader/BigTableReader,
> which opens the data file and index file and keeps them open until some unpredictable later
> time. See StreamReceiveTask#L110, BigTableWriter#openFinal and SSTableReader#InstanceTidier.
> Would it be better to open the data/index files lazily, or close them more often to reclaim
> file descriptors?
> I searched all known issues in JIRA and this looks like a new issue in Cassandra 3.
> cc [~Stefania] for comments.
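To make point 1 above concrete, here is a minimal sketch of what a file-size threshold for choosing between mmap and a plain channel read could look like. The class name, method, and threshold value are all hypothetical, not Cassandra code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ThresholdReader {
    // Hypothetical cutoff: below it we skip mmap entirely, so lots of
    // small sstables don't each pin an extra mapping. 1 MiB is illustrative.
    static final long MMAP_THRESHOLD_BYTES = 1 << 20;

    static ByteBuffer read(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size >= MMAP_THRESHOLD_BYTES) {
                // Large file: memory mapping wins on throughput.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            }
            // Small file: one buffered read; the channel (and its file
            // descriptor) is released as soon as the try block exits.
            ByteBuffer buf = ByteBuffer.allocate((int) size);
            while (buf.hasRemaining() && ch.read(buf) != -1) { /* fill */ }
            buf.flip();
            return buf;
        }
    }
}
```

Note the benchmark results above actually argue against such a threshold on throughput grounds; the remaining motivation for it would be capping the number of long-lived mappings and file descriptors rather than raw read speed.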



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

