From Thomas Downing <>
Subject Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]
Date Wed, 14 Jul 2010 11:53:03 GMT
On 7/14/2010 7:16 AM, Peter Schuller wrote:
>> More than one fd can be open on a given file, and many of the open fd's are
>> on files that have been deleted.  The stale fd's are all on Data.db files in
>> the data directory, which I keep separate from the commit log directory.
>> I haven't had a chance to look at the code that handles files, and I am not any
>> sort of Java expert, but could this be due to Java's lazy resource cleanup?
>> I wonder if, when you consider writing your own file-handling classes for
>> O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help.
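A minimal sketch of the behaviour being discussed, assuming Linux/POSIX unlink semantics (on Windows, deleting a file that is still open usually fails). The class and method names are hypothetical and this is not Cassandra code. It deletes a file while a handle is still open, shows the data is still readable through that handle (so the disk space is still allocated), and uses try-with-resources so close(2) happens when the block exits rather than whenever the handle is garbage collected.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteWhileOpen {

    // Writes a small file, opens it, deletes it while the handle is open, then
    // reads through the open handle. On POSIX systems unlink(2) removes only the
    // directory entry; the inode and its blocks live until the last fd is closed.
    static int readAfterDelete() throws IOException {
        Path p = Files.createTempFile("sketch", "-Data.db");
        Files.write(p, new byte[] {42, 7, 9});
        // try-with-resources calls close() deterministically when the block exits,
        // instead of leaving the descriptor open until the object is collected.
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
            Files.delete(p);                       // directory entry is gone...
            if (Files.exists(p)) {
                throw new IllegalStateException("file still listed after delete");
            }
            return raf.read();                     // ...but the data is still readable
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete());     // prints 42 on Linux
    }
}
```

If the `try` block were replaced by a bare `new RandomAccessFile(...)` that is never closed, the deleted file would keep occupying space (and show up in `lsof`) until the JVM happened to finalize the object or exited.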
> The fact that there are open fds to deleted files is interesting... I
> wonder if people have reported weird disk space usage in the past
> (since such deleted files would not show up with 'du -sh' but eat
> space on the device until closed).
> My general understanding is that Cassandra does specifically rely on
> the GC to know when unused sstables can be removed. However the fact
> that the files are deleted I think means that this is not the problem,
> and the question is rather why open file descriptors/streams are
> leaking to these deleted sstables. But I'm speaking now without
> knowing when/where streams are closed.
> Are the deleted files indeed sstables, or was that a bad assumption on my part?
As a Cassandra newbie, I'm not sure how to tell, but they are all
*.Data.db files, and all under the DataFileDirectory (as spec'ed
in storage-conf.xml), which is a separate directory from the
CommitLogDirectory.  I did not see any *Index.db or *Filter.db
files, but I may have missed them.
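One way to check for missed *Index.db or *Filter.db entries is a sketch like the one below, assuming Linux, where each entry in /proc/<pid>/fd is a symlink whose target ends in " (deleted)" once the file has been unlinked. The class name, the suffix grouping, and passing the Cassandra pid as an argument are all hypothetical choices; reading another process's fd directory normally requires running as the same user or as root.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeletedFdScan {

    // Linux appends this marker to the link target of an fd whose file was unlinked.
    static final String DELETED_MARKER = " (deleted)";

    // Returns the original path of every descriptor of the given process that
    // still points at an unlinked file. Linux-only: relies on /proc/<pid>/fd.
    static List<String> deletedTargets(String pid) throws IOException {
        List<String> out = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while we were scanning
                }
                if (target.endsWith(DELETED_MARKER)) {
                    out.add(target.substring(0, target.length() - DELETED_MARKER.length()));
                }
            }
        }
        return out;
    }

    // Groups paths by sstable component suffix so Data/Index/Filter files are easy to spot.
    static Map<String, Integer> countBySuffix(List<String> paths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : paths) {
            String key = p.endsWith("Data.db") ? "Data.db"
                       : p.endsWith("Index.db") ? "Index.db"
                       : p.endsWith("Filter.db") ? "Filter.db"
                       : "other";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // e.g. the Cassandra JVM's pid
        System.out.println(countBySuffix(deletedTargets(pid)));
    }
}
```

Running it against the Cassandra pid shows at a glance whether only Data.db components are being held open, which is roughly what `lsof -p <pid> | grep deleted` would also show.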

