Odd that this discussion is happening now, as I'm also getting this error. I get a burst of error messages, and then the system continues with no apparent ill effect.
I can't tell what the system was doing at the time; here is the stack trace. BTW, OpsCenter says I only have 4 or 5 SSTables in each of my 6 CFs.

ERROR [ReadStage:62384] 2013-07-14 18:04:26,062 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:62384,5,main]
java.io.IOError: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
        at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:69)
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:898)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:63)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:61)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
        at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:124)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
        at org.apache.cassandra.db.Table.getRow(Table.java:378)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:64)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:41)
        at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:63)
        ... 16 more

It doesn't tell you anything if a file name ends with "ic-###", except the SSTable format version it uses ("ic" in this case).

Files belonging to a secondary index contain something like this in the file name: <KS>-<CF>.<IDX-NAME>, while files of "regular" CFs do not contain any dots except the one just before the file extension.
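To make the naming rule concrete, here is a minimal Python sketch that splits an SSTable file name into its fields and flags secondary-index files by the dot in the CF part. It assumes names of the form <KS>-<CF>-<version>-<generation>-<Component>.db, as in the stack trace, and that keyspace and CF names contain no hyphens; the index name "portfolio_idx" is hypothetical.

```python
from typing import NamedTuple, Optional


class SSTableName(NamedTuple):
    keyspace: str
    column_family: str  # for index files this is "<CF>.<IDX-NAME>"
    version: str        # SSTable format version, e.g. "hf" or "ic"
    generation: int
    component: str      # e.g. "Data", "Index", "Filter"

    @property
    def is_secondary_index(self) -> bool:
        # Index files carry a dot inside the CF part of the name;
        # regular CFs only have the dot before the ".db" extension.
        return "." in self.column_family


def parse_sstable_name(filename: str) -> Optional[SSTableName]:
    """Parse '<KS>-<CF>-<version>-<generation>-<Component>.db', or return None."""
    if not filename.endswith(".db"):
        return None
    parts = filename[: -len(".db")].split("-")
    if len(parts) < 5:
        return None
    try:
        generation = int(parts[-2])
    except ValueError:
        return None
    # Read version/generation/component from the end so any extra
    # middle fields do not shift them.
    return SSTableName(parts[0], parts[1], parts[-3], generation, parts[-1])


if __name__ == "__main__":
    for name in (
        "dev_a-portfoliodao-hf-166-Data.db",               # from the stack trace
        "dev_a-portfoliodao.portfolio_idx-ic-12-Data.db",  # hypothetical index name
    ):
        parsed = parse_sstable_name(name)
        kind = "index" if parsed.is_secondary_index else "regular"
        print(name, "->", parsed.version, kind)
```

Both names carry a version field, but only the second is an index file, so seeing "ic" on its own says nothing about secondary indexes.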


Also, looking through the log, it appears a lot of the files end with ic-####, which I assume is associated with a secondary index I have on the table. Are secondary indexes really expensive from a file descriptor standpoint? That particular table uses the default compaction strategy...

I have one table that is using leveled compaction. Its SSTable size was set to 10MB; I will try changing it to 256MB. Is there a good way to merge the existing SSTables?

Are you using leveled compaction? If so, what is your SSTable size set to? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended setting the table's sstable_size_in_mb to 256MB to avoid this problem.
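The effect of the SSTable size on file count is simple arithmetic. This rough Python sketch assumes every SSTable is filled to the target size, so real counts will differ; keep in mind that each SSTable is also several component files on disk (Data, Index, Filter, ...). The 50GB per node figure comes from the original question.

```python
import math


def estimate_sstable_count(data_mb: float, sstable_size_mb: float) -> int:
    """Rough SSTable count if every SSTable were filled to sstable_size_mb."""
    if sstable_size_mb <= 0:
        raise ValueError("sstable_size_mb must be positive")
    return math.ceil(data_mb / sstable_size_mb)


if __name__ == "__main__":
    data_per_node_mb = 50 * 1024  # ~50GB per node, from the original question
    for size_mb in (10, 256):
        count = estimate_sstable_count(data_per_node_mb, size_mb)
        print(f"sstable_size_in_mb={size_mb}: ~{count} SSTables")
```

This prints about 5120 SSTables at 10MB versus 200 at 256MB, more than a 25x difference before counting component files.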

I'm running into a problem where nodes in my cluster are hitting over 450K open files. Is this normal for a 4-node 1.2.6 cluster with a replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load, so I'm wondering if I should be looking at something else...

Let me know if you need more info…
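One way to see where the open descriptors go is to group a node's open files by SSTable name prefix. A minimal, Linux-only Python sketch, assuming it runs as a user allowed to read /proc/<pid>/fd of the Cassandra process and that keyspace and CF names contain no hyphens. Secondary index files land under their own <KS>-<CF>.<IDX-NAME> key, which would also show whether the index is what is eating descriptors.

```python
import os
import sys
from collections import Counter
from typing import Iterable, List


def open_file_targets(pid: int) -> List[str]:
    """Resolve every entry in /proc/<pid>/fd to its target (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_by_column_family(paths: Iterable[str]) -> Counter:
    """Count open *.db files per '<KS>-<CF>' prefix; everything else is 'other'."""
    counts: Counter = Counter()
    for path in paths:
        name = os.path.basename(path)
        parts = name[: -len(".db")].split("-") if name.endswith(".db") else []
        # Index files keep their '<CF>.<IDX-NAME>' part, so they get their own key.
        key = f"{parts[0]}-{parts[1]}" if len(parts) >= 5 else "other"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Pass the Cassandra pid as the first argument (defaults to this script's pid).
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    targets = open_file_targets(pid)
    print(f"{len(targets)} open descriptors")
    for key, n in count_by_column_family(targets).most_common(10):
        print(f"{n:8d}  {key}")
```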


