incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dragos cernahoschi <dragos.cernahos...@gmail.com>
Subject Re: CASSANDRA-1472 (bitmap indexes)
Date Tue, 09 Nov 2010 11:33:30 GMT
There are about 500 SSTables (12GB of data including index data,
statistics...) The source data file had about 3GB/26 million rows.

I only test with EQ expressions for now.

Increasing the file limit resolved the problem, but now I'm getting
TimedOutException(s) from thrift when "querying" even with slice size of 1.
Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
test?

I really have some interesting sets of data to test indexes with and I want
to make a comparison between ordinary indexes and bitmap indexes.

Thank you,
Dragos

On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <stu.hood@rackspace.com> wrote:

> Dragos,
>
> How many SSTables did you have on disk, and were any of your index
> expressions GT(E)/LT(E)?
>
> I expect that you are bumping into a limitation of the current
> implementation: it opens up to 128 file-handles per SSTable in the worst
> case for a GT/LT query (one per index bucket).
>
> A future version might remove that requirement, but for now, you should
> probably bump the file handle limit on your machine to at least 2^16.
>
> Thanks,
> Stu
>
>
> -----Original Message-----
> From: "dragos cernahoschi" <dragos.cernahoschi@gmail.com>
> Sent: Monday, November 8, 2010 10:05am
> To: dev@cassandra.apache.org
> Subject: CASSANDRA-1472 (bitmap indexes)
>
> Hi,
>
> I've got an exception during the following test:
>
> test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
>
> test scenario:
> - 1 column family
> - about 15 columns
> - 7 indexed columns (bitmap)
> - 26 million rows (insert operation went fine)
> - thrift "query" on 3 of the indexed columns with get_indexed_slices
> (count:
> 100)
> - got the following exception:
>
> 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in
> thread Thread[ReadStage:3,5,main]
> java.io.IOError: java.io.FileNotFoundException:
> /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open
> files)
>    at
>
> org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
>    at
>
> org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
>    at
>
> org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
>    at
> org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
>    at
>
> org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
>    at
>
> org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
>    at
>
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
>    at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException:
> /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open
> files)
>    at java.io.FileInputStream.open(Native Method)
>    at java.io.FileInputStream.<init>(FileInputStream.java:106)
>    at
> org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
>    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
>    at
>
> org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
>    ... 10 more
> 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in
> thread Thread[ReadStage:2,5,main]
> java.io.IOError: java.io.FileNotFoundException:
> /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open
> files)
>    at
>
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
>    at
>
> org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
>    at
>
> org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
>    at
>
> org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
>    at
>
> org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
>    at
>
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
>    at
>
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
>    at
>
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
>    at
>
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
>    at
>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
>    at
>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
>    at
>
> org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
>    at
>
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
>    at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException:
> /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open
> files)
>    at java.io.RandomAccessFile.open(Native Method)
>    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
>    at
>
> org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
>    at
>
> org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
>    ... 16 more
>
> The same test worked fine with 1 million rows.
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message