incubator-cassandra-dev mailing list archives

From dragos cernahoschi <dragos.cernahos...@gmail.com>
Subject Re: CASSANDRA-1472 (bitmap indexes)
Date Wed, 10 Nov 2010 10:31:24 GMT
Welcome.

That seems to be exactly it: when I run one of the queries that ends in a
timed out exception, Cassandra enters some kind of infinite loop.

Trace:

DEBUG 12:15:22,307 scan
DEBUG 12:15:22,348 restricted ranges for query [78703492656118554854272571946195123045,0] are [[78703492656118554854272571946195123045,0]]
DEBUG 12:15:22,348 scan ranges are [78703492656118554854272571946195123045,0]
DEBUG 12:15:22,380 reading org.apache.cassandra.db.IndexScanCommand@1544e44from 110@localhost /127.0.0.1
DEBUG 12:15:22,402 For operator EQ on Lynx 2.7 in rows (1481600,3203072): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1097-1-Bitidx.db>
DEBUG 12:15:22,422 For operator EQ on Lynx 2.7 in rows (1852032,4003840): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1103-1-Bitidx.db>
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (718336,1551616): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1112-1-Bitidx.db>
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (1482112,3203072): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1108-1-Bitidx.db>
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (370432,800768): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1109-1-Bitidx.db>
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (5755392,12436992): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1095-1-Bitidx.db>
DEBUG 12:15:22,425 For operator EQ on Lynx 2.7 in rows (369664,800768): bins (12,12) of #<BitmapIndexReader [null] on /home/dragos/cassandra/data/keyspace/visit-e-1110-1-Bitidx.db>
DEBUG 12:15:22,515 collecting 0 of 2147483647: 62726f77736572:false:8@0
DEBUG 12:15:22,515 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3@0
DEBUG 12:15:22,515 collecting 2 of 2147483647: 636f756e747279:false:7@0
DEBUG 12:15:22,516 collecting 3 of 2147483647: 646f6d61696e:false:15@0
DEBUG 12:15:22,518 collecting 4 of 2147483647: 6475726174696f6e:false:3@0
DEBUG 12:15:22,521 collecting 5 of 2147483647: 6c696e65:false:4@0
DEBUG 12:15:22,521 collecting 6 of 2147483647: 6f73:false:12@0
DEBUG 12:15:22,521 collecting 7 of 2147483647: 7069:false:3@0
DEBUG 12:15:22,521 collecting 8 of 2147483647: 74696d657374616d70:false:10@0
DEBUG 12:15:22,522 collecting 9 of 2147483647: 75736572:false:15@0
DEBUG 12:15:22,522 collecting 10 of 2147483647: 7a6970:false:5@0
DEBUG 12:15:22,523 collecting 0 of 2147483647: 62726f77736572:false:8@0
DEBUG 12:15:22,524 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3@0
DEBUG 12:15:22,524 collecting 2 of 2147483647: 636f756e747279:false:7@0
DEBUG 12:15:22,524 collecting 3 of 2147483647: 646f6d61696e:false:15@0
DEBUG 12:15:22,524 collecting 4 of 2147483647: 6475726174696f6e:false:3@0
DEBUG 12:15:22,525 collecting 5 of 2147483647: 6c696e65:false:4@0
DEBUG 12:15:22,525 collecting 6 of 2147483647: 6f73:false:19@0
DEBUG 12:15:22,525 collecting 7 of 2147483647: 7069:false:3@0

...

and it goes on like this forever.
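
Editorial note: the "collecting" lines in the trace carry hex-encoded column names, which makes the repetition hard to read. The sketch below decodes them. It assumes each entry has the shape `hexname:deleted:length@timestamp`, which is an interpretation of the log format and is not confirmed anywhere in this thread; `decode_column` is a hypothetical helper name.

```python
def decode_column(entry):
    """Return the column name from a 'hexname:flag:length@timestamp' log entry.

    The field layout is an assumption; only the hex prefix is decoded.
    """
    hex_name = entry.split(":", 1)[0]
    return bytes.fromhex(hex_name).decode("ascii")


# Entries copied from the first pass of "collecting" lines in the trace.
entries = [
    "62726f77736572:false:8@0",
    "636f6e6e656374696f6e:false:3@0",
    "636f756e747279:false:7@0",
    "646f6d61696e:false:15@0",
    "6475726174696f6e:false:3@0",
    "6c696e65:false:4@0",
    "6f73:false:12@0",
    "7069:false:3@0",
    "74696d657374616d70:false:10@0",
    "75736572:false:15@0",
    "7a6970:false:5@0",
]

names = [decode_column(e) for e in entries]
print(names)
```

The decoded names are browser, connection, country, domain, duration, line, os, pi, timestamp, user and zip. The second pass in the trace restarts at index 0 with a different length for `os` (19 instead of 12), which may mean the scan is walking row after row rather than spinning on a single row.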

I'll try the KEYS indexes on the same scenario and let you know.

Dragos

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood <stu.hood@rackspace.com> wrote:

> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> enabled, to ensure that we aren't going into some kind of infinite loop?
>
> Thanks for the help,
> Stu
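
Editorial note: the difference between brute-force filtering and a bitmap index join, as described in the message above, can be sketched as follows. This is a toy illustration and not Cassandra's implementation: Python integers stand in for bitmaps, and the rows, column values and function names are all made up for the example (only the column names and "Lynx 2.7" appear in the trace).

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose set bits are row ids."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


def set_bits(bitmap):
    """Return the positions of the set bits in ascending order."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


def brute_force_filter(rows, indexes, clauses):
    """Use the index for the first clause only, then check every candidate row."""
    (first_col, first_val), *rest = clauses.items()
    candidates = set_bits(indexes[first_col].get(first_val, 0))
    return [i for i in candidates if all(rows[i][c] == v for c, v in rest)]


def bitmap_join(indexes, clauses):
    """AND the bitmaps of all EQ clauses without reading any row."""
    result = None
    for column, value in clauses.items():
        bitmap = indexes[column].get(value, 0)
        result = bitmap if result is None else result & bitmap
    return set_bits(result or 0)


# Hypothetical sample data.
rows = [
    {"browser": "Lynx 2.7", "os": "Linux", "country": "RO"},
    {"browser": "Firefox", "os": "Linux", "country": "RO"},
    {"browser": "Lynx 2.7", "os": "Linux", "country": "US"},
    {"browser": "Lynx 2.7", "os": "BSD", "country": "RO"},
]
indexes = {c: build_bitmap_index(rows, c) for c in ("browser", "os", "country")}
clauses = {"browser": "Lynx 2.7", "os": "Linux"}

print(brute_force_filter(rows, indexes, clauses))
print(bitmap_join(indexes, clauses))
```

Both functions return the same row ids. The brute-force version has to read every row that matches the first clause, while the join only combines bitmaps, which is why the choice of first clause can matter so much for the filtering approach.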
>
> -----Original Message-----
> From: "dragos cernahoschi" <dragos.cernahoschi@gmail.com>
> Sent: Tuesday, November 9, 2010 11:50am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> I'm running the query on three columns with cardinalities: 22, 17 and 10.
> Interestingly, when combining columns with these cardinalities:
>
> 22 + 17 => no exception
> 22 + 10 => no exception
> 10 + 17 => timed out exception
> 22 + 17 + 10 => timed out exception
>
>
> On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood <stu.hood@rackspace.com> wrote:
>
> > Can you tell me a little bit about your key distribution? How many unique
> > values are indexed (the cardinality)?
> >
> > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> > secondary indexes will perform terribly for high cardinality datasets.
> >
> > Thanks!
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dragos.cernahoschi@gmail.com>
> > Sent: Tuesday, November 9, 2010 10:14am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > In the meantime, the number of SSTables has dropped to just 7. Initially the
> > compaction thread hit the same "too many open files" problem and couldn't
> > do any compaction.
> >
> > But I'm still not able to run my tests: TimedOutException :(
> >
> > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <stu.hood@rackspace.com> wrote:
> >
> > > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > > compaction? By default, Cassandra strives to keep the sstable count below
> > > ~32, since accesses to separate sstables require seeks.
> > >
> > > In this case, the query will seek 500 times to check the secondary index
> > > for each sstable: if it finds matches it will need to seek to find them in
> > > the primary index, and seek again for the data file.
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi" <dragos.cernahoschi@gmail.com>
> > > Sent: Tuesday, November 9, 2010 5:33am
> > > To: dev@cassandra.apache.org
> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > >
> > > There are about 500 SSTables (12GB of data including index data,
> > > statistics...) The source data file had about 3GB/26 million rows.
> > >
> > > I only test with EQ expressions for now.
> > >
> > > Increasing the file limit resolved the problem, but now I'm getting
> > > TimedOutException(s) from thrift when "querying" even with slice size of 1.
> > > Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for such a
> > > test?
> > >
> > > I really have some interesting sets of data to test indexes with and I want
> > > to make a comparison between ordinary indexes and bitmap indexes.
> > >
> > > Thank you,
> > > Dragos
> > >
> > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <stu.hood@rackspace.com> wrote:
> > >
> > > > Dragos,
> > > >
> > > > How many SSTables did you have on disk, and were any of your index
> > > > expressions GT(E)/LT(E)?
> > > >
> > > > I expect that you are bumping into a limitation of the current
> > > > implementation: it opens up to 128 file-handles per SSTable in the worst
> > > > case for a GT/LT query (one per index bucket).
> > > >
> > > > A future version might remove that requirement, but for now, you should
> > > > probably bump the file handle limit on your machine to at least 2^16.
> > > >
> > > > Thanks,
> > > > Stu
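
Editorial note: the suggested limit of 2^16 can be checked against the numbers given in this thread with a little arithmetic. The sketch below assumes the worst case of 128 handles per SSTable quoted above and the roughly 500 SSTables reported elsewhere in the thread; `worst_case_handles` is a hypothetical helper, and handles used by the rest of the process are not counted.

```python
def worst_case_handles(sstables, buckets_per_sstable=128):
    """Upper bound on index file handles if every bucket of every SSTable is open."""
    return sstables * buckets_per_sstable


suggested_limit = 2 ** 16

before_compaction = worst_case_handles(500)  # about 500 SSTables on disk
after_compaction = worst_case_handles(7)     # SSTable count after compaction

print(before_compaction, after_compaction, suggested_limit)
print(before_compaction <= suggested_limit)
```

With 500 SSTables the worst case comes to 64000 handles, just under 2^16 = 65536, while 7 SSTables need at most 896.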
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: "dragos cernahoschi" <dragos.cernahoschi@gmail.com>
> > > > Sent: Monday, November 8, 2010 10:05am
> > > > To: dev@cassandra.apache.org
> > > > Subject: CASSANDRA-1472 (bitmap indexes)
> > > >
> > > > Hi,
> > > >
> > > > I've got an exception during the following test:
> > > >
> > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > > >
> > > > test scenario:
> > > > - 1 column family
> > > > - about 15 columns
> > > > - 7 indexed columns (bitmap)
> > > > - 26 million rows (insert operation went fine)
> > > > - thrift "query" on 3 of the indexed columns with get_indexed_slices
> > > >   (count: 100)
> > > > - got the following exception:
> > > >
> > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > > >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> > > >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >    at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> > > >    at java.io.FileInputStream.open(Native Method)
> > > >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> > > >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> > > >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> > > >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> > > >    ... 10 more
> > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > > > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> > > >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> > > >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> > > >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> > > >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> > > >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> > > >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> > > >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> > > >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >    at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> > > >    at java.io.RandomAccessFile.open(Native Method)
> > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> > > >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> > > >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> > > >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> > > >    ... 16 more
> > > >
> > > > The same test worked fine with 1 million rows.
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>
>
>
