incubator-cassandra-user mailing list archives

From Bill Speirs <bill.spe...@gmail.com>
Subject Re: Super Slow Multi-gets
Date Thu, 10 Feb 2011 16:55:55 GMT
We attempted a compaction to see if that would improve read
performance (BTW: write performance is fast, as expected). Here is
the result, an ArrayIndexOutOfBoundsException:

INFO 11:48:41,070 Compacting
[org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/DateIndex-e-7-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/FieldIndex-e-9-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/FieldIndex-e-10-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/Messages-e-13-Data.db')]

ERROR 11:48:41,080 Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.lang.ArrayIndexOutOfBoundsException: 7
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:58)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(ConcurrentSkipListMap.java:606)
        at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:878)
        at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(ConcurrentSkipListMap.java:1893)
        at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:218)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:130)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137)
        at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
        at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:312)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
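
If it helps anyone read the trace: the failure is inside the time-UUID
comparator while it indexes fixed byte offsets of what it assumes is a
16-byte version-1 UUID. Here is a minimal sketch (my own illustration,
not the actual Cassandra source; compareTimestampBytes below is a
stand-in) of how a byte array shorter than 8 bytes could produce exactly
this ArrayIndexOutOfBoundsException at index 7:

```java
// Illustrative sketch only -- NOT the actual Cassandra TimeUUIDType code.
// A version-1 (time) UUID keeps its 60-bit timestamp at fixed byte
// offsets: time_hi in bytes 6-7 (with the version nibble in the top half
// of byte 6), time_mid in bytes 4-5, time_low in bytes 0-3. A comparator
// that assumes 16-byte names indexes byte 7 unconditionally, so a column
// name of 7 bytes or fewer blows up with AIOOBE: 7.
public class TimeUuidCompareSketch {

    static int compareTimestampBytes(byte[] o1, byte[] o2) {
        // Most-significant timestamp bits first; mask off the version nibble.
        int d = (o1[6] & 0x0F) - (o2[6] & 0x0F);
        if (d != 0) return d;
        d = (o1[7] & 0xFF) - (o2[7] & 0xFF); // throws AIOOBE if length <= 7
        if (d != 0) return d;
        for (int i = 4; i < 6; i++) {        // time_mid
            d = (o1[i] & 0xFF) - (o2[i] & 0xFF);
            if (d != 0) return d;
        }
        for (int i = 0; i < 4; i++) {        // time_low
            d = (o1[i] & 0xFF) - (o2[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] earlier = new byte[16];
        byte[] later = new byte[16];
        later[6] = 0x01; // larger high timestamp bits
        System.out.println(compareTimestampBytes(earlier, later) < 0);
        try {
            // a 7-byte "column name" instead of a 16-byte UUID
            compareTimestampBytes(new byte[7], new byte[16]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE on index 7");
        }
    }
}
```

If that is what is happening here, it could point at a non-UUID column
name having been written into a column family whose comparator is
TimeUUIDType.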

Does any of that mean anything to anyone?

Thanks...

Bill-

On Thu, Feb 10, 2011 at 11:00 AM, Bill Speirs <bill.speirs@gmail.com> wrote:
> I have a 7-node setup with a replication factor of 1 and a read
> consistency of 1. I have two column families: Messages, which stores
> millions of rows with a UUID as the row key, and DateIndex, which stores
> thousands of rows with a String as the row key. I perform 2 look-ups
> for my queries:
>
> 1) Fetch the row from DateIndex that includes the date I'm looking
> for. This returns 1,000 columns whose column names are the UUIDs of
> the messages.
> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
> from the first query.
>
> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
> respectable. However, query 2 is taking over 50s to perform 1,000 row
> look-ups! And when I scale down to 100 row look-ups for query 2, the
> time scales linearly, down to 5s.
>
> Am I doing something wrong here? It seems like taking 5s to look-up
> 100 rows in a distributed hash table is way too slow.
>
> Thoughts?
>
> Bill-
>
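
For context, query 2 above hands all 1,000 row keys to a single
multi-get. One thing we may try is splitting the key list into smaller
batches and issuing them concurrently. A minimal sketch of the
batching (partition and the batch size of 100 are my own illustration,
not Hector API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch of one possible mitigation for the slow multi-get: split the
// 1,000 row keys returned by the DateIndex lookup into smaller batches
// instead of one giant multiget. The real fetch would be a Hector
// multiget per batch; only the partitioning is shown here.
public class MultigetBatcher {

    // Split keys into consecutive batches of at most batchSize each.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                keys.subList(i, Math.min(i + batchSize, keys.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<UUID> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add(UUID.randomUUID());

        List<List<UUID>> batches = partition(keys, 100);
        System.out.println(batches.size());        // 10
        System.out.println(batches.get(9).size()); // 100
    }
}
```

Each batch could then be submitted to an ExecutorService so the
look-ups overlap instead of running as one serialized request.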
