incubator-cassandra-user mailing list archives

From Roland Gude <roland.g...@yoochoose.com>
Subject AW: problems while TimeUUIDType-index-querying with two expressions
Date Tue, 15 Mar 2011 09:39:21 GMT
Actually, it's not the column values that should be UUIDs in our case, but the column keys. The
CF uses TimeUUID ordering and the values are just byte arrays. Even after changing the
code to use UUIDSerializer instead of serializing the UUIDs manually, the issue persists.

As far as I can see, there is nothing wrong with the IndexExpression.
Using two index expressions with key=TimeUUID and value=anything does not work.
Using one index expression (either of the two) alone works fine.

I refactored Johannes' code into a JUnit test case. It needs the cluster configured as described
in Johannes' mail.
There are three cases: two with one of the index expressions each, and one with both index expressions.
The one with both index expressions will never finish, and you will see the exception in the
Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; hermes@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:


Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aaron@thelastpickle.com> wrote:

It's failing when comparing two TimeUUID values because one of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slices() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery, this is incorrect:
IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
.createIndexedSlicesQuery(keyspace,
stringSerializer, bytesSerializer, bytesSerializer);
Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expression, rather than the string/byte thing you are passing.
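
For readers without the Hector sources at hand, here is a minimal sketch of what serializing a UUID for a 16-byte comparison amounts to: the most significant long followed by the least significant long, both big-endian. It uses only java.util and java.nio. The names UuidBytes, toBytes and fromBytes are made up for illustration, and this is not Hector's UUIDSerializer implementation.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Hector or Cassandra.
final class UuidBytes {
    private UuidBytes() {}

    // Encode a UUID as 16 bytes: most significant long first, then least significant.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Decode 16 bytes back into a UUID; reject input of any other length.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID mandatorIndex = UUID.fromString("00000000-0000-1000-0000-000000000000");
        byte[] raw = toBytes(mandatorIndex);
        System.out.println("length = " + raw.length);
        System.out.println("round trip ok = " + fromBytes(raw).equals(mandatorIndex));
    }
}
```

The length check in fromBytes is the point of the sketch: a value that is not exactly 16 bytes cannot be a serialized UUID, whatever it is later compared against.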
Hope that helps.
Aaron
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

in order to improve our queries, we started to use IndexedSliceQueries from
the hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF in a keyspace called "Keyspace1"
("create keyspace Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
rows_cached=20000 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as
comparator in CF definition) I adapted the application to our schema on a
cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(00000000-0000-1000-0000-000000000000) and another one for a userid index
(00000001-0000-1000-0000-000000000000). It can be created with:
"create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
and rows_cached=20000 and column_metadata=[{column_name:
00000000-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
00000001-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];"
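
The two hand-written column names have their version nibble set to 1, so they parse as time-based UUIDs even though they were not generated from a clock. A small check with java.util.UUID illustrates this; the class name IndexUuidCheck and its method are hypothetical and only serve the example.

```java
import java.util.UUID;

// Hypothetical check for illustration; not taken from the attached sources.
final class IndexUuidCheck {
    private IndexUuidCheck() {}

    // True if the text parses as a version 1 (time-based) UUID.
    static boolean isTimeBased(String text) {
        return UUID.fromString(text).version() == 1;
    }

    public static void main(String[] args) {
        String[] indexColumns = {
            "00000000-0000-1000-0000-000000000000", // mandator id index
            "00000001-0000-1000-0000-000000000000"  // userid index
        };
        for (String name : indexColumns) {
            UUID uuid = UUID.fromString(name);
            // timestamp() is only defined for version 1 UUIDs.
            System.out.println(name + " version=" + uuid.version()
                    + " timestamp=" + uuid.timestamp());
        }
    }
}
```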


In cassandra-cli, the cluster then looks like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000, value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031, timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135, timestamp=1300111213561000)

1 Row Returned.

The values of the index columns 00000000-0000-1000-0000-000000000000 and
00000001-0000-1000-0000-000000000000 represent "testMandator" and
"user01" as bytes.
The third column is a randomly generated one with value "15", which is
inserted by the GetTimeUUIDIndexedSlices app.
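
To read the hex values in the cassandra-cli listing, they can be decoded back into text. The following is a small sketch that assumes the bytes are UTF-8 (or plain ASCII) text; the class name HexValues is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; decodes cassandra-cli hex output as text.
final class HexValues {
    private HexValues() {}

    // Turn a hex string such as "3135" into the text its bytes encode.
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("746573744d616e6461746f72")); // mandator index value
        System.out.println(decode("757365723031"));             // userid index value
        System.out.println(decode("3135"));                     // third column value
    }
}
```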
I attached both source files, GetIndexedSlices and GetTimeUUIDIndexedSlices.
Currently the second index expression, the one for the userid index, in the
GetTimeUUIDIndexedSlices.queryCf(...) method

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out, so that GetTimeUUIDIndexedSlices will run. Using one
index expression works perfectly fine, but as soon as I add a second eq, gt, gte or
lt expression I get an IndexOutOfBoundsException (see below).
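
One possible reading of the "IndexOutOfBoundsException: 6" in the trace below, offered as an assumption rather than something confirmed in this thread: if a stored value such as "user01" (6 bytes) is handed to a comparator that expects a 16-byte UUID and starts with the byte at offset 6, where a version 1 UUID keeps its version and the high timestamp bits, the read falls just past the end of the buffer. The sketch below reproduces only that effect with java.nio; ShortValueDemo and its methods are hypothetical and are not Cassandra's TimeUUIDType code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not Cassandra code: reads byte 6 the way a UUID timestamp
// comparison might, and shows what happens when the value is shorter than that.
final class ShortValueDemo {
    private ShortValueDemo() {}

    // Low nibble of byte 6: the top four timestamp bits of a version 1 UUID.
    static int timeHighNibble(ByteBuffer value) {
        return value.get(value.position() + 6) & 0x0F;
    }

    // True if reading byte 6 of the given value fails with an index error.
    static boolean throwsAtByteSix(byte[] value) {
        try {
            timeHighNibble(ByteBuffer.wrap(value));
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] userId = "user01".getBytes(StandardCharsets.UTF_8);
        byte[] uuidSized = new byte[16];
        System.out.println("6-byte value fails: " + throwsAtByteSix(userId));
        System.out.println("16-byte value fails: " + throwsAtByteSix(uuidSized));
    }
}
```

Under the same assumption, the 12-byte value "testMandator" would get past offset 6, which would fit the observation that the query only breaks once the userid expression is added.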

This issue can be easily reproduced by
- downloading the zznate example
(https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which
has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days
it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

