cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: AW: problems while TimeUUIDType-index-querying with two expressions
Date Thu, 17 Mar 2011 21:35:25 GMT
Good work.

Aaron

On 17/03/2011, at 4:37 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Thanks for tracking that down, Roland.  I've created
> https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.
> 
> On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude <roland.gude@yoochoose.com> wrote:
>> I have applied the suggested changes in my local source tree and did run all
>> my testcases (the supplied ones as well as those with real data).
>> 
>> They do work now.
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Wednesday, 16 March 2011 16:29
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> While debugging I found something that might be the issue (please
>> correct me if I am wrong):
>> 
>> In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks
>> whether a column satisfies an index expression.
>> 
>> In line 1608 it compares the value of the column with the value given in
>> the expression.
>> 
>> For this comparison it uses the comparator of the column family, whereas it
>> should use the validation class (value validator) of the column.
>> 
>> 
>> 
>>     private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
>>     {
>>         for (IndexExpression expression : clause.expressions)
>>         {
>>             // (we can skip "first" since we already know it's satisfied)
>>             if (expression == first)
>>                 continue;
>>             // check column data vs expression
>>             IColumn column = data.getColumn(expression.column_name);
>>             if (column == null)
>>                 return false;
>>             int v = data.getComparator().compare(column.value(), expression.value);
>>             if (!satisfies(v, expression.op))
>>                 return false;
>>         }
>>         return true;
>>     }
>> 
>> 
>> 
>> 
>> 
>> The line 1608 should be changed from:
>> 
>>             int v = data.getComparator().compare(column.value(), expression.value);
>> 
>> 
>> 
>> to
>> 
>>             int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);
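
A minimal, self-contained sketch of the mismatch described above. It does not use any Cassandra classes: `compareAsTimeUuid` and `compareAsBytes` are hypothetical stand-ins for a TimeUUID-style column-name comparator and a BytesType-style value validator. The assumption is that a TimeUUID comparison reads fixed byte offsets of a 16-byte UUID, which would be consistent with the `IndexOutOfBoundsException: 6` raised from `TimeUUIDType.compareTimestampBytes` in the stack trace later in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ComparatorMismatchDemo {
    /**
     * Hypothetical stand-in for a TimeUUID comparator. It reads the timestamp
     * bytes of a version-1 UUID at fixed offsets, so it assumes 16-byte input.
     */
    static int compareAsTimeUuid(ByteBuffer a, ByteBuffer b) {
        // Low nibble of byte 6 holds the top bits of the timestamp.
        int d = (a.get(a.position() + 6) & 0x0F) - (b.get(b.position() + 6) & 0x0F);
        if (d != 0) return d;
        // Remaining timestamp bytes, most significant first.
        int[] order = {7, 4, 5, 0, 1, 2, 3};
        for (int i : order) {
            d = (a.get(a.position() + i) & 0xFF) - (b.get(b.position() + i) & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    /**
     * Hypothetical stand-in for a raw-bytes validator: unsigned lexicographic
     * comparison that works for values of any length.
     */
    static int compareAsBytes(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int d = (a.get(a.position() + i) & 0xFF) - (b.get(b.position() + i) & 0xFF);
            if (d != 0) return d;
        }
        return a.remaining() - b.remaining();
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer stored = utf8("user01");   // 6-byte column value
        ByteBuffer queried = utf8("user01");  // 6-byte expression value

        // Comparing the values as plain bytes works and reports equality.
        System.out.println("bytes compare: " + compareAsBytes(stored, queried));

        // Comparing them with the column-name comparator reads offset 6,
        // which does not exist in a 6-byte buffer.
        try {
            compareAsTimeUuid(stored, queried);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("TimeUUID-style compare failed on a 6-byte value");
        }
    }
}
```

The point of the sketch is only that the comparator for column names and the comparator for column values can have different input requirements, so the value comparison has to be chosen per column.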
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> greetings roland
>> 
>> 
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Wednesday, 16 March 2011 14:50
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> Hi Aaron,
>> 
>> 
>> 
>> now I am completely confused.
>> 
>> The code that did not work for days now – like a miracle – works even
>> against the unpatched Cassandra 0.7.3 but the testcase still does not…
>> 
>> There seems to be some randomness in whether it works or not (which is a bad
>> sign I think)… I will debug a little deeper into this and report anything I
>> find.
>> 
>> 
>> 
>> Greetings,
>> 
>> roland
>> 
>> 
>> 
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: Wednesday, 16 March 2011 01:15
>> To: user@cassandra.apache.org
>> Subject: Re: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> Have attached a patch
>> to https://issues.apache.org/jira/browse/CASSANDRA-2328
>> 
>> 
>> 
>> Can you give it a try? You should now get an InvalidRequestException when
>> you send an invalid name or value in the query expression.
>> 
>> 
>> 
>> Aaron
>> 
>> 
>> 
>> On 16 Mar 2011, at 10:30, aaron morton wrote:
>> 
>> 
>> 
>> Will have the Jira I created finished soon. It's a legitimate issue: we
>> should be validating the column names and values when a get_indexed_slices()
>> request is sent. The error in your original email shows that.
>> 
>> 
>> 
>> WRT your code example. You are using the TimeUUID Validator for the column
>> name when creating the index expression, but are using a string serialiser
>> for the value...
>> 
>> IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
>>                         .createIndexedSlicesQuery(keyspace,
>>                                                stringSerializer,
>> UUID_SERIALIZER, stringSerializer);
>>         indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
>> 
>> But your schema is saying it is a bytes type...
>> 
>> 
>> 
>> column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000,
>> validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
>> {column_name: 00000001-0000-1000-0000-000000000000, validation_class:
>> BytesType, index_name: useridIndex, index_type: KEYS}];"
>> 
>> On 15 Mar 2011, at 22:41,
>> 
>> 
>> 
>> Once I have the patch can you apply it and run your test again ?
>> 
>> 
>> 
>> You may also want to ask on the Hector list whether it automagically checks
>> that you are using the correct types when creating an IndexedSlicesQuery.
>> 
>> 
>> 
>> Aaron
>> 
>> 
>> 
>> Roland Gude wrote:
>> 
>> 
>> 
>> Forgot to attach the source code… here it comes
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Tuesday, 15 March 2011 10:39
>> To: user@cassandra.apache.org
>> Subject: AW: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> Actually it's not the column values that should be UUIDs in our case, but the
>> column keys. The CF uses TimeUUID ordering and the values are just some
>> byte arrays. Even after changing the code to use UUIDSerializer instead of
>> serializing the UUIDs manually, the issue still exists.
>> 
>> As far as I can see, there is nothing wrong with the IndexExpression.
>> 
>> Using two index expressions with key=TimeUUID and value=anything does not
>> work.
>> 
>> Using one index expression (either of the two) alone works fine.
>> 
>> I refactored Johannes' code into a JUnit testcase. It needs the cluster
>> configured as described in Johannes' mail.
>> 
>> There are three cases: two with one of the index expressions each, and one
>> with both index expressions. The one with both index expressions will never
>> finish, and you will see the exception in the Cassandra logs.
>> 
>> 
>> 
>> Bye,
>> 
>> roland
>> 
>> 
>> 
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: Tuesday, 15 March 2011 07:54
>> To: user@cassandra.apache.org
>> Cc: Juergen Link; Roland Gude; hermes@datastax.com
>> Subject: Re: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> Perfectly reasonable,
>> created https://issues.apache.org/jira/browse/CASSANDRA-2328
>> 
>> 
>> 
>> Aaron
>> 
>> On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:
>> 
>> 
>> 
>> Sounds like we should send an InvalidRequestException then.
>> 
>> On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aaron@thelastpickle.com>
>> wrote:
>> 
>> It's failing when comparing two TimeUUID values because one of them is not
>> properly formatted. In this case it's comparing a stored value with the
>> value passed in the get_indexed_slices() query expression.
>> 
>> I'm going to assume it's the value passed for the expression.
>> 
>> When you create the IndexedSlicesQuery this is incorrect:
>> 
>> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
>>         .createIndexedSlicesQuery(keyspace,
>>                 stringSerializer, bytesSerializer, bytesSerializer);
>> 
>> Use a UUIDSerializer for the last param and then pass the UUID you want to
>> build the expression with, rather than the string/bytes you are passing.
>> 
>> Hope that helps.
>> 
>> Aaron
>> 
>> On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:
>> 
>> 
>> 
>> Hi all,
>> 
>> In order to improve our queries, we started to use IndexedSliceQueries from
>> the Hector project (https://github.com/zznate/hector-examples). I followed
>> the instructions for creating an IndexedSlicesQuery with
>> GetIndexedSlices.java.
>> 
>> I created the corresponding CF in a keyspace called “Keyspace1”
>> (“create keyspace Keyspace1;”) with:
>> 
>> "create column family Indexed1 with column_type='Standard' and
>> comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
>> rows_cached=20000 and column_metadata=[{column_name: birthdate,
>> validation_class: LongType, index_name: dateIndex, index_type:
>> KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
>> monthIndex, index_type: KEYS}];"
>> 
>> and the example GetIndexedSlices.java worked fine.
>> 
>> 
>> 
>> Output of CF Indexed1:
>> 
>> ---------------------------------------
>> [default@Keyspace1] list Indexed1;
>> Using default limit of 100
>> -------------------
>> RowKey: fake_key_12
>> => (column=birthdate, value=1974, timestamp=1300110485826059)
>> => (column=birthmonth, value=0, timestamp=1300110485826060)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058)
>> -------------------
>> RowKey: fake_key_8
>> => (column=birthdate, value=1974, timestamp=1300110485826039)
>> => (column=birthmonth, value=8, timestamp=1300110485826040)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038)
>> -------------------
>> ....
>> 
>> 
>> 
>> 
>> 
>> Now to the problem:
>> 
>> As we have another column format in our cluster (using TimeUUIDType as
>> comparator in CF definition) I adapted the application to our schema on a
>> cassandra-0.7.3 cluster.
>> 
>> We use a manually defined UUID for a mandator id index
>> (00000000-0000-1000-0000-000000000000) and another one for a userid index
>> (00000001-0000-1000-0000-000000000000). It can be created with:
>> 
>> "create column family ByUser with column_type='Standard' and
>> comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
>> and rows_cached=20000 and column_metadata=[{column_name:
>> 00000000-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: mandatorIndex, index_type: KEYS}, {column_name:
>> 00000001-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: useridIndex, index_type: KEYS}];"
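
As a small illustration of why these two hand-written names are accepted as TimeUUID column names, the sketch below parses them with the standard `java.util.UUID` class and prints their version and timestamp fields. The class name `IndexColumnNames` and the constant names are hypothetical and only serve the example.

```java
import java.util.UUID;

class IndexColumnNames {
    // The two manually defined column names from the schema above.
    static final UUID MANDATOR_INDEX = UUID.fromString("00000000-0000-1000-0000-000000000000");
    static final UUID USERID_INDEX = UUID.fromString("00000001-0000-1000-0000-000000000000");

    public static void main(String[] args) {
        for (UUID id : new UUID[] {MANDATOR_INDEX, USERID_INDEX}) {
            // version() == 1 marks a time-based UUID; timestamp() is only
            // defined for version 1 and throws for other versions.
            System.out.println(id + " version=" + id.version() + " timestamp=" + id.timestamp());
        }
    }
}
```

Both names carry version 1 with timestamps 0 and 1, so under a timestamp-based ordering they would be expected to sort ahead of any generated TimeUUID column name.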
>> 
>> 
>> 
>> 
>> 
>> which looks in the cluster using cassandra-cli like this:
>> 
>> [default@Keyspace1] describe keyspace;
>> Keyspace: Keyspace1:
>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>     Replication Factor: 1
>>   Column Families:
>>     ColumnFamily: ByUser
>>       Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
>>       Column Metadata:
>>         Column Name: 00000001-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: useridIndex
>>           Index Type: KEYS
>>         Column Name: 00000000-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: mandatorIndex
>>           Index Type: KEYS
>>     ColumnFamily: Indexed1
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
>>       Column Metadata:
>>         Column Name: birthmonth (birthmonth)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: monthIndex
>>           Index Type: KEYS
>>         Column Name: birthdate (birthdate)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: dateIndex
>>           Index Type: KEYS
>> 
>> [default@Keyspace1] list ByUser;
>> Using default limit of 100
>> -------------------
>> RowKey: testMandator!!user01
>> => (column=00000000-0000-1000-0000-000000000000, value=746573744d616e6461746f72, timestamp=1300111213321000)
>> => (column=00000001-0000-1000-0000-000000000000, value=757365723031, timestamp=1300111213322000)
>> => (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135, timestamp=1300111213561000)
>> 
>> 1 Row Returned.
>> 
>> 
>> 
>> The values of the index columns 00000000-0000-1000-0000-000000000000 and
>> 00000001-0000-1000-0000-000000000000 represent "testMandator" and
>> "user01" as bytes.
>> The third column is a randomly generated one with value "15". These columns
>> are inserted by the GetTimeUUIDIndexedSlices app.
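
The hex strings in the `list ByUser` output can be checked by hand. The sketch below decodes them as UTF-8 using only the Java standard library; `HexValueDemo` and `hexToUtf8` are hypothetical names for this example, not part of Hector or Cassandra.

```java
import java.nio.charset.StandardCharsets;

class HexValueDemo {
    /** Decodes a hex string (two digits per byte) and interprets the bytes as UTF-8. */
    static String hexToUtf8(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hexToUtf8("746573744d616e6461746f72")); // testMandator
        System.out.println(hexToUtf8("757365723031"));             // user01
        System.out.println(hexToUtf8("3135"));                     // 15
    }
}
```

Note that the decoded values are 12, 6 and 2 bytes long, none of which is the 16 bytes of a serialized UUID.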
>> 
>> I attached both source files, GetIndexedSlices and GetTimeUUIDIndexedSlices.
>> 
>> Currently the second index expression for the userid index in the
>> GetTimeUUIDIndexedSlices.queryCf(...) method
>> 
>>             indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
>>         //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));
>> 
>> is commented out, so that GetTimeUUIDIndexedSlices will run. Using one
>> index expression works perfectly fine, but as soon as I add a second eq, gt,
>> gte or lt expression I get an IndexOutOfBoundsException (see below).
>> 
>> 
>> 
>> This issue can be easily reproduced by
>> - downloading the zznate example (https://github.com/zznate/hector-examples),
>> - mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
>> - importing it in eclipse and
>> - letting it run against a locally running cassandra instance (v0.7.3) which
>>   has the default settings (no changes in the .yaml)
>> 
>> I hope that someone can help me with this issue ... after a couple of days
>> it's driving me bonkers.
>> 
>> Thx in advance,
>> Johannes
>> 
>> 
>> 
>> 
>> 
>> Exception:
>> 
>> ERROR 14:47:56,842 Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.IndexOutOfBoundsException: 6
>>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>>         at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
>>         at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
>>         ... 4 more
>> ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> 
>> <GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>
>> 
>> 
>> 
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>> 
>> 
>> 
>> <GetTimeUUIDIndexedSlices.java>
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
