incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: AW: problems while TimeUUIDType-index-querying with two expressions
Date Wed, 16 Mar 2011 00:15:02 GMT
Have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328 

Can you give it a try? You should now get an InvalidRequestException when you send an invalid
name or value in the query expression.
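For illustration only, here is a minimal sketch of the kind of check such validation could perform on a TimeUUIDType name or value. It assumes a valid TimeUUID is either empty or exactly 16 bytes with version 1 in the high nibble of byte 6. The class and method names are hypothetical, and this is not the code in the CASSANDRA-2328 patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical validator sketch; NOT the code from the CASSANDRA-2328 patch.
final class TimeUuidValidation {

    // Throws IllegalArgumentException unless the buffer looks like a TimeUUID:
    // either empty (assumed acceptable here) or 16 bytes with UUID version 1.
    static void validateTimeUuid(ByteBuffer bytes) {
        int len = bytes.remaining();
        if (len == 0) {
            return;
        }
        if (len != 16) {
            throw new IllegalArgumentException("TimeUUID should be 16 or 0 bytes (" + len + ")");
        }
        // The UUID version lives in the high nibble of byte 6.
        int version = (bytes.get(bytes.position() + 6) >> 4) & 0x0f;
        if (version != 1) {
            throw new IllegalArgumentException("Invalid version for TimeUUID: " + version);
        }
    }

    public static void main(String[] args) {
        // 00000000-0000-1000-0000-000000000000 as raw bytes: only byte 6 (0x10) is non-zero.
        byte[] indexName = new byte[16];
        indexName[6] = 0x10;
        validateTimeUuid(ByteBuffer.wrap(indexName)); // accepted

        // "user01" is 6 bytes, so it is rejected before any comparison happens.
        try {
            validateTimeUuid(ByteBuffer.wrap("user01".getBytes(StandardCharsets.UTF_8)));
        } catch (IllegalArgumentException e) {
            // prints "rejected: TimeUUID should be 16 or 0 bytes (6)"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On the server side, a failed check like this would be reported to the client as the InvalidRequestException described above, instead of an exception deep inside the read path.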

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:

> I will have the Jira ticket I created finished soon. It's a legitimate issue: we should be
> validating the column names and values when a get_indexed_slices() request is sent. The error
> in your original email shows that.
> 
> Regarding your code example: you are using a UUID serializer for the column name when
> creating the index expression, but a string serializer for the value...
> IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
>         .createIndexedSlicesQuery(keyspace,
>                 stringSerializer, UUID_SERIALIZER, stringSerializer);
> indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
> 
> But your schema is saying it is a bytes type...
> 
> column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000,
> validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
> {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType,
> index_name: useridIndex, index_type: KEYS}];
> 
> Once I have the patch, can you apply it and run your test again?
> 
> You may also want to ask on the Hector list whether it automagically checks that you are
> using the correct types when creating an IndexedSlicesQuery.
> 
> Aaron
> 
> On 15 Mar 2011, at 22:41, Roland Gude wrote:
> 
>> Forgot to attach the source code… here it comes
>>  
>> From: Roland Gude [mailto:roland.gude@yoochoose.com] 
>> Sent: Tuesday, 15 March 2011 10:39
>> To: user@cassandra.apache.org
>> Subject: Re: problems while TimeUUIDType-index-querying with two expressions
>>  
>> Actually it's not the column values that should be UUIDs in our case, but the column
>> names. The CF uses TimeUUID ordering and the values are just byte arrays. Even after
>> changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the
>> issue still exists.
>>  
>> As far as I can see, there is nothing wrong with the IndexExpression.
>> Using two index expressions with name=TimeUUID and value=anything does not work;
>> using either one of the two expressions alone works fine.
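A simplified model of why one expression works and two do not. It assumes, as the stack trace in Johannes' original report suggests (ColumnFamilyStore.scan → satisfies → TimeUUIDType.compare), that one expression drives the index lookup and only the remaining expressions are re-checked against each row, and that this re-check compares values with the column family's TimeUUIDType comparator rather than the BytesType validator. Every name here is hypothetical. The "primary" expression is simply the first one, and the TimeUUID-like comparator only mimics the real one's fixed-offset read at byte 6.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an index scan; not Cassandra's actual code.
final class IndexScanModel {

    // One "column == value" index expression.
    static final class Expr {
        final String column;
        final ByteBuffer value;

        Expr(String column, ByteBuffer value) {
            this.column = column;
            this.value = value;
        }
    }

    // Mimics a TimeUUID comparison: it first reads byte 6 of each buffer,
    // which throws IndexOutOfBoundsException for values shorter than 7 bytes.
    static final Comparator<ByteBuffer> TIMEUUID_LIKE = (a, b) -> {
        int c = Integer.compare(a.get(a.position() + 6) & 0x0f, b.get(b.position() + 6) & 0x0f);
        return c != 0 ? c : a.compareTo(b);
    };

    // Re-checks every expression except the primary, which the index lookup already satisfied.
    static boolean satisfies(Map<String, ByteBuffer> row, List<Expr> exprs, Expr primary,
                             Comparator<ByteBuffer> valueComparator) {
        for (Expr e : exprs) {
            if (e == primary) {
                continue;
            }
            ByteBuffer stored = row.get(e.column);
            if (stored == null || valueComparator.compare(stored, e.value) != 0) {
                return false;
            }
        }
        return true;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> row = new HashMap<>();
        row.put("mandator", utf8("testMandator"));
        row.put("userid", utf8("user01"));
        Expr mandator = new Expr("mandator", utf8("testMandator"));
        Expr userid = new Expr("userid", utf8("user01"));

        // One expression: it is the primary, so no value comparison happens.
        System.out.println(satisfies(row, Arrays.asList(mandator), mandator, TIMEUUID_LIKE)); // true

        // Two expressions: "user01" (6 bytes) reaches the TimeUUID-like comparator.
        try {
            satisfies(row, Arrays.asList(mandator, userid), mandator, TIMEUUID_LIKE);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }

        // Comparing the same values as plain bytes (what BytesType implies) works.
        System.out.println(satisfies(row, Arrays.asList(mandator, userid), mandator,
                Comparator.<ByteBuffer>naturalOrder())); // true
    }
}
```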
>>  
>> I refactored Johannes' code into a JUnit test case. It needs the cluster configured
>> as described in Johannes' mail.
>> There are three cases: two with one of the index expressions each, and one with both
>> index expressions. The one with both index expressions never finishes, and you will see
>> the exception in the Cassandra logs.
>>  
>> Bye,
>> roland
>>  
>> From: aaron morton [mailto:aaron@thelastpickle.com] 
>> Sent: Tuesday, 15 March 2011 07:54
>> To: user@cassandra.apache.org
>> Cc: Juergen Link; Roland Gude; hermes@datastax.com
>> Subject: Re: problems while TimeUUIDType-index-querying with two expressions
>>  
>> Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328
>>  
>> Aaron
>> On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:
>>  
>> 
>> Sounds like we should send an InvalidRequestException then.
>> 
>> On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> 
>> It's failing when comparing two TimeUUID values because one of them is not
>> properly formatted. In this case it's comparing a stored value with the
>> value passed in the get_indexed_slices() query expression.
>> I'm going to assume it's the value passed for the expression.
>> When you create the IndexedSlicesQuery, this is incorrect:
>> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
>>         .createIndexedSlicesQuery(keyspace,
>>                 stringSerializer, bytesSerializer, bytesSerializer);
>> Use a UUIDSerializer for the last param and then pass the UUID you want to
>> build the expression with, rather than the string/byte thing you are passing.
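To make the serializer point concrete: a UUID passed through a UUID serializer becomes 16 bytes, whereas a UTF-8 string such as "user01" is only 6. The helper below is a stdlib stand-in, not Hector's UUIDSerializer. It assumes the common layout of the 64 most-significant bits followed by the 64 least-significant bits, big-endian, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Stdlib stand-in for a UUID serializer (assumed layout: 8 bytes of
// most-significant bits, then 8 bytes of least-significant bits, big-endian).
final class UuidBytes {

    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        UUID mandatorIndex = UUID.fromString("00000000-0000-1000-0000-000000000000");
        byte[] name = toBytes(mandatorIndex);
        System.out.println(name.length);                           // 16
        System.out.println(mandatorIndex.version());               // 1
        System.out.println(fromBytes(name).equals(mandatorIndex)); // true

        // A string value serialized as UTF-8 is not a UUID at all.
        byte[] value = "user01".getBytes(StandardCharsets.UTF_8);
        System.out.println(value.length);                          // 6
    }
}
```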
>> Hope that helps.
>> Aaron
>> On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:
>>  
>> Hi all,
>>  
>> in order to improve our queries, we started to use IndexedSlicesQuery from
>> the Hector project (https://github.com/zznate/hector-examples). I followed
>> the instructions for creating an IndexedSlicesQuery in
>> GetIndexedSlices.java.
>> I created the corresponding CF in a keyspace called “Keyspace1”
>> (“create keyspace Keyspace1;”) with:
>> "create column family Indexed1 with column_type='Standard' and
>> comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
>> rows_cached=20000 and column_metadata=[{column_name: birthdate,
>> validation_class: LongType, index_name: dateIndex, index_type:
>> KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
>> monthIndex, index_type: KEYS}];"
>> and the example GetIndexedSlices.java worked fine.
>>  
>> Output of CF Indexed1:
>> ---------------------------------------
>> [default@Keyspace1] list Indexed1;
>> Using default limit of 100
>> -------------------
>> RowKey: fake_key_12
>> => (column=birthdate, value=1974, timestamp=1300110485826059)
>> => (column=birthmonth, value=0, timestamp=1300110485826060)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f3132,
>> timestamp=1300110485826056)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f3132,
>> timestamp=1300110485826057)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f3132,
>> timestamp=1300110485826058)
>> -------------------
>> RowKey: fake_key_8
>> => (column=birthdate, value=1974, timestamp=1300110485826039)
>> => (column=birthmonth, value=8, timestamp=1300110485826040)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f38,
>> timestamp=1300110485826036)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f38,
>> timestamp=1300110485826037)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f38,
>> timestamp=1300110485826038)
>> -------------------
>> ....
>>  
>>  
>> Now to the problem:
>> As we use a different column format in our cluster (TimeUUIDType as the
>> comparator in the CF definition), I adapted the application to our schema on a
>> cassandra-0.7.3 cluster.
>> We use a manually defined UUID for a mandator id index
>> (00000000-0000-1000-0000-000000000000) and another one for a userid index
>> (00000001-0000-1000-0000-000000000000). It can be created with:
>> "create column family ByUser with column_type='Standard' and
>> comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
>> and rows_cached=20000 and column_metadata=[{column_name:
>> 00000000-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: mandatorIndex, index_type: KEYS}, {column_name:
>> 00000001-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: useridIndex, index_type: KEYS}];"
>>  
>>  
>> which looks like this in the cluster, using cassandra-cli:
>>  
>> [default@Keyspace1] describe keyspace;
>> Keyspace: Keyspace1:
>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>     Replication Factor: 1
>>   Column Families:
>>     ColumnFamily: ByUser
>>       Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
>>       Column Metadata:
>>         Column Name: 00000001-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: useridIndex
>>           Index Type: KEYS
>>         Column Name: 00000000-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: mandatorIndex
>>           Index Type: KEYS
>>     ColumnFamily: Indexed1
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
>>       Column Metadata:
>>         Column Name: birthmonth (birthmonth)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: monthIndex
>>           Index Type: KEYS
>>         Column Name: birthdate (birthdate)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: dateIndex
>>           Index Type: KEYS
>> [default@Keyspace1] list ByUser;
>> Using default limit of 100
>> -------------------
>> RowKey: testMandator!!user01
>> => (column=00000000-0000-1000-0000-000000000000,
>> value=746573744d616e6461746f72, timestamp=1300111213321000)
>> => (column=00000001-0000-1000-0000-000000000000, value=757365723031,
>> timestamp=1300111213322000)
>> => (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
>> timestamp=1300111213561000)
>>  
>> 1 Row Returned.
>>  
>> The values of the index columns 00000000-0000-1000-0000-000000000000 and
>> 00000001-0000-1000-0000-000000000000 represent "testMandator" and
>> "user01" as bytes.
>> The third column is a randomly generated one with value "15", inserted by
>> the GetTimeUUIDIndexedSlices app.
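The hex strings in the ByUser listing above can be checked by UTF-8 encoding the expected values. A small sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: hex-encodes the UTF-8 bytes of a string in the same
// lowercase form as the cassandra-cli listing.
final class HexCheck {

    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("testMandator")); // 746573744d616e6461746f72
        System.out.println(utf8Hex("user01"));       // 757365723031
        System.out.println(utf8Hex("15"));           // 3135
    }
}
```

Note that "user01" is 6 bytes long. That is consistent with the IndexOutOfBoundsException: 6 in the trace below if a value of that length reached a comparator expecting a 16-byte TimeUUID.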
>> I attached both source files, GetIndexedSlices and GetTimeUUIDIndexedSlices.
>> Currently the second index expression for the userid index in the
>> GetTimeUUIDIndexedSlices.queryCf(...) method
>>  
>>     indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID),
>>             new StringSerializer().toBytes(mandator));
>>     //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID),
>>     //        new StringSerializer().toBytes(dummyUserId));
>>  
>> is commented out, so GetTimeUUIDIndexedSlices will run. Using one index
>> expression works perfectly fine, but as soon as I add a second eq, gt, gte or
>> lt expression, I get an IndexOutOfBoundsException (see below).
>>  
>> This issue can be easily reproduced by
>> - downloading the zznate example
>> (https://github.com/zznate/hector-examples),
>> - mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
>> - importing it in eclipse and
>> - letting it run against a locally running cassandra instance (v0.7.3) which
>> has the default settings (no changes in the .yaml)
>>  
>> I hope that someone can help me with this issue ... after a couple of days
>> it's driving me bonkers.
>>  
>> Thx in advance,
>> Johannes
>>  
>>  
>> Exception:
>> ERROR 14:47:56,842 Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.IndexOutOfBoundsException: 6
>>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>>         at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
>>         at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
>>         ... 4 more
>> ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> <GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>
>>  
>> 
>> 
>> 
>> -- 
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>  
>> <GetTimeUUIDIndexedSlices.java>
> 

