cassandra-commits mailing list archives

From "Mike Schrag (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4421) Support cql3 table definitions in Hadoop InputFormat
Date Thu, 16 May 2013 02:29:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659149#comment-13659149 ]

Mike Schrag edited comment on CASSANDRA-4421 at 5/16/13 2:27 AM:
-----------------------------------------------------------------

I've tracked down the bug. If the token value of the last row of a page equals the end value
of the split, it still tries to fetch the next page using the query:

{code}SELECT * FROM [cf] WHERE token(key) > token(?) AND token(key) <= ? LIMIT 1000 ALLOW FILTERING{code}

To fill this in: assume your split is 1000-2000 and the last row of the page happens to have
the max token value, 2000. The query then becomes:

{code}SELECT * FROM [cf] WHERE token(key) > 2000 AND token(key) <= 2000 LIMIT 1000 ALLOW FILTERING{code}

It looks like Cassandra mishandles the impossible predicate here: where it should return
an empty result, it actually returns bogus rows whose tokens fall outside the specified range.
Once you get a token outside of the split range, paging cannot recover, and everything goes
off the rails.
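
One way to sidestep the impossible predicate on the client side is to stop paging as soon as the last token seen equals the end token of the split. The following is a minimal Python sketch of that idea, not the code in the attached patches. The names `fetch_page` and `read_split`, the in-memory token list, and the exclusive split start are all hypothetical stand-ins for the real CQL query and reader.

```python
from typing import List, Tuple


def fetch_page(tokens: List[int], after: int, end: int, limit: int) -> List[int]:
    # Hypothetical stand-in for:
    #   WHERE token(key) > after AND token(key) <= end LIMIT limit
    return [t for t in sorted(tokens) if after < t <= end][:limit]


def read_split(
    tokens: List[int], start: int, end: int, limit: int = 1000
) -> Tuple[List[int], List[Tuple[int, int]]]:
    rows: List[int] = []
    issued: List[Tuple[int, int]] = []  # (after, end) bounds of each query sent
    last = start
    # Guard: if last == end, the next predicate would be
    # "token > end AND token <= end", which can match nothing, so never send it.
    while last < end:
        issued.append((last, end))
        page = fetch_page(tokens, last, end, limit)
        rows.extend(page)
        if len(page) < limit:
            break  # short page: the split is exhausted
        last = page[-1]
    return rows, issued
```

With a split of 1000-2000 and a full page whose last token is exactly 2000, the loop exits after one query instead of issuing the `> 2000 AND <= 2000` query.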
                
> Support cql3 table definitions in Hadoop InputFormat
> ----------------------------------------------------
>
>                 Key: CASSANDRA-4421
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4421
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>    Affects Versions: 1.1.0
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>              Labels: cql3
>             Fix For: 1.2.5
>
>         Attachments: 4421-1.txt, 4421-2.txt, 4421.txt
>
>
> Hello,
> I faced a bug while writing composite column values and the subsequent validation on the server side.
> This is the setup for reproduction:
> 1. create a keyspace
> create keyspace test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor = 1;
> 2. create a cf via cql (3.0)
> create table test1 (
>     a int,
>     b int,
>     c int,
>     primary key (a, b)
> );
> Looking at the schema in the cli, I noticed that there is no column metadata for columns that are not part of the primary key.
> create column family test1
>   with column_type = 'Standard'
>   and comparator = 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type)'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'Int32Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> Please notice the default validation class: UTF8Type
> Now I would like to insert a value > 127 via the Cassandra client (no cql, part of mr-jobs). Have a look at the attachment.
> Batch mutate fails:
> InvalidRequestException(why:(String didn't validate.) [test][test1][1:c] failed validation)
> A validator for the column value is fetched in ThriftValidation::validateColumnData, which always returns the default validator, UTF8Type as described above (the ColumnDefinition for the given column name "c" is always null).
> In UTF8Type there is a check for
> if (b > 127)
>    return false;
> Anyway, maybe I'm doing something wrong, but I used cql 3.0 for table creation. I assigned data types to all columns, but I cannot set values for a composite column because the default validation class is used.
> I think the schema should know the correct validator even for composite columns. The usage of the default validation class does not make sense.
> Best Regards 
> Bert Passek
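
To illustrate the validation failure described in the quoted report, here is a small Python sketch. It assumes the int value is sent as four big-endian bytes and is then checked as if it were UTF-8 text. Python's strict UTF-8 decoder stands in for Cassandra's UTF8Type validator, which may differ in details, and the helper names `int32_bytes` and `is_valid_utf8` are hypothetical.

```python
import struct


def int32_bytes(value: int) -> bytes:
    # Assumption: a 32-bit int column value is encoded as 4 big-endian bytes.
    return struct.pack(">i", value)


def is_valid_utf8(raw: bytes) -> bool:
    # Stand-in for a UTF-8 validator: accept only bytes that decode strictly.
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Under these assumptions, 127 passes because all of its bytes are at most 0x7f, while 128 and 200 are rejected because their last byte is above 0x7f and does not form a valid UTF-8 sequence. A value such as 256 would also pass, since none of its bytes exceed 0x7f, so the failure depends on the byte pattern rather than on the magnitude alone.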

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
