cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5198) token () function automatically coerces types leading to confusing output
Date Wed, 30 Jan 2013 17:01:17 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-5198:
----------------------------------------

    Attachment: 0003-Respect-partitioner-type-for-Token-function.txt
                0002-Improve-printing-of-type-in-error-message.txt
                0001-Respect-CQL3-constant-types.txt

I have attached three patches for the changes proposed above:
# The first one adds proper type validation. In other words, it rejects a string value when
the column is an int, and rejects an int value when the column is a blob (instead of interpreting
it as a hex value, which I'm pretty sure is counter-intuitive). It does, however, also reject
a string value when the column is a blob, because I'm far from convinced that interpreting
the content of the string as a hex value is particularly intuitive. To still allow inserting
blobs, it adds a new type of hex constant (which must start with '0x'). In other words, if
b is a blob column:
{noformat}
UPDATE ... SET b = '00ff' ...
{noformat}
is not valid anymore, but
{noformat}
UPDATE ... SET b = 0x00ff ...
{noformat}
is. Note that the patch isn't tiny, because doing this properly required some refactoring here
and there, but overall I think that refactoring actually improves the code.
# The second patch is mainly cosmetic and makes sure we use CQL3 types in CQL3 error messages,
i.e. 'map<text, int>' rather than 'org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.Int32Type)'.
# The third patch makes sure we take the partitioner's token type into account. So if your partitioner
is M3P you should provide a bigint value, if it's RP a varint, and if it's OPP a blob.
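
To make the rules in the list above concrete, here is a minimal sketch in Python (not the patch itself, which is Java). The names ConstantKind, ACCEPTED, TOKEN_TYPE, classify, validate and validate_token_argument are all hypothetical, and the table of accepted constants is an assumption that covers only the cases mentioned in this comment:

```python
from enum import Enum


class ConstantKind(Enum):
    STRING = "string"    # quoted literal, e.g. '00ff'
    INTEGER = "integer"  # bare digits, e.g. 42
    HEX = "hex"          # 0x-prefixed literal, e.g. 0x00ff


# Hypothetical table: which constant kinds each column type accepts.
ACCEPTED = {
    "text": {ConstantKind.STRING},
    "int": {ConstantKind.INTEGER},
    "bigint": {ConstantKind.INTEGER},
    "varint": {ConstantKind.INTEGER},
    "blob": {ConstantKind.HEX},
}

# Token argument type per partitioner, as described for the third patch.
TOKEN_TYPE = {"M3P": "bigint", "RP": "varint", "OPP": "blob"}


def classify(literal: str) -> ConstantKind:
    """Decide what kind of constant a literal is, without coercing it."""
    if len(literal) >= 2 and literal[0] == "'" and literal[-1] == "'":
        return ConstantKind.STRING
    if literal.startswith("0x") and all(
        c in "0123456789abcdefABCDEF" for c in literal[2:]
    ):
        return ConstantKind.HEX
    digits = literal[1:] if literal.startswith("-") else literal
    if digits.isascii() and digits.isdigit():
        return ConstantKind.INTEGER
    raise ValueError(f"unrecognised constant: {literal}")


def validate(column_type: str, literal: str) -> ConstantKind:
    """Reject a constant whose kind does not match the column type."""
    kind = classify(literal)
    if kind not in ACCEPTED[column_type]:
        raise TypeError(
            f"invalid {kind.value} constant {literal} for type {column_type}"
        )
    return kind


def validate_token_argument(partitioner: str, literal: str) -> ConstantKind:
    """Check a raw token value against the partitioner's token type."""
    return validate(TOKEN_TYPE[partitioner], literal)
```

With this sketch, validate("blob", "0x00ff") is accepted, while validate("blob", "'00ff'") and validate("int", "'abc'") raise TypeError, mirroring the two UPDATE examples above.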

These patches don't yet add support for the token function in the select clause that I mentioned
above. I also want to add conversion functions that allow converting, say, a string or a uuid
to a blob, but I want to refactor the (currently ugly) handling of functions first, so that
will follow later (and it can be done in another ticket).

                
> token () function automatically coerces types leading to confusing output
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5198
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5198
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Edward Capriolo
>            Priority: Minor
>         Attachments: 0001-Respect-CQL3-constant-types.txt, 0002-Improve-printing-of-type-in-error-message.txt, 0003-Respect-partitioner-type-for-Token-function.txt
>
>
> This works as it should.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token('') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('bsmith') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('scapriolo') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> {noformat}
> But look what happens when you supply numbers into the token function.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token(0) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134314) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(113431431) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134434) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
> {noformat}
> This does not make sense to me. The token function is apparently converting integers to strings,
> leading to seemingly unpredictable results.
> However, I find this syntax odd. I feel like I should be able to say
> 'token(username) > 0 and token(username) < 10', because from the Thrift side I can page tokens
> or I can page keys. In this case, I guess, I am only able to page keys because the token is
> not returned to the user.
> Is token 0 = ''? How do I arrive at the minimal token for an int column?
> Should the token() function at least be smart enough to reject integers for string columns?
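
A rough illustration of why the results above look unpredictable, assuming (as the report suggests) that the integer is coerced to a string and then hashed like any other key. The sketch below uses an MD5 digest as a stand-in hash. It is not Cassandra's actual token computation, and sketch_token and rows_after are hypothetical names:

```python
import hashlib


def sketch_token(key: str) -> int:
    """Stand-in for a hashing partitioner: MD5 digest read as an integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def rows_after(bound, keys):
    """Keys whose sketch token is greater than the token of str(bound)."""
    limit = sketch_token(str(bound))  # the coercion step: 1134 becomes '1134'
    matching = [k for k in keys if sketch_token(k) > limit]
    return sorted(matching, key=sketch_token)


keys = ["bsmith", "scapriolo", "ecapriolo"]
for bound in (0, 1134, 1134314, 113431431):
    # The hash of '1134' has no numeric relationship to the hash of '1134314',
    # so a larger bound does not imply fewer matching rows.
    print(bound, rows_after(bound, keys))
```

Because the hash of the coerced string bears no relation to the numeric value, increasing the number passed to token() can return more rows or fewer, which matches the behaviour shown in the queries above.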

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
