cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8101) Invalid ASCII and UTF-8 chars not rejected in CQL string literals
Date Fri, 10 Oct 2014 21:49:34 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs updated CASSANDRA-8101:
-----------------------------------
    Attachment: 8101.txt

The ASCII problem was exactly as the description says, and the changes in AsciiType fix it.

For UTF-8, the issue runs deeper.  A netty bug (which I will open a ticket for shortly) caused
characters outside of the specified charset to be silently replaced with \uFFFD instead of
being rejected.  Since the native protocol specifies that all strings must be UTF-8, the
validation happens in CBUtil.readString().
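
To illustrate the difference between lenient and strict decoding, here is a minimal sketch using only the JDK. It is not the code from the attached patch: the class name StrictUtf8Decode and the helper decodeStrict are hypothetical, and how CBUtil.readString() actually performs its check is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Decode {
    // Hypothetical helper (an assumption for illustration, not Cassandra's API):
    // decodes bytes as UTF-8 and throws instead of substituting U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] invalid = {(byte) 0xC3, (byte) 0x28};

        // Lenient decoding: the String constructor replaces the bad byte with U+FFFD.
        String lenient = new String(invalid, StandardCharsets.UTF_8);
        System.out.println("lenient result contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        // Strict decoding: the same input is reported as an error.
        try {
            decodeStrict(invalid);
            System.out.println("strict: accepted");
        } catch (CharacterCodingException e) {
            System.out.println("strict: rejected (" + e.getClass().getSimpleName() + ")");
        }
    }
}
```

With CodingErrorAction.REPORT the caller gets an exception it can turn into a protocol error, rather than a string that has already lost the original bytes.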

I've pushed a [dtest|https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-8101] to cover
both cases.  In addition to the attached patch, there's also a [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-8101].

> Invalid ASCII and UTF-8 chars not rejected in CQL string literals
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8101
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Critical
>             Fix For: 2.0.11, 2.1.1
>
>         Attachments: 8101.txt
>
>
> When processing CQL string literals, we ultimately use {{String.getBytes(Charset)}}, which
> has the following note:
> {quote}
> This method always replaces malformed-input and unmappable-character sequences with this
> charset's default replacement byte array. The CharsetEncoder class should be used when more
> control over the encoding process is required.
> {quote}
> So, if we insert a non-ASCII character into an ascii string literal, it will be replaced
> with a {{?}} char.  Something similar happens for UTF-8.
> For example:
> {noformat}
> cqlsh:ks1> create table badstrings (a int primary key, b ascii);
> cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
> cqlsh:ks1> select * from badstrings;
>  a | b
> ---+------
>  0 | ????
> {noformat}
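
The behaviour described in the issue can be reproduced outside Cassandra with plain JDK calls. The following is a rough sketch, not the fix in 8101.txt: the class StrictEncode and the helper encodeStrict are hypothetical names, and the literal from the cqlsh example is written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncode {
    // Hypothetical helper (an assumption for illustration): encodes a string and
    // throws on input the charset cannot represent, instead of replacing it.
    static ByteBuffer encodeStrict(String s, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(s));
    }

    public static void main(String[] args) {
        // The four Greek characters from the cqlsh example above.
        String literal = "\u038E\u0394\u03B4\u03E0";

        // String.getBytes(Charset) silently substitutes '?' for each unmappable character.
        byte[] lossy = literal.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII)); // prints "????"

        // A CharsetEncoder configured with REPORT rejects the same input.
        try {
            encodeStrict(literal, StandardCharsets.US_ASCII);
            System.out.println("ascii: accepted");
        } catch (CharacterCodingException e) {
            System.out.println("ascii: rejected (" + e.getClass().getSimpleName() + ")");
        }

        // UTF-8 case: a lone surrogate is not encodable and is also reported.
        try {
            encodeStrict("\uD800", StandardCharsets.UTF_8);
            System.out.println("utf-8: accepted");
        } catch (CharacterCodingException e) {
            System.out.println("utf-8: rejected (" + e.getClass().getSimpleName() + ")");
        }
    }
}
```

The first println shows the same {{????}} that the select above returns; the two strict calls show how a CharsetEncoder lets the caller reject the literal instead of storing replacement bytes.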



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
