cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-8101) Invalid ASCII and UTF-8 chars not rejected in CQL string literals
Date Fri, 10 Oct 2014 21:49:34 GMT


Tyler Hobbs updated CASSANDRA-8101:
    Attachment: 8101.txt

The ascii problem was exactly as the description says, and the changes in AsciiType fix that.

When it comes to UTF-8, the issue runs deeper.  There turned out to be a netty bug (which I will
open a ticket for shortly) that caused characters outside of the specified charset to be replaced
(by \uFFFD).  Since the native protocol specifies that all strings must be UTF-8, the validation
happens in CBUtil.readString().
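
To illustrate the kind of silent replacement described above, here is a minimal sketch using only the JDK. It is not the code in CBUtil.readString() or the attached patch; the class and method names (StrictUtf8Decode, lenientDecode, strictDecode, isValidUtf8) are hypothetical. It contrasts a lenient decode, which substitutes \uFFFD for invalid bytes, with a CharsetDecoder configured to report errors.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Cassandra or netty.
final class StrictUtf8Decode {
    // Lenient path: invalid byte sequences are silently replaced with U+FFFD.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict path: a decoder set to REPORT throws instead of replacing.
    static String strictDecode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Convenience wrapper: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            strictDecode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = { (byte) 0xFF }; // 0xFF never appears in valid UTF-8
        System.out.println(lenientDecode(bad).equals("\uFFFD"));
        System.out.println(isValidUtf8(bad));
        System.out.println(isValidUtf8("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With the strict variant, a caller can reject the request rather than storing a replacement character.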

I've pushed a [dtest|] to cover
both cases.  In addition to the attached patch, there's also a [branch|].

> Invalid ASCII and UTF-8 chars not rejected in CQL string literals
> -----------------------------------------------------------------
>                 Key: CASSANDRA-8101
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Critical
>             Fix For: 2.0.11, 2.1.1
>         Attachments: 8101.txt
> When processing CQL string literals, we ultimately use {{String.getBytes(Charset)}}, which has the following note:
> {quote}
> This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array. The CharsetEncoder class should be used when more control over the encoding process is required.
> {quote}
> So, if we insert a non-ASCII character into an ascii string literal, it will be replaced with a {{?}} char.  Something similar happens for UTF-8.
> For example:
> {noformat}
> cqlsh:ks1> create table badstrings (a int primary key, b ascii);
> cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
> cqlsh:ks1> select * from badstrings;
>  a | b
> ---+------
>  0 | ????
> {noformat}
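
The behaviour in the description can be reproduced outside Cassandra with a short JDK-only sketch. The class and method names below (StrictEncoding, lossyEncode, strictEncode, isEncodable) are hypothetical and this is not the change made in AsciiType. The unpaired surrogate used for the UTF-8 case is an assumed example of malformed input, not one taken from the ticket.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Cassandra.
final class StrictEncoding {
    // Lossy path: String.getBytes(Charset) substitutes the charset's
    // replacement bytes ('?' for US-ASCII) without signalling an error.
    static String lossyEncode(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // Strict path: an encoder set to REPORT throws on bad input.
    static ByteBuffer strictEncode(String s, Charset cs) throws CharacterCodingException {
        return cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
    }

    // Convenience wrapper: true only if s encodes cleanly in cs.
    static boolean isEncodable(String s, Charset cs) {
        try {
            strictEncode(s, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String greek = "\u038E\u0394\u03B4\u03E0"; // the literal from the example above
        System.out.println(lossyEncode(greek, StandardCharsets.US_ASCII)); // prints ????
        System.out.println(isEncodable(greek, StandardCharsets.US_ASCII)); // prints false
        // Assumed UTF-8 case: an unpaired surrogate cannot be encoded.
        System.out.println(isEncodable("\uD800", StandardCharsets.UTF_8)); // prints false
    }
}
```

Using CharsetEncoder with CodingErrorAction.REPORT is the approach the quoted Javadoc note points to when more control over encoding is required.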

This message was sent by Atlassian JIRA
