cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Johnny Miller (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8790) Improve handling of non-printable unicode characters in text fields and CQLSH
Date Thu, 12 Feb 2015 09:46:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317902#comment-14317902
] 

Johnny Miller edited comment on CASSANDRA-8790 at 2/12/15 9:45 AM:
-------------------------------------------------------------------

[~brandon.williams] I appreciate your point, but [~thobbs] example isn't the actual problem.
Its specifically non-printable unicode characters.

This isn't a problem, when your using the drivers - it is only when using CQLSH.

The problem is that if I start writing data like this via the drivers and then for some reason
I ever need to query it via CQLSH I will get the wrong answer when doing a select.

i.e. SELECT * from testunicode where id = 'state\u001Ccard' will not return any results via
CQLSH when the data does actually exist in my table and the same query via the drivers (java)
would actually return a result.

The workaround to use blobs is not great (IMHO) - its a shame to have model your data round
this specific CQLSH limitation

We should either provide some functionality to handle this this or alternatively error if
someone enters a non-printable unicode character in CQLSH as the answer we get back is incorrect
and likely to mislead people.


was (Author: johnny15676):
[~brandon.williams] I appreciate your point, but [~thobbs] example isn't the actual problem.
Its specifically non-printable unicode characters.

This isn't a problem, when your using the drivers - it is only when using CQLSH.

The problem is that if I start writing data like this via the drivers and then for some reason
I ever need to every query it via CQLSH I will get the wrong answer when doing a select.

i.e. SELECT * from testunicode where id = 'state\u001Ccard' will not return any results via
CQLSH when the data does actually exist in my table and the same query via the drivers (java)
would actually return a result.

The workaround to use blobs is not great (IMHO) - its a shame to have model your data round
this specific CQLSH limitation

We should either provide some functionality to handle this this or alternatively error if
someone enters a non-printable unicode character in CQLSH as the answer we get back is incorrect
and likely to mislead people.

> Improve handling of non-printable unicode characters in text fields and CQLSH
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8790
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8790
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Johnny Miller
>            Priority: Minor
>
> Currently to store a string/text value that contains a non-printable unicode character
and than subsequently be able to query it CQLSH I need to store the field as a blob via the
blobAsText and textAsBlob functions. 
> This is not really optimal - it would be better if CQLSH handled this rather than having
to model data around this limitation.
> For example:
> {code:title=Example Code|borderStyle=solid}
> String createTableCql = "CREATE TABLE IF NOT EXISTS test_ks.testunicode (id blob PRIMARY
KEY, inserted_on timestamp, lorem text)";
> session.execute(createTableCql);
> System.out.println("Table created.");		
> 		
> String dimension1 = "state";
> String dimension2 = "card";
> String key = dimension1 + '\u001C' + dimension2;
> Date now = new Date();
> String lorem = "Lorem ipsum dolor sit amet.";
> 		
> String insertcql = "INSERT INTO testunicode (id, inserted_on, lorem) VALUES (textAsBlob(?),
?, ?)";
> PreparedStatement ps = session.prepare(insertcql);
> BoundStatement bs = new BoundStatement(ps);
> bs.bind(key, now, lorem);
> session.execute(bs);
> System.out.println("Row inserted with key "+key);		
> 		
> String selectcql = "SELECT blobAsText(id) AS id, inserted_on, lorem FROM testunicode
WHERE id = textAsBlob(?)";
> PreparedStatement ps2 = session.prepare(selectcql);
> BoundStatement bs2 = new BoundStatement(ps2);
> bs2.bind(key);
> ResultSet results = session.execute(bs2);
> 		
> System.out.println("Got results...");
> 		
> for (Row row : results) {
> 		    System.out.println(String.format("%-30s\t%-20s\t%-20s", row.getString("id"), row.getDate("inserted_on"),
row.getString("lorem")));
> }
> {code}
> And to query via CQLSH:
> {code}
> select * from testunicode where id = 0x73746174651c63617264 ;
>  id                     | inserted_on              | lorem
> ------------------------+--------------------------+-----------------------------
>  0x73746174651c63617264 | 2015-02-11 20:32:20+0000 | Lorem ipsum dolor sit amet.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message