cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11030) non-ascii characters incorrectly displayed/inserted on cqlsh on Windows
Date Tue, 19 Jan 2016 18:24:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107129#comment-15107129 ]

Paulo Motta commented on CASSANDRA-11030:
-----------------------------------------

There are two issues at play here. The first is that the default Windows terminal encoding
is not {{utf-8}}, so to display or input {{utf-8}} characters you must set the terminal
encoding (the code page, in Windows nomenclature) to {{cp65001}} by issuing the command
{{chcp 65001}} before starting cqlsh. The second issue is that there is no codec for {{cp65001}}
in Python < 3.3 (this was fixed in issue [13216|https://bugs.python.org/issue13216] in
Python [3.3+|https://docs.python.org/dev/whatsnew/3.3.html#codecs]). A known workaround is
to register a copy of the {{utf-8}} codec to encode/decode {{cp65001}}.

So, if the platform is native Windows (the issue doesn't happen on Cygwin), the encoding
is set to {{utf-8}}, and the terminal encoding is not {{cp65001}}, a warning is printed asking
the user to change the console code page to {{cp65001}} to support {{utf-8}} encoding. Furthermore,
if {{cp65001}} is the default encoding and the Python version is less than 3.3, the {{utf-8}}
codec is registered as {{cp65001}}.
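The decision above can be sketched as a pure function, so each case is easy to check. The name {{plan_windows_encoding}} and its parameters are hypothetical; in a real run the inputs would presumably come from {{sys.platform}} ({{win32}} on native Windows, {{cygwin}} under Cygwin), the {{--encoding}} option, the console encoding (for example {{sys.stdout.encoding}}), and {{sys.version_info}}. Treat it as a sketch under those assumptions, not the patch itself.

```python
import sys
from typing import NamedTuple, Tuple


class EncodingPlan(NamedTuple):
    warn_user: bool       # print the "chcp 65001" warning
    register_alias: bool  # register the utf-8 codec as cp65001


def plan_windows_encoding(platform: str, encoding: str,
                          console_encoding: str,
                          python_version: Tuple[int, int]) -> EncodingPlan:
    # Hypothetical helper. sys.platform is "win32" on native Windows and
    # "cygwin" under Cygwin, so Cygwin (unaffected) never triggers the warning.
    native_windows = platform == "win32"
    wants_utf8 = encoding.lower().replace("_", "-") in ("utf-8", "utf8")
    console_is_cp65001 = console_encoding.lower() == "cp65001"

    warn = native_windows and wants_utf8 and not console_is_cp65001
    # Python < 3.3 has no cp65001 codec, so the alias is only needed there.
    register = console_is_cp65001 and python_version < (3, 3)
    return EncodingPlan(warn, register)


if __name__ == "__main__":
    print(plan_windows_encoding(sys.platform, "utf-8",
                                sys.stdout.encoding or "",
                                sys.version_info[:2]))
```

For example, {{plan_windows_encoding("win32", "utf-8", "cp437", (2, 7))}} asks for the warning only, while the same call with {{"cp65001"}} as the console encoding asks for the alias only.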

||2.2||3.0||3.3||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.3...pauloricardomg:3.3-11030]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11030]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-dtest/lastCompletedBuild/testReport/]|

Below is a sample execution with different encoding variations (default vs utf-8/cp65001):

{noformat}
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;

 bla
--------------
 joπo ßlcides
          bla
         nπoτ

(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';

 bla
-----

(0 rows)
cqlsh> exit;
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8

WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.

Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;

 bla
--------------
 joão álcides
          bla
         nãoç

(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
Traceback (most recent call last):
  File "C:\Users\Paulo\Repositories\cassandra\bin\\cqlsh.py", line 1044, in get_input_line
    self.lastcmd = raw_input(prompt).decode(self.encoding)
  File "C:\tools\python2\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x87 in position 39: invalid start byte

WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.

cqlsh> exit;
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> chcp 65001
Active code page: 65001
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;

 bla
--------------
 joão álcides
          bla
         nãoç

(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';

 bla
------
 nãoç

(1 rows)
cqlsh> insert into bla.test (bla ) VALUES ( 'ãnothér' );
cqlsh> select * from bla.test where bla = 'ãnothér';

 bla
---------
 ãnothér

(1 rows)
cqlsh> exit;    
{noformat}
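The garbled rows in the first run and the traceback in the second are consistent with a plain code-page mismatch, reproduced below in Python 3. Assumption: in the default run cqlsh encoded its output as {{cp1252}} while the console rendered it with code page 437, and in the second run the console sent {{cp437}} bytes that cqlsh then decoded as {{utf-8}}. This pairing reproduces the output above, and {{ç}} is byte 0x87 in cp437, matching the byte reported at position 39 of the query; it was not confirmed from the session itself. The helper names are hypothetical.

```python
from typing import Optional


def mojibake(text: str, encoded_as: str, displayed_as: str) -> str:
    """Show how `text` looks when its bytes are produced with one code page
    but rendered by a console that assumes another (hypothetical helper)."""
    return text.encode(encoded_as).decode(displayed_as)


def utf8_decode_error_reason(raw: bytes) -> Optional[str]:
    """Return the UnicodeDecodeError reason if `raw` is not valid UTF-8, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    # Run 1 (assumed cp1252 output on a cp437 console): the garbled rows.
    print(mojibake("joão álcides", "cp1252", "cp437"))  # joπo ßlcides
    print(mojibake("nãoç", "cp1252", "cp437"))          # nπoτ
    # Run 2: 'ç' typed on a cp437 console arrives as byte 0x87, a UTF-8
    # continuation byte that cannot start a character.
    print("ç".encode("cp437"))                          # b'\x87'
    print(utf8_decode_error_reason(b"\x87"))            # invalid start byte
```

After {{chcp 65001}} both sides agree on UTF-8, which is why the third run displays and matches {{nãoç}} correctly.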

[~Stefania] would you mind reviewing? Would you have a Windows 10 box to test it on? I have
only tested on Windows 7, where it works correctly.

> non-ascii characters incorrectly displayed/inserted on cqlsh on Windows
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-11030
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11030
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: cqlsh, windows
>
> {noformat}
> C:\Users\Paulo\Repositories\cassandra [2.2-10948 +6 ~1 -0 !]> .\bin\cqlsh.bat --encoding utf-8
> Connected to test at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
> Use HELP for help.
> cqlsh> INSERT INTO bla.test (bla ) VALUES  ('não') ;
> cqlsh> select * from bla.test;
>  bla
> -----
>  n?o
> (1 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
