cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2367) Cleanup conversions between bytes and strings
Date Wed, 23 Mar 2011 17:28:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010239#comment-13010239 ]

Sylvain Lebresne commented on CASSANDRA-2367:
---------------------------------------------

There is also:
  * the avro schema (DEFINITION_SCHEMA_COLUMN_NAME) for mutation. It was encoded in UTF8 (in
Migration.java), but decoded using the system encoding (in DefsTable.loadFromStorage(), since
it is decoded by ByteBufferUtil.string() with the default charset).
  * In HintedHandOffManager, the combined table and cfName is encoded as UTF8 but decoded
with the system encoding (once again through the use of ByteBufferUtil.string() with no
specific charset).
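
To illustrate why this kind of mismatch matters, here is a minimal, self-contained sketch. It does not use Cassandra's code: the class CharsetMismatchDemo and its methods encodeUtf8() and decode() are hypothetical stand-ins for an encoder that always uses UTF8 and a decoder that takes whatever charset it is given. ISO-8859-1 stands in for a platform default that is not UTF8, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical encoder: always UTF-8, regardless of the platform default.
    public static ByteBuffer encodeUtf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical decoder: uses the charset it is handed.
    // duplicate() leaves the caller's buffer position untouched.
    public static String decode(ByteBuffer buf, Charset charset) {
        return charset.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        // "\u00e9" is a non-ASCII character; pure ASCII would round-trip either way.
        String name = "caf\u00e9";
        ByteBuffer bytes = encodeUtf8(name);

        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 system default.
        String mismatched = decode(bytes, StandardCharsets.ISO_8859_1);
        String matched = decode(bytes, StandardCharsets.UTF_8);

        System.out.println("mismatched round-trip ok: " + name.equals(mismatched));
        System.out.println("matched round-trip ok:    " + name.equals(matched));
    }
}
```

Names that contain only ASCII survive the mismatch, which is probably why this kind of bug can go unnoticed until a node runs with a different default charset.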


> Cleanup conversions between bytes and strings
> ---------------------------------------------
>
>                 Key: CASSANDRA-2367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2367
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: 0001-Cleanup-bytes-string-conversions.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> There is a bit of inconsistency in our conversions between ByteBuffers and Strings.
> For instance, ByteBufferUtil.string() uses the Java default charset by default, while
> ByteBufferUtil.bytes(String) assumes UTF8. Moreover, a number of places in the code don't
> use those functions and call getBytes() directly. There again, we often encode with the
> default charset but decode in UTF8, or the contrary.
> Using the default charset is probably a bad idea anyway, since it depends on the actual
> system the node is running on and could lead to a stupid bug when running on heterogeneous
> systems.
> This ticket proposes to always assume UTF8 all over the place (and tries to use
> ByteBufferUtil as much as possible to help with that).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
