ignite-issues mailing list archives

From "Denis Magda (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3140) C++: UTF-16 surrogate symbols are not serialized properly
Date Tue, 17 May 2016 04:41:12 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286028#comment-15286028 ]

Denis Magda commented on IGNITE-3140:
-------------------------------------

Igor,

The new serialization algorithm on the Java side encodes every symbol greater than 0x07FF
in 3 bytes, including each half of a surrogate pair.
This means that a valid surrogate pair in a String, such as {{0xD801, 0xDC37}}, takes 6 bytes
under the new algorithm, while standard UTF-8 encoders/decoders use only 4 bytes. The C++ side
won't be able to properly deserialize {{0xD801, 0xDC37}} because it arrives encoded in 6 bytes.

Try serializing this String on the C++ side. It should be encoded in 4 bytes, whereas the new
Java algorithm encodes it in 6 bytes.
{noformat}        
str = new String(new char[] {0xD801, 0xDC37});
{noformat}
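
To make the size difference concrete, here is a minimal sketch in plain Java. It is not the code from the ignite-3098 branch: {{encodePerChar}} and the class name are hypothetical, and the 1-byte and 2-byte ranges are assumed to follow ordinary UTF-8. It only illustrates what happens when each UTF-16 code unit above 0x07FF is written in 3 bytes on its own, compared with the JDK's standard UTF-8 encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class SurrogateEncodingSketch {
    /**
     * Hypothetical encoder (an assumption, not Ignite's implementation):
     * every UTF-16 code unit is encoded independently, so each half of a
     * surrogate pair falls into the "greater than 0x07FF" 3-byte branch.
     */
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x007F) {
                // Assumed: 1 byte, as in standard UTF-8.
                out.write(c);
            } else if (c <= 0x07FF) {
                // Assumed: 2 bytes, as in standard UTF-8.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // 3 bytes for everything above 0x07FF, surrogates included.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String str = new String(new char[] {0xD801, 0xDC37});

        // Standard UTF-8: the pair is combined into one code point (U+10437).
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 4

        // Per-code-unit encoding: 3 bytes for each surrogate.
        System.out.println(encodePerChar(str).length); // 6
    }
}
```

A decoder that expects standard UTF-8 would see two 3-byte sequences that each decode to a lone surrogate, which is why the C++ side needs matching logic.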



> C++: UTF-16 surrogate symbols are not serialized properly
> ---------------------------------------------------------
>
>                 Key: IGNITE-3140
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3140
>             Project: Ignite
>          Issue Type: Bug
>          Components: platforms
>    Affects Versions: 1.5.0.final
>            Reporter: Denis Magda
>            Assignee: Vladimir Ozerov
>             Fix For: 1.6
>
>
> There is an issue with serialization of surrogate symbols with {{BinaryMarshaller}}. On the Java side, String serialization logic was improved to support all cases. Refer to IGNITE-3098.
> The C++ serialization logic has to be updated as well. Please refer to the algorithm in the ignite-3098 branch in the following places:
> - {{BinaryUtils.strToUtf8Bytes}} - serialization
> - {{BinaryUtils.utf8BytesToStr}} - deserialization
> - {{IgniteSystemProperties.IGNITE_BINARY_MARSHALLER_USE_STRING_SERIALIZATION_VER_2}} controls which version of the serialization logic to use (old or new).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
