cassandra-commits mailing list archives

From "Nabeel Shahzad (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5180) NodeJS Thrift generated file incorrectly parses map/list/sets when doubles/floats are used
Date Tue, 22 Jan 2013 18:24:13 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nabeel Shahzad updated CASSANDRA-5180:
--------------------------------------

    Description: 
When a double or float is used in a map (as key or value), list, or set, the value is decoded
as a UTF-8 string, which corrupts some bytes and can add extra ones.

For example:

The bytes of a map <double, double> (this is coming out of the Thrift call)
{noformat}
00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}

But after it's been parsed out from the field as UTF8:

{noformat}
00 01 00 08 3f 3f 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
{noformat}

As you can see, there's an incorrect byte (a 3f where the f4 was, and an extra 00). For reference,
this value was map<double, double> = {1.25: 2.25}. Floats behave the same way.
The f4 is decimal 244, which is outside the ASCII range; in UTF-8 it is only valid as the first
byte of a four-byte sequence, and the 00 that follows cannot continue such a sequence.
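
As a cross-check of the first dump, here is a minimal Python sketch that decodes the raw bytes. The layout it assumes (a 2-byte big-endian entry count, then each key and value prefixed by a 2-byte length) is inferred from the dump itself rather than taken from any specification, and the helper name decode_double_map is hypothetical.

```python
import struct

def decode_double_map(raw: bytes) -> dict:
    """Decode a map<double, double> using the layout inferred from the dump:
    2-byte count, then length-prefixed big-endian doubles (an assumption)."""
    (count,) = struct.unpack_from(">H", raw, 0)
    offset = 2
    numbers = []
    for _ in range(count * 2):  # each entry is a key followed by a value
        (length,) = struct.unpack_from(">H", raw, offset)
        offset += 2
        if length != 8 or offset + length > len(raw):
            raise ValueError(f"bad element length {length} at offset {offset - 2}")
        (number,) = struct.unpack_from(">d", raw, offset)
        offset += length
        numbers.append(number)
    # Even positions are keys, odd positions are values.
    return dict(zip(numbers[0::2], numbers[1::2]))

raw = bytes.fromhex("00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00")
corrupted = bytes.fromhex("00 01 00 08 3f 3f 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00")

print(decode_double_map(raw))        # {1.25: 2.25}
print(decode_double_map(corrupted))  # the key is no longer 1.25
```

The single substituted byte is enough to change the decoded key, even though the buffer length is unchanged in this case.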

The actual value of the field becomes:
{noformat}
  value: '\u0000\u0002\u0000\b??\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b@\u0002\u0000\u0000\u0000\u0000\u0000\u0000''
{noformat}

Where \b = 0x08, the first ? is the 3f byte (ASCII '?'), and the second ? replaces the f4, which could not be decoded.

I have seen cases where there are *extra* bytes added in, which breaks the parsing based on
byte size:

{noformat}
00 01 00 08 40 24 48 72 c2 b0 20 c3 84 c2 9c 00 08 40 34 c3 bc c3 93 5a c2 85 c2 87 c2 94
{noformat}

Here the map value was {10.1415: 20.9876}. In a list, either value also produces extra
bytes.
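
The extra bytes in this dump match what one would get if each double's raw bytes were treated as single-byte characters and then written back out as UTF-8. The Python sketch below reproduces the dump under that assumption; it does not claim that this is what the generated NodeJS code does internally, and reencode_as_utf8 and spaced_hex are hypothetical helpers.

```python
import struct

def reencode_as_utf8(raw: bytes) -> bytes:
    """Treat each byte as one character (Latin-1), then encode that text as UTF-8.
    Every byte >= 0x80 becomes two bytes, so the payload grows."""
    return raw.decode("latin-1").encode("utf-8")

def spaced_hex(data: bytes) -> str:
    """Format bytes the same way as the dumps in this report."""
    return " ".join(f"{b:02x}" for b in data)

key = struct.pack(">d", 10.1415)    # 8 bytes, big-endian IEEE 754
value = struct.pack(">d", 20.9876)  # 8 bytes, big-endian IEEE 754

print(spaced_hex(key), "->", spaced_hex(reencode_as_utf8(key)))
print(spaced_hex(value), "->", spaced_hex(reencode_as_utf8(value)))
```

Under this assumption the key grows from 8 to 11 bytes and the value from 8 to 13, while the 2-byte length prefixes in the dump still say 00 08.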

It seems to me that when the "ftype" is parsed (int16) before the actual field, it returns
a TYPE value of "11" (string) instead of the proper value for a map/set/list.

This breaks any parsing based on the field's byte length, since a variable number of extra
bytes is added to the key or value of a map, and to any value of a list.
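
To illustrate why the extra bytes break length-based parsing, here is a small Python sketch that walks the corrupted dump above using its own length prefixes. The 2-byte count and 2-byte length layout is again inferred from the dumps, and read_elements is a hypothetical helper, not part of any Thrift or Cassandra API.

```python
import struct

def read_elements(raw: bytes, count: int, offset: int = 2):
    """Yield (length, payload) for `count` length-prefixed elements,
    starting after the 2-byte entry count (assumed layout)."""
    for _ in range(count):
        (length,) = struct.unpack_from(">H", raw, offset)
        offset += 2
        if offset + length > len(raw):
            raise ValueError(f"length {length} overruns buffer at offset {offset - 2}")
        yield length, raw[offset:offset + length]
        offset += length

corrupted = bytes.fromhex(
    "00 01 00 08 40 24 48 72 c2 b0 20 c3 84 c2 9c 00 08 "
    "40 34 c3 bc c3 93 5a c2 85 c2 87 c2 94"
)
(count,) = struct.unpack_from(">H", corrupted, 0)

try:
    for length, payload in read_elements(corrupted, count * 2):
        print(length, payload.hex())
except ValueError as err:
    print("parse failed:", err)
```

The first element is read as 8 bytes, but those 8 bytes end in the middle of the inflated key, so the next "length" is taken from leftover key bytes (84 c2) and the parser runs off the end of the buffer.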

For reference, the table, and an insert example:

{noformat}
CREATE TABLE sample_map (
    id text PRIMARY KEY, 
    map_col_text map < text, text >, 
    map_col_int map < int, text >, 
    map_col_float map < float, float >,
    map_col_double map < double, double >
);

INSERT INTO sample_map (id, map_col_double) VALUES('DOUBLE_ROW_SINGLE', {10.1415: 20.9876});
{noformat}

Not sure if it matters, but this was using CQL3.

Versions:

{noformat}
cqlsh:orion> show version;
[cqlsh 2.3.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
{noformat}

{noformat}
$ thrift --version
Thrift version 0.9.0
{noformat}

{noformat}
  "name": "node-thrift",
  "description": "node.js bindings for the Apache Thrift RPC system",
  "homepage": "http://thrift.apache.org/",
  "repository": {
    "type": "svn",
    "url": "http://svn.apache.org/repos/asf/thrift/trunk/"
  },
  "version": "1.0.0-dev",
{noformat}


> NodeJS Thrift generated file incorrectly parses map/list/sets when doubles/floats are used
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5180
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>    Affects Versions: 1.2.0
>            Reporter: Nabeel Shahzad

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
