cassandra-commits mailing list archives

From "Artem Aliev (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8577) Values of set types not loading correctly into Pig
Date Fri, 09 Jan 2015 12:09:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270942#comment-14270942 ]

Artem Aliev edited comment on CASSANDRA-8577 at 1/9/15 12:08 PM:
-----------------------------------------------------------------

To reproduce the bug with unit tests:
1. Replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.1.3.jar
2. Run the Pig unit tests:
 ant pig-test -Dtest.name=CqlTableDataTypeTest
{code}
….
    [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104)
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
    [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
    [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
    [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
    [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
….
{code}

Cassandra 2.1 ships with Java driver 2.0, which uses the V2 native protocol. Java driver
2.1 is available, and it uses the V3 native protocol.
Collection serialisation changed in V3. The current implementation of the Pig reader has
version 1 hardcoded for deserialisation, as a result of an incomplete fix for CASSANDRA-7287.
Version 1 should be used only in the deprecated cql-over-thrift API. CqlNativeStorage uses
the Java driver protocol. So the patch passes the serialisation protocol version negotiated by
the Java driver to the deserialiser when CqlNativeStorage is used. I also added an optional
‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case.
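
To illustrate why the hardcoded version fails, here is a minimal, self-contained sketch. It is not the Cassandra serializer code or the attached patch. The class and method names (CollectionFormatSketch, encode, decode) are hypothetical. It assumes the documented difference between the formats: before V3 the element count and each element length are 2-byte unsigned shorts, and from V3 on they are 4-byte ints. Null elements and other details are ignored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the collection wire layout, not Cassandra's own classes.
public class CollectionFormatSketch {

    // Count and length fields are 2 bytes wide before V3 and 4 bytes wide from V3 on.
    static int fieldWidth(int version) {
        return version >= 3 ? 4 : 2;
    }

    static void writeSize(ByteBuffer out, int value, int version) {
        if (version >= 3) out.putInt(value);
        else out.putShort((short) value);
    }

    static int readSize(ByteBuffer in, int version) {
        return version >= 3 ? in.getInt() : in.getShort() & 0xFFFF;
    }

    // Layout: element count, then (length, bytes) for every element.
    static ByteBuffer encode(List<String> values, int version) {
        int width = fieldWidth(version);
        int size = width;
        List<byte[]> raw = new ArrayList<>();
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            raw.add(b);
            size += width + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        writeSize(out, raw.size(), version);
        for (byte[] b : raw) {
            writeSize(out, b.length, version);
            out.put(b);
        }
        out.flip();
        return out;
    }

    // Decodes with the given version and rejects any bytes left over afterwards.
    static List<String> decode(ByteBuffer bytes, int version) {
        ByteBuffer in = bytes.duplicate(); // leave the caller's position untouched
        int count = readSize(in, version);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[readSize(in, version)];
            in.get(b);
            result.add(new String(b, StandardCharsets.UTF_8));
        }
        if (in.hasRemaining())
            throw new IllegalStateException("Unexpected extraneous bytes after collection value");
        return result;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Running", "onestep4red", "running");
        ByteBuffer v3Bytes = encode(tags, 3);

        // Matching versions round-trip the values.
        System.out.println(decode(v3Bytes, 3));

        // A V3 payload read with the pre-V3 layout: the first two bytes of the
        // 4-byte count are zero, so zero elements are read and the rest is left over.
        try {
            decode(v3Bytes, 1);
        } catch (IllegalStateException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```

In this sketch the mismatched read sees a count of zero before it hits the leftover bytes. That may also be why a reader without the trailing-bytes check would return an empty collection rather than an error, though I have not verified this against the 2.1.0 code.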



was (Author: artem.aliev):
to reproduce the bug with unit tests:
1 replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.0.5.jar
2 run pig unit tests 
 ant pig-test -Dtest.name=CqlTableDataTypeTest
{code}
….
    [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104)
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
    [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
    [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
    [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
    [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
….
{code}

Cassandra 2.1 ships with Java driver 2.0, which uses the V2 native protocol. Java driver
2.1 is available, and it uses the V3 native protocol.
Collection serialisation changed in V3. The current implementation of the Pig reader has
version 1 hardcoded for deserialisation, as a result of an incomplete fix for CASSANDRA-7287.
Version 1 should be used only in the deprecated cql-over-thrift API. CqlNativeStorage uses
the Java driver protocol. So the patch passes the serialisation protocol version negotiated by
the Java driver to the deserialiser when CqlNativeStorage is used. I also added an optional
‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case.


> Values of set types not loading correctly into Pig
> --------------------------------------------------
>
>                 Key: CASSANDRA-8577
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oksana Danylyshyn
>            Assignee: Brandon Williams
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.1-8577.txt
>
>
> Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage.
> When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received:
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> Steps to reproduce:
> {code}cqlsh:socialdata> CREATE TABLE test (
>                  key varchar PRIMARY KEY,
>                  tags set<varchar>
>                );
> cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'});
> cqlsh:socialdata> select * from test;
>  key | tags
> -----+---------------------------------------
>  key | {'Running', 'onestep4red', 'running'}
> (1 rows){code}
> With version 2.1.0:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> (key,()){code}
> With version 2.1.2:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
>   at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
>   at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27)
>   at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
>   at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
>   at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code}
> Expected result:
> {code}(key,(Running,onestep4red,running)){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
