cassandra-commits mailing list archives

From "Oksana Danylyshyn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8166) Not all data is loaded to Pig using CqlNativeStorage
Date Wed, 22 Oct 2014 17:41:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180224#comment-14180224 ]

Oksana Danylyshyn commented on CASSANDRA-8166:
----------------------------------------------

Please also note that this was tested on version 2.1.0.
With versions prior to 2.1.0-rc6, all the data loads fine using CqlStorage.
Since 2.1.0-rc6, CqlStorage fails with the error: Unable to find inputformat class
'org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat'.
CqlNativeStorage works without errors but does not return all the data.
The issue also reproduces for tables with non-compound keys.

> Not all data is loaded to Pig using CqlNativeStorage
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8166
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Oksana Danylyshyn
>         Attachments: sorted.zip
>
>
> Not all the data from a Cassandra table is loaded into Pig using the CqlNativeStorage function.
> Steps to reproduce:
> cql3 create table statement:
> CREATE TABLE time_bucket_step (
>   key varchar,
>   object_id varchar,
>   value varchar,
>   PRIMARY KEY (key, object_id)
> );
> Loading and saving data to Cassandra ("sorted" file is in the attachment):
> time_bucket_step = load 'sorted' using PigStorage('\t','-schema');
> records = foreach time_bucket_step
>   generate
>     TOTUPLE(TOTUPLE('key', key),TOTUPLE('object_id', object_id)),
>     TOTUPLE(value);
> store records into 'cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F'
>   using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> Results:
> Input(s):
> Successfully read 139026 records (11115817 bytes) from: "hdfs://.../sorted"
> Output(s):
> Successfully stored 139026 records in: "cql://socialdata/time_bucket_step?output_query=UPDATE+socialdata.time_bucket_step+set+value+%3D+%3F"
> Loading data from Cassandra: (note that not all data are read)
> time_bucket_step_cass = load 'cql://socialdata/time_bucket_step' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> store time_bucket_step_cass into 'time_bucket_step_cass' using PigStorage('\t','-schema');
> Results:
> Input(s):
> Successfully read 80727 records (20068 bytes) from: "cql://socialdata/time_bucket_step"
> Output(s):
> Successfully stored 80727 records (2098178 bytes) in: "hdfs://..../time_bucket_step_cass"
> Actual: only 80727 of 139026 records were loaded
> Expected: All data should be loaded
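
One way to narrow down which rows are missing is to compare the primary keys in the source file with those in the file loaded back from Cassandra. The sketch below is only an illustration, not part of the original report. It assumes both files have been copied out of HDFS as tab-separated text and that `key` and `object_id` are the first two columns in each; the column order of the loaded file may differ, so the column indexes are parameters. The function names are hypothetical. Because a CQL UPDATE is an upsert, source rows sharing the same (key, object_id) collapse into one Cassandra row, so the number of distinct keys in the source is the row count to expect from the load.

```python
import csv
from typing import Iterable, Sequence, Set, Tuple

Key = Tuple[str, ...]


def primary_keys(lines: Iterable[str], key_columns: Sequence[int] = (0, 1)) -> Set[Key]:
    """Collect the distinct primary-key tuples from tab-separated lines.

    key_columns are the (assumed) positions of key and object_id.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    needed = max(key_columns)
    # Skip blank or truncated rows that do not contain every key column.
    return {
        tuple(row[i] for i in key_columns)
        for row in reader
        if len(row) > needed
    }


def missing_keys(
    source_lines: Iterable[str],
    loaded_lines: Iterable[str],
    source_key_columns: Sequence[int] = (0, 1),
    loaded_key_columns: Sequence[int] = (0, 1),
) -> Set[Key]:
    """Return the keys present in the source but absent from the loaded data."""
    source = primary_keys(source_lines, source_key_columns)
    loaded = primary_keys(loaded_lines, loaded_key_columns)
    return source - loaded
```

Both functions accept any iterable of lines, so an open file handle can be passed directly. If the distinct-key count of the source is already below 139026, part of the gap comes from duplicate keys rather than from the load itself; any keys returned by `missing_keys` were written but not read back.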



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
