incubator-cassandra-user mailing list archives

From Miguel Angel Martin junquera <>
Subject Re: CqlStorage creates wrong schema for Pig
Date Mon, 26 Aug 2013 08:32:33 GMT
Hi Chad,

I have this issue too.

I sent a mail to the Pig user list, but I still cannot resolve it, and I
cannot access the column values.
In that mail I described some things I tried without results, along with
information about this issue.

I hope someone will reply with a comment, idea, or solution for this issue.

I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have
not configured an environment to debug and trace this issue.

I only found some comments like the following, which I do not fully understand.


 * A LoadStoreFunc for retrieving data from and storing data to Cassandra

 * A row from a standard CF will be returned as nested tuples:
 * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
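
One possible reading of that comment, modeled in plain Python rather than Pig: the row is a pair of tuples, where the first holds (name, value) pairs for the key columns and the second holds (name, value) pairs for the remaining columns. That reading is an assumption, and the names `row` and `row_to_dict` are hypothetical; the strings are just the placeholders from the comment.

```python
# Hypothetical model of the nested-tuple row described in the comment.
# Assumption: first inner tuple = key columns, second = other columns.
row = (
    (("key1", "value1"), ("key2", "value2")),
    (("name1", "val1"), ("name2", "val2")),
)


def row_to_dict(row):
    """Merge the key pairs and the column pairs into one name -> value dict."""
    keys, columns = row
    return {name: value for name, value in keys + columns}


print(row_to_dict(row))
```

Under that reading, every value sits one level deeper than a flat schema would suggest, next to its own column name.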

If you find an idea or a solution, please post it.


2013/8/23 Chad Johnston <>

> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
> I'm loading some simple data from Cassandra into Pig using CqlStorage. The
> CqlStorage loader defines a Pig schema based on the Cassandra schema, but
> it seems to be wrong.
> If I do:
>
>   data = LOAD 'cql://bookdata/books' USING CqlStorage();
>   DESCRIBE data;
>
> I get this:
>
>   data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
>
> However, if I DUMP data, I get results like these:
>
>   ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>
> Clearly the results from Cassandra are key/value pairs, as would be
> expected. I don't know why the schema generated by CqlStorage() would be so
> different.
> This is really causing me problems trying to access the column values. I
> tried a naive approach of FLATTENing each tuple, then trying to access the
> values that way:
>
>   flattened = FOREACH data GENERATE
>     FLATTEN(isbn),
>     FLATTEN(booktitle),
>     ...
>   values = FOREACH flattened GENERATE
>     $1 AS ISBN,
>     $3 AS BookTitle,
>     ...
>
> As soon as I try to access field $5, Pig complains about the index being
> out of bounds.
> Is there a way to solve the schema/reality mismatch? Am I doing something
> wrong, or have I stumbled across a defect?
> Thanks,
> Chad
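
The mismatch Chad describes can be modeled in plain Python (this is not Pig and not CqlStorage's actual code, only a sketch of the data shapes taken from the DESCRIBE and DUMP output quoted above). The names `declared_fields`, `dumped_row`, and `flatten_pairs` are hypothetical.

```python
# Declared schema from DESCRIBE: five flat fields.
declared_fields = [
    "isbn", "bookauthor", "booktitle", "publisher", "yearofpublication",
]

# Row as printed by DUMP: five (name, value) pairs instead of five values.
dumped_row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten_pairs(row):
    """Unnest each (name, value) pair into two adjacent fields."""
    return [item for pair in row for item in pair]


flat = flatten_pairs(dumped_row)
values = flat[1::2]  # values sit at the odd positions: 1, 3, 5, 7, 9

print(len(declared_fields), len(flat))
print(values)
```

In this model the flattened data has ten fields, while the declared schema only knows about five. If positional references such as `$5` are checked against the declared schema rather than the actual data, that could explain the out-of-bounds complaint, but this is a guess and is not confirmed anywhere in the thread.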
