incubator-cassandra-user mailing list archives

From Miguel Angel Martin junquera <mianmarjun.mailingl...@gmail.com>
Subject Re: CqlStorage creates wrong schema for Pig
Date Mon, 02 Sep 2013 11:32:01 GMT
Good, nice job!

I had only been testing with a UDF that handled the string schema type;
this is better, more elaborate work.

Regards


Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.martin@brainsins.com



2013/8/31 Chad Johnston <cjohnston@megatome.com>

> I threw together a quick UDF to work around this issue. It just extracts
> the value portion of the tuple while taking advantage of the CqlStorage
> generated schema to keep the type correct.
>
> You can get it here: https://github.com/iamthechad/cqlstorage-udf
>
> I'll see if I can find more useful information and open a defect, since
> that's what this seems to be.
>
> Chad
>
>
> On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <
> mianmarjun.mailinglist@gmail.com> wrote:
>
>> I tried this:
>>
>> rows = LOAD
>> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
>> CqlStorage();
>>
>> dump rows;
>>
>> ILLUSTRATE rows;
>>
>> describe rows;
>>
>> values2 = FOREACH rows GENERATE TOTUPLE (id) as
>> (mycolumn:tuple(name,value));
>>
>> dump values2;
>>
>> describe values2;
>>
>> But I get these results:
>>
>>
>>
>> -------------------------------------------------------------
>> | rows     | id:chararray   | age:int   | title:chararray   |
>> -------------------------------------------------------------
>> |          | (id, 6)        | (age, 30) | (title, QA)       |
>> -------------------------------------------------------------
>>
>> rows: {id: chararray,age: int,title: chararray}
>> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1031: Incompatable field schema: left is
>> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is
>> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
>>
>>
>>
>>
>>
>> or
>>
>>
>>
>> ....
>>
>> values2 = FOREACH rows GENERATE TOTUPLE (id);
>> dump values2;
>> describe values2;
>>
>> and the results are:
>>
>>
>> ...
>> (((id,6)))
>> (((id,5)))
>> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
>>
>>
>>
>> Aggg!!!!!
>>
>>
>>
>>
>>
>> Miguel Angel Martín Junquera
>> Analyst Engineer.
>> miguelangel.martin@brainsins.com
>>
>>
>>
>> 2013/8/26 Miguel Angel Martin junquera <mianmarjun.mailinglist@gmail.com>
>>
>>> Hi Chad,
>>>
>>> I have the same issue.
>>>
>>> I sent a mail to the pig-user list, but I still cannot resolve it, and I
>>> cannot access the column values.
>>> In that mail I describe several things I tried without success, along
>>> with more information about the issue:
>>>
>>> http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3CCAJeG_hQ9S2Po3_XytZX5Xki4J1maO8q26jYdG2Wndy_KYiv9CQ@mail.gmail.com%3E
>>>
>>> I hope someone replies with a comment, idea, or solution for this issue
>>> or bug.
>>>
>>> I have reviewed the CqlStorage class in the Cassandra 1.2.8 source, but I
>>> have not configured an environment to debug and trace this issue.
>>>
>>> I only found some comments like the following, which I do not fully
>>> understand:
>>>
>>>
>>> /**
>>>  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
>>>  *
>>>  * A row from a standard CF will be returned as nested tuples:
>>>  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
>>>  */
>>>
>>>
>>> If you find an idea or a solution, please post it.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2013/8/23 Chad Johnston <cjohnston@megatome.com>
>>>
>>>> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>>>>
>>>> I'm loading some simple data from Cassandra into Pig using CqlStorage.
>>>> The CqlStorage loader defines a Pig schema based on the Cassandra schema,
>>>> but it seems to be wrong.
>>>>
>>>> If I do:
>>>>
>>>> data = LOAD 'cql://bookdata/books' USING CqlStorage();
>>>> DESCRIBE data;
>>>>
>>>> I get this:
>>>>
>>>> data: {isbn: chararray,bookauthor: chararray,booktitle:
>>>> chararray,publisher: chararray,yearofpublication: int}
>>>>
>>>> However, if I DUMP data, I get results like these:
>>>>
>>>> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
>>>> Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>>>>
>>>> Clearly the results from Cassandra are key/value pairs, as would be
>>>> expected. I don't know why the schema generated by CqlStorage() would be
>>>> so different.
>>>>
>>>> This is really causing me problems trying to access the column values.
>>>> I tried a naive approach of FLATTENing each tuple, then trying to access
>>>> the values that way:
>>>>
>>>> flattened = FOREACH data GENERATE
>>>>   FLATTEN(isbn),
>>>>   FLATTEN(booktitle),
>>>>   ...
>>>> values = FOREACH flattened GENERATE
>>>>   $1 AS ISBN,
>>>>   $3 AS BookTitle,
>>>>   ...
>>>>
>>>> As soon as I try to access field $5, Pig complains about the index
>>>> being out of bounds.
>>>>
>>>> Is there a way to solve the schema/reality mismatch? Am I doing
>>>> something wrong, or have I stumbled across a defect?
>>>>
>>>> Thanks,
>>>> Chad
>>>>
>>>
>>>
>>
>
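
The mismatch discussed in this thread can be modelled outside Pig. Below is a
minimal Python sketch, assuming each column arrives as a (name, value) pair, as
the DUMP output quoted above shows, while the declared schema describes a bare
value. It is not Chad's UDF and not CqlStorage code; the names `extract_values`
and `to_record` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical stand-in for one row as DUMP prints it in the thread:
# every column is a (name, value) pair instead of a bare value.
Row = Tuple[Tuple[str, Any], ...]


def extract_values(row: Row) -> Tuple[Any, ...]:
    """Keep only the value half of each (name, value) pair, in column order."""
    return tuple(value for _name, value in row)


def to_record(row: Row) -> Dict[str, Any]:
    """Index the values by column name, matching the declared field names."""
    return {name: value for name, value in row}


# Sample row copied from the DUMP output in the original report.
row: Row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)

values = extract_values(row)
record = to_record(row)
```

After extraction, `values` has one entry per declared column, so positions line
up with the schema that DESCRIBE reports (isbn first, yearofpublication last),
and the integer column keeps its integer type.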
