Subject: Re: CqlStorage creates wrong schema for Pig
From: Miguel Angel Martin junquera
To: user@cassandra.apache.org
Date: Mon, 2 Sep 2013 15:09:10 +0200

Hi all,

More info:

https://issues.apache.org/jira/browse/CASSANDRA-5941



I tried the following (and built Cassandra 1.2.9), but it does not work for me:

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
ant

Miguel Angel Martín Junquera
Analyst Engineer.



2013/9/2 Miguel Angel Martin junquera <mianmarjun.mailinglist@gmail.com>
Good, nice job!

I had been testing with a UDF that handles only the string schema type; yours is better and more elaborate work.

Regards


Miguel Angel Martín Junquera
Analyst Engineer.



2013/8/31 Chad Johnston <cjohnston@megatome.com>
I threw together a quick UDF to work around this issue. It just extracts the value portion of the tuple while taking advantage of the CqlStorage generated schema to keep the type correct.

You can get it here: https://github.com/iamthechad/cqlstorage-udf

I'll see if I can find more useful information and open a defect, since that's what this seems to be.
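
The idea behind that workaround can be sketched in a few lines. This is a Python model only, not the code from the linked UDF; the function name `extract_value` and the Python types are assumptions made for illustration, and the row reuses the sample values (id 6, age 30, title QA) from the ILLUSTRATE output quoted in this thread.

```python
from typing import Any, Tuple


def extract_value(pair: Tuple[str, Any]) -> Any:
    """Return the value half of a (column_name, value) pair (hypothetical helper)."""
    _name, value = pair
    return value


# One row as DUMP shows it: every column arrives as a (name, value) pair.
row = (("id", "6"), ("age", 30), ("title", "QA"))

# Dropping the names leaves the values with their original types intact.
values = tuple(extract_value(pair) for pair in row)
print(values)  # ('6', 30, 'QA')
```

Because only the name is dropped, the value keeps whatever type the declared schema promised for that column, which is the property the workaround relies on.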

Chad


On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera <mianmarjun.mailinglist@gmail.com> wrote:
I tried this:

rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
dump rows;
ILLUSTRATE rows;
describe rows;

values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value));
dump values2;
describe values2;

But I get these results:


-------------------------------------------------------------
| rows     | id:chararray   | age:int   | title:chararray   |
-------------------------------------------------------------
|          | (id, 6)        | (age, 30) | (title, QA)       |
-------------------------------------------------------------

rows: {id: chararray,age: int,title: chararray}
2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: left is "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"





or



....

values2 = FOREACH rows GENERATE TOTUPLE (id);
dump values2;
describe values2;



and the results are:


...
(((id,6)))
(((id,5)))
values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}



Aggg!!!!!





Miguel Angel Martín Junquera
Analyst Engineer.



2013/8/26 Miguel Angel Martin junquera <mianmarjun.mailinglist@gmail.com>
Hi Chad,

I have this issue too.

I sent a mail to the Pig user list, but I still cannot resolve it and cannot access the column values. In that mail I describe some things I tried without results, along with information about this issue:

http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3CCAJeG_hQ9S2Po3_XytZX5Xki4J1maO8q26jYdG2Wndy_KYiv9CQ@mail.gmail.com%3E

I hope someone replies with a comment, idea, or solution for this issue or bug.

I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I do not have an environment configured to debug and trace this issue.

I only found some comments like the following, which I do not fully understand:


/**
 * A LoadStoreFunc for retrieving data from and storing data to Cassandra
 *
 * A row from a standard CF will be returned as nested tuples:
 * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
 */


If you find an idea or a solution, please post it.

Thanks






2013/8/23 Chad Johnston <cjohnston@megatome.com>
(I'm using Cassandra 1.2.8 and Pig 0.11.1)

I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong.

If I do:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

I get this:

data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.
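
The mismatch can be made concrete with a small check. The sketch below is plain Python, not Pig: it parses the DESCRIBE output quoted above, models the dumped row as Python tuples, and reports every field whose declared scalar type does not match what the row holds. The helper names `parse_schema` and `mismatches` and the `PIG_TO_PY` mapping are hypothetical.

```python
# Declared schema, copied from the DESCRIBE output above.
describe_output = (
    "data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,"
    "publisher: chararray,yearofpublication: int}"
)

# One row, copied from the DUMP output above and modelled as Python tuples.
dumped_row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)

# Assumed mapping from the two Pig types in this schema to Python types.
PIG_TO_PY = {"chararray": str, "int": int}


def parse_schema(text):
    """Parse 'alias: {name: type,...}' into a list of (name, type) pairs."""
    body = text[text.index("{") + 1 : text.rindex("}")]
    return [
        tuple(part.strip() for part in field.split(":"))
        for field in body.split(",")
    ]


def mismatches(schema, row):
    """List fields whose actual value is not the scalar type the schema declares."""
    problems = []
    for (name, pig_type), actual in zip(schema, row):
        if not isinstance(actual, PIG_TO_PY[pig_type]):
            problems.append((name, pig_type, type(actual).__name__))
    return problems


schema = parse_schema(describe_output)
for name, declared, actual in mismatches(schema, dumped_row):
    print(f"{name}: schema says {declared}, row holds a {actual}")
```

In this model every one of the five fields is flagged, because each declared scalar is actually a two-element tuple.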

This is really causing me problems trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

flattened = FOREACH data GENERATE
  FLATTEN(isbn),
  FLATTEN(booktitle),
  ...
values = FOREACH flattened GENERATE
  $1 AS ISBN,
  $3 AS BookTitle,
  ...

As soon as I try to access field $5, Pig complains about the index being out of bounds.
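
For reference, this is the layout the FLATTEN approach assumes, modelled in Python rather than Pig. If every column really is a (name, value) pair and all five are flattened, names land at even positions and values at odd ones. The variable names are illustrative, and the sketch describes only what DUMP shows, not how Pig resolves positions against the declared schema.

```python
from itertools import chain

# The dumped row from above, modelled as (name, value) pairs.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)

# Flattening every pair lays the fields out as name, value, name, value, ...
flattened = tuple(chain.from_iterable(row))
print(len(flattened))  # 10

# Values sit at the odd positions ($1, $3, $5, ... in Pig's positional notation).
values = flattened[1::2]
print(values)
```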

Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect?

Thanks,
Chad




