From: Miguel Angel Martin junquera
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Date: Mon, 26 Aug 2013 10:32:33 +0200
Subject: Re: CqlStorage creates wrong schema for Pig
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=089e0158cb9206d58f04e4d5995d

--089e0158cb9206d58f04e4d5995d
Content-Type: text/plain; charset=ISO-8859-1

Hi Chad,

I have the same issue. I sent a mail to the Pig user list, but I still
cannot resolve it, and I cannot access the column values. That mail
describes what I tried (without success) and gives more information about
the issue:

http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3CCAJeG_hQ9S2Po3_XytZX5Xki4J1maO8q26jYdG2Wndy_KYiv9CQ@mail.gmail.com%3E

I hope someone replies with a comment, idea, or solution for this issue or
bug.

I have reviewed the CqlStorage class in the Cassandra 1.2.8 source, but I
have not set up an environment to debug and trace the issue. I only found
comments like the following, which I do not fully understand:

/**
 * A LoadStoreFunc for retrieving data from and storing data to Cassandra
 *
 * A row from a standard CF will be returned as nested tuples:
 * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
 */

If you find an idea or solution, please post it.

Thanks

2013/8/23 Chad Johnston

> (I'm using Cassandra 1.2.8 and Pig 0.11.1)
>
> I'm loading some simple data from Cassandra into Pig using CqlStorage. The
> CqlStorage loader defines a Pig schema based on the Cassandra schema, but
> it seems to be wrong.
>
> If I do:
>
> data = LOAD 'cql://bookdata/books' USING CqlStorage();
> DESCRIBE data;
>
> I get this:
>
> data: {isbn: chararray,bookauthor: chararray,booktitle:
> chararray,publisher: chararray,yearofpublication: int}
>
> However, if I DUMP data, I get results like these:
>
> ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
> Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
>
> Clearly the results from Cassandra are key/value pairs, as would be
> expected. I don't know why the schema generated by CqlStorage() would be so
> different.
>
> This is really causing me problems trying to access the column values. I
> tried a naive approach of FLATTENing each tuple, then trying to access the
> values that way:
>
> flattened = FOREACH data GENERATE
>   FLATTEN(isbn),
>   FLATTEN(booktitle),
>   ...
> values = FOREACH flattened GENERATE
>   $1 AS ISBN,
>   $3 AS BookTitle,
>   ...
>
> As soon as I try to access field $5, Pig complains about the index being
> out of bounds.
>
> Is there a way to solve the schema/reality mismatch? Am I doing something
> wrong, or have I stumbled across a defect?
>
> Thanks,
> Chad
>

--089e0158cb9206d58f04e4d5995d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--089e0158cb9206d58f04e4d5995d--