From: William Oberman
Date: Wed, 15 Jun 2011 14:17:02 -0400
Subject: prep for cassandra storage from pig
To: user@cassandra.apache.org

I think I'm stuck on typing issues trying to store data in cassandra. To verify, cassandra wants (key, {tuples}).

My pig script is fairly brief:

raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});
--columns == timeUUID -> JSON
rows = FOREACH raw GENERATE key, FLATTEN(columns);
alias_target_day = FOREACH rows {
    --I wrote a specialized parser that does exactly what I need
    observation_map = com.civicscience.pig.ParseObservation($2);
    GENERATE $0 as alias, observation_map#'_fqt' as target, observation_map#'_day' as day;
};
grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1, COUNT($1)) as day_count;

This gets me:

(targetA, (day1, count))
(targetA, (day2, count))
(targetB, (day1, count))
....

But cassandra wants the 2nd item to be a bag. So, I tried:

X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1, COUNT($1))) as day_count;

But this results in:

(targetA, {((day1, count))})
(targetA, {((day2, count))})
(targetB, {((day1, count))})

It's hard to see, but the 2nd item is now a bag whose only element is a tuple wrapping the (day, count) tuple, which is still bad.

How do I get (key, {tuple})? I wasn't sure where to post this (pig or cassandra), so I'm posting to the pig list too.

will
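One possible way to reach the (key, {tuple}) shape asked about above is to avoid TOTUPLE and TOBAG altogether and let a second GROUP build the bag. This is only a sketch, not a confirmed fix. It continues from the `grouping` relation in the script and assumes the goal is one row per target holding one (day, count) tuple per day. The relation names `per_day`, `by_target`, and `target_days` are made up for illustration.

```pig
-- Continues from the `grouping` relation defined in the script above.
-- Keep day and count as separate top-level fields instead of wrapping them with TOTUPLE.
per_day = FOREACH grouping GENERATE group.$0 AS target, group.$1 AS day, COUNT($1) AS day_count;

-- Regroup by target so that all of a target's (target, day, day_count) rows land in one bag.
by_target = GROUP per_day BY target;

-- Project only day and day_count out of that bag, leaving a bag of flat two-field tuples.
target_days = FOREACH by_target GENERATE group AS target, per_day.(day, day_count);
```

If this behaves as intended, each row of `target_days` should look like (targetA, {(day1, count), (day2, count)}). Running DESCRIBE target_days before storing is a cheap way to check that the second field really is a bag of flat tuples. Note that this also folds all days for a target into a single row, which differs from the one-row-per-(target, day) output in the original attempts.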