Subject: Re: prep for cassandra storage from pig
From: Jeremy Hanna
Date: Wed, 15 Jun 2011 14:04:02 -0500
To: user@cassandra.apache.org, user@pig.apache.org

Hi Will,

That's partly why I like to use FromCassandraBag and ToCassandraBag from pygmalion - it does the work for you to get it back into a form that cassandra understands.

Others may know better how to massage the data into that form using just pig, but if all else fails, you could write a udf to do that.

Jeremy

On Jun 15, 2011, at 1:17 PM, William Oberman wrote:

> I think I'm stuck on typing issues trying to store data in cassandra.
> To verify, cassandra wants (key, {tuples})
>
> My pig script is fairly brief:
> raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});
> --columns == timeUUID -> JSON
> rows = FOREACH raw GENERATE key, FLATTEN(columns);
> alias_target_day = FOREACH rows {
>     --I wrote a specialized parser that does exactly what I need
>     observation_map = com.civicscience.pig.ParseObservation($2);
>     GENERATE $0 as alias, observation_map#'_fqt' as target, observation_map#'_day' as day;
> };
> grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
> X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1, COUNT($1)) as day_count;
>
> This gets me:
> (targetA, (day1, count))
> (targetA, (day2, count))
> (targetB, (day1, count))
> ....
>
> But, cassandra wants the 2nd item to be a bag. So, I tried:
> X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1, COUNT($1))) as day_count;
>
> But this results in:
> (targetA, {((day1, count))})
> (targetA, {((day2, count))})
> (targetB, {((day1, count))})
> It's hard to see, but the 2nd item now has a nested tuple as the first value, which is still bad.
>
> How do I get (key, {tuple})??? I wasn't sure where to post this (pig or cassandra), so I'm posting to the pig list too.
>
> will
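
To make the shape mismatch in the quoted output easier to see, here is a minimal sketch in plain Python (not Pig) that models a Pig tuple as a Python tuple and a Pig bag as a list of tuples. The helper names is_cassandra_row and unwrap_nested are hypothetical, the count value 3 is made up for illustration, and the unwrapping step only shows the kind of reshaping a udf might do; it is not how pygmalion's ToCassandraBag is implemented.

```python
# Model only: a Pig tuple is a Python tuple, a Pig bag is a list of tuples.
# These helpers are hypothetical illustrations, not part of Pig or pygmalion.

def is_cassandra_row(row):
    """Return True if row has the shape (key, {(name, value), ...})."""
    if not (isinstance(row, tuple) and len(row) == 2):
        return False
    key, bag = row
    if not isinstance(bag, list):
        return False
    # Every element of the bag must be a flat two-field tuple.
    return all(
        isinstance(col, tuple)
        and len(col) == 2
        and not isinstance(col[0], tuple)
        for col in bag
    )


def unwrap_nested(row):
    """Turn (key, {((name, value))}) into (key, {(name, value)})."""
    key, bag = row
    flat = []
    for col in bag:
        # A one-field tuple whose only field is itself a tuple is the
        # extra wrapper layer seen in the second attempt.
        if len(col) == 1 and isinstance(col[0], tuple):
            col = col[0]
        flat.append(col)
    return (key, flat)


# The three shapes from the thread, with an assumed count of 3.
tuple_only = ("targetA", ("day1", 3))      # first attempt: 2nd item is a tuple
nested = ("targetA", [(("day1", 3),)])     # second attempt: tuple inside a tuple
wanted = ("targetA", [("day1", 3)])        # target shape: (key, {tuple})
```

In this model only `wanted` passes `is_cassandra_row`, and `unwrap_nested(nested)` produces the same value as `wanted`, which is the one-level unwrapping the second attempt is missing.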