Date: Fri, 2 Apr 2010 12:37:49 -0700
Subject: Re: What should FLATTEN do?
From: hc busy
To: pig-user@hadoop.apache.org, pig-dev@hadoop.apache.org

Yeah, I'm sure it has nested tuples. Pig doesn't natively support introducing tuples:

h = foreach g generate ((x,y,z)), (x), ((((x))))

doesn't work, but I have a UDF that does that (don't ask why), and I've seen it print a double pair of parens when I ran DUMP on it.

Our Hadoop guys here say it's CDH2 and that the "upgrade" was just a re-installation of CDH2 ("same jars"). But my script certainly started doing weird things once it flattened that all the way through.

I'd support the prior behavior as well, because that seems to match my reading of the documentation on the behavior of FLATTEN.

Has anybody else had this problem with recent Cloudera/Pig versions?

Thanks!

On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman wrote:

> Stupid question but are you sure your bag has the dual sets of parentheses?
> (And if I may ask, why is that the case?)
>
> On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman wrote:
>
> > If I'm not mistaken, the output is the expected behavior. Flatten should
> > unnest bags. I'm assuming your statement is something like FOREACH ...
> > GENERATE field1, field2, FLATTEN(bag1) which would 'duplicate' the first two
> > fields of a tuple for every tuple in the nested bag.
> >
> > On Fri, Apr 2, 2010 at 2:02 PM, hc busy wrote:
> >
> >> doh!!!! s/map/bag/g
> >>
> >> I seem to get maps and bags mixed up for some reason...
> >>
> >> Guys, I have a row containing a *bag*
> >>
> >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
> >>
> >> What is the expected behavior when I flatten on that bag? I had expected it
> >> to result in
> >>
> >> 'id','data', (1,2)
> >> 'id','data', (2,3)
> >> 'id','data', (4,5)
> >>
> >> But it appears to me that the result of applying FLATTEN to that bag is this
> >> instead:
> >>
> >> 'id','data', 1,2
> >> 'id','data', 2,3
> >> 'id','data', 4,5
> >>
> >> The latter is returned by the current cloudera's CDH2 and I've seen the
> >> prior behavior on other versions of pig.
> >>
> >> Which is the correct behavior by design?
> >>
> >> What will pig 0.6 do when it is released?
> >>
> >> thanks!
> >>
> >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy wrote:
> >>
> >> > Guys, I have a row containing a map
> >> >
> >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
> >> >
> >> > What is the expected behavior when I flatten on that bag? I had expected it
> >> > to result in
> >> >
> >> > 'id','data', (1,2)
> >> > 'id','data', (2,3)
> >> > 'id','data', (4,5)
> >> >
> >> > But it appears to me that the result of applying FLATTEN to that bag is
> >> > this instead:
> >> >
> >> > 'id','data', 1,2
> >> > 'id','data', 2,3
> >> > 'id','data', 4,5
> >> >
> >> > The latter is returned by the current cloudera's CDH2 and I've seen the
> >> > prior behavior on other versions of pig.
> >> >
> >> > Which is the correct behavior by design?
> >> >
> >> > What will pig 0.6 do when it is released?
> >> >
> >> > thanks!
> >
> > --
> > Zaki Rahaman
>
> --
> Zaki Rahaman
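
P.S. To make the two outputs in this thread concrete, here is a small Python sketch of them. It is only a toy model, not Pig's implementation: a bag is modeled as a list of tuples, a row as a tuple of fields, and the function names (flatten_bag_one_level, flatten_bag_fully, unnest) are made up for illustration. It takes no position on which behavior is correct by design.

```python
# Toy model only: a bag is a list of tuples, a row is a tuple of fields.
# This does not reflect how Pig implements FLATTEN internally.

def flatten_bag_one_level(prefix, bag):
    """Emit one row per tuple in the bag, splicing in that tuple's fields as-is."""
    return [prefix + tuple(t) for t in bag]

def unnest(value):
    """Recursively expand nested tuples into a flat tuple of atoms."""
    if isinstance(value, tuple):
        out = ()
        for field in value:
            out += unnest(field)
        return out
    return (value,)

def flatten_bag_fully(prefix, bag):
    """Emit one row per tuple in the bag, unnesting every level of tuple."""
    return [prefix + unnest(t) for t in bag]

prefix = ("id", "data")
# The bag {((1,2)), ((2,3)), ((4,5))}: each element is a tuple whose
# single field is itself a tuple.
bag = [((1, 2),), ((2, 3),), ((4, 5),)]

one_level = flatten_bag_one_level(prefix, bag)  # the behavior I had expected
fully = flatten_bag_fully(prefix, bag)          # what I'm seeing on CDH2

print(one_level)
print(fully)
```

In this model, one_level keeps the inner tuple as a single field of each row, while fully splices its elements in as separate fields.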