hadoop-pig-dev mailing list archives

From hc busy <hc.b...@gmail.com>
Subject Re: What should FLATTEN do?
Date Fri, 02 Apr 2010 19:37:49 GMT
Yeah, I'm sure it has nested tuples. Pig doesn't natively support
introducing tuples in an expression, so a statement like

h = foreach g generate ((x,y,z)), (x), ((((x))))

doesn't work, but I have a UDF that does that (don't ask why), and
I've seen it print a double pair of parens in the DUMP output.

Our Hadoop guys here say it's CDH2 and that the "upgrade" was just a
re-installation of CDH2 ("same jars"). But my script certainly started
doing weird things: it suddenly flattened that bag all the way through.

I'd support the prior behavior as well, because that seems to match my
reading of the documentation on the behavior of FLATTEN.
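
To make the two readings concrete, here is a small Python model of them (not Pig itself, just a sketch). The function names flatten_one_level and flatten_all_levels are made up for illustration, and a Python list of tuples stands in for a Pig bag. The first function removes only one level of nesting per bag element, which is how I read the documentation; the second keeps unnesting until no tuples are left, which is what I'm seeing now.

```python
def flatten_one_level(prefix, bag):
    """Model of the one-level reading: each tuple in the bag contributes
    its own fields, and the other fields are repeated for every tuple."""
    return [prefix + tuple(t) for t in bag]


def flatten_all_levels(prefix, bag):
    """Model of the observed behavior: nested tuples are unnested
    recursively, so only scalar fields remain."""
    def leaves(value):
        if isinstance(value, tuple):
            for item in value:
                yield from leaves(item)
        else:
            yield value

    return [prefix + tuple(leaves(t)) for t in bag]


# Hypothetical stand-in for the row 'id','data', {((1,2)), ((2,3)), ((4,5))}
prefix = ("id", "data")
bag = [((1, 2),), ((2, 3),), ((4, 5),)]

for row in flatten_one_level(prefix, bag):
    print(row)   # rows keep the inner tuple, e.g. ('id', 'data', (1, 2))

for row in flatten_all_levels(prefix, bag):
    print(row)   # rows lose the inner tuple, e.g. ('id', 'data', 1, 2)
```

Either way the 'id' and 'data' fields are repeated once per bag element; the only difference is whether the inner (x,y) tuple survives.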



Has anybody else had this problem with recent Cloudera/Pig versions?


thnx!!


On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.rahaman@gmail.com> wrote:

> Stupid question but are you sure your bag has the dual sets of parentheses?
> (And if I may ask, why is that the case?)
>
> On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.rahaman@gmail.com>
> wrote:
>
> > If I'm not mistaken, the output is the expected behavior. Flatten should
> > unnest bags. I'm assuming your statement is something like FOREACH ...
> > GENERATE field1, field2, FLATTEN(bag1) which would 'duplicate' the first
> > two fields of a tuple for every tuple in the nested bag.
> >
> >
> >
> >
> > On Fri, Apr 2, 2010 at 2:02 PM, hc busy <hc.busy@gmail.com> wrote:
> >
> >> doh!!!! s/map/bag/g
> >>
> >> I seem to get maps and bags mixed up for some reason...
> >>
> >> Guys, I have a row containing a *bag*
> >>
> >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
> >>
> >> What is the expected behavior when I flatten on that bag? I had expected
> >> it to result in
> >>
> >> 'id','data', (1,2)
> >> 'id','data', (2,3)
> >> 'id','data', (4,5)
> >>
> >>
> >> But it appears to me that the result of applying FLATTEN to that bag is
> >> this instead:
> >>
> >> 'id','data', 1,2
> >> 'id','data', 2,3
> >> 'id','data', 4,5
> >>
> >>
> >> The latter is returned by Cloudera's current CDH2, and I've seen the
> >> former behavior on other versions of Pig.
> >>
> >> Which is the correct behavior by design?
> >>
> >> What will pig 0.6 do when it is released?
> >>
> >> thanks!
> >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy <hc.busy@gmail.com> wrote:
> >>
> >> > Guys, I have a row containing a map
> >> >
> >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
> >> >
> >> > What is the expected behavior when I flatten on that bag? I had
> >> > expected it to result in
> >> >
> >> > 'id','data', (1,2)
> >> > 'id','data', (2,3)
> >> > 'id','data', (4,5)
> >> >
> >> >
> >> > But it appears to me that the result of applying FLATTEN to that bag
> >> > is this instead:
> >> >
> >> > 'id','data', 1,2
> >> > 'id','data', 2,3
> >> > 'id','data', 4,5
> >> >
> >> >
> >> > The latter is returned by the current cloudera's CDH2 and I've seen
> >> > the prior behavior on other versions of pig.
> >> >
> >> > Which is the correct behavior by design?
> >> >
> >> > What will pig 0.6 do when it is released?
> >> >
> >> > thanks!
> >> >
> >>
> >
> >
> >
> > --
> > Zaki Rahaman
> >
> >
>
>
> --
> Zaki Rahaman
>
