pig-dev mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: What should FLATTEN do?
Date Fri, 02 Apr 2010 19:52:14 GMT
Not sure if this is exactly the same, but when I've created tuples within
tuples in UDFs (to preserve order of pairs), from bag input, Pig has allowed
it - but I can't work with that data in subsequent steps.

On Fri, Apr 2, 2010 at 12:37 PM, hc busy <hc.busy@gmail.com> wrote:

> Yeah, I'm sure it has nested tuples. Pig doesn't natively support
> introduction of tuples
>
> h = foreach g generate ((x,y,z)), (x), ((((x))))
>
> doesn't work, but I have a UDF that does that (don't ask why), and
> I've seen it print a double pair of parens when I ran a DUMP.
>
> Our Hadoop guys here say it's CDH2 and that the "upgrade" was just a
> re-installation of CDH2 ("same jars"). But my script certainly started
> doing weird things once it flattened that all the way through.
>
> I'd support the prior behavior as well, because that seems to match my
> reading of documentation on behavior of FLATTEN.
>
>
>
> Has anybody else had this problem with recent cloudera/pig versions?
>
>
> thnx!!
>
>
> On Fri, Apr 2, 2010 at 11:43 AM, zaki rahaman <zaki.rahaman@gmail.com> wrote:
>
> > Stupid question but are you sure your bag has the dual sets of parentheses?
> > (And if I may ask, why is that the case?)
> >
> > On Fri, Apr 2, 2010 at 2:11 PM, zaki rahaman <zaki.rahaman@gmail.com>
> > wrote:
> >
> > > If I'm not mistaken, the output is the expected behavior. Flatten should
> > > unnest bags. I'm assuming your statement is something like FOREACH ...
> > > GENERATE field1, field2, FLATTEN(bag1) which would 'duplicate' the first
> > > two fields of a tuple for every tuple in the nested bag.
> > >
> > >
> > >
> > >
> > > On Fri, Apr 2, 2010 at 2:02 PM, hc busy <hc.busy@gmail.com> wrote:
> > >
> > >> doh!!!! s/map/bag/g
> > >>
> > >> I seem to get maps and bags mixed up for some reason...
> > >>
> > >> Guys, I have a row containing a *bag*
> > >>
> > >> 'id','data', {((1,2)), ((2,3)), ((4,5))}
> > >>
> > >> What is the expected behavior when I flatten on that bag? I had expected
> > >> it to result in
> > >>
> > >> 'id','data', (1,2)
> > >> 'id','data', (2,3)
> > >> 'id','data', (4,5)
> > >>
> > >>
> > >> But it appears to me that the result of applying FLATTEN to that bag is
> > >> this instead:
> > >>
> > >> 'id','data', 1,2
> > >> 'id','data', 2,3
> > >> 'id','data', 4,5
> > >>
> > >>
> > >> The latter is returned by the current cloudera's CDH2 and I've seen the
> > >> prior behavior on other versions of pig.
> > >>
> > >> Which is the correct behavior by design?
> > >>
> > >> What will pig 0.6 do when it is released?
> > >>
> > >> thanks!
> > >> On Fri, Apr 2, 2010 at 11:29 AM, hc busy <hc.busy@gmail.com> wrote:
> > >>
> > >> > Guys, I have a row containing a map
> > >> >
> > >> > 'id','data', {((1,2)), ((2,3)), ((4,5))}
> > >> >
> > >> > What is the expected behavior when I flatten on that bag? I had expected
> > >> > it to result in
> > >> >
> > >> > 'id','data', (1,2)
> > >> > 'id','data', (2,3)
> > >> > 'id','data', (4,5)
> > >> >
> > >> >
> > >> > But it appears to me that the result of applying FLATTEN to that bag is
> > >> > this instead:
> > >> >
> > >> > 'id','data', 1,2
> > >> > 'id','data', 2,3
> > >> > 'id','data', 4,5
> > >> >
> > >> >
> > >> > The latter is returned by the current cloudera's CDH2 and I've seen the
> > >> > prior behavior on other versions of pig.
> > >> >
> > >> > Which is the correct behavior by design?
> > >> >
> > >> > What will pig 0.6 do when it is released?
> > >> >
> > >> > thanks!
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Zaki Rahaman
> > >
> > >
> >
> >
> > --
> > Zaki Rahaman
> >
>
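
For reference, the two behaviors compared in this thread can be modeled outside Pig. What follows is a minimal Python sketch, not Pig code and not a description of Pig's implementation. The function names flatten_one_level and flatten_two_levels are hypothetical, and rows, tuples and bags are represented as plain Python tuples and lists. It only illustrates the difference between unnesting the bag by one level (the result hc busy expected) and also splicing the inner tuple into the row (the result reported on CDH2).

```python
def flatten_one_level(prefix, bag):
    """Emit one row per tuple in the bag, splicing that tuple's fields
    into the row. Nested tuples inside it are left intact."""
    return [tuple(prefix) + tuple(t) for t in bag]


def flatten_two_levels(prefix, bag):
    """Same as above, but any field that is itself a tuple is also
    spliced into the row (one extra level of unnesting)."""
    rows = []
    for t in bag:
        fields = []
        for field in t:
            if isinstance(field, tuple):
                fields.extend(field)   # unwrap the inner tuple
            else:
                fields.append(field)
        rows.append(tuple(prefix) + tuple(fields))
    return rows


# The row from the thread: 'id','data', {((1,2)), ((2,3)), ((4,5))}
# Each bag element is a one-field tuple whose single field is a pair.
bag = [((1, 2),), ((2, 3),), ((4, 5),)]

one_level = flatten_one_level(("id", "data"), bag)
two_levels = flatten_two_levels(("id", "data"), bag)

for row in one_level:
    print(row)
for row in two_levels:
    print(row)
```

In this model, one_level keeps each pair as a single tuple-valued field, while two_levels produces four scalar fields per row. In both cases the leading 'id','data' fields are repeated once per tuple in the bag, which matches the 'duplicate' behavior zaki describes.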
