hadoop-pig-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Re: Confused about COGroup semantic
Date Fri, 16 May 2008 12:02:04 GMT
I don't like it either, but it seems we just have to live with this
"two semantics, one operator" situation.

Does anyone have a better solution?

Goals:
- Consistent semantics
- Ease of use
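
To make the two semantics concrete, here is a rough Python sketch of the behaviour discussed in this thread. It is not Pig code and not how Pig implements COGroup; the function name `cogroup` and the `always_wrap` flag are hypothetical, chosen only for illustration. With `always_wrap=False` the group key is left as a bare atom when grouping on one column and wrapped in a tuple otherwise (the current behaviour); with `always_wrap=True` the key is always a tuple (the alternative Utkarsh describes). Grouping relations on different numbers of columns is rejected, along the lines of the syntax error Alan mentions.

```python
from collections import defaultdict


def cogroup(relations, key_cols, always_wrap=False):
    """Toy model of COGroup semantics (illustrative only, not Pig's code).

    relations: list of relations, each a list of tuples.
    key_cols:  list of column-index lists, one per relation.
    always_wrap: hypothetical flag; if True the key is always a tuple.
    """
    # Every relation must be grouped on the same number of columns,
    # otherwise no keys could ever match.
    arities = {len(cols) for cols in key_cols}
    if len(arities) != 1:
        raise ValueError("all relations must group on the same number of columns")
    arity = arities.pop()

    # Map each key to one bag (list) of rows per input relation.
    groups = defaultdict(lambda: [[] for _ in relations])
    for i, (rel, cols) in enumerate(zip(relations, key_cols)):
        for row in rel:
            key = tuple(row[c] for c in cols)
            groups[key][i].append(row)

    out = []
    for key in sorted(groups):
        # Current semantics: unwrap a single-column key into a bare atom.
        first = key if (always_wrap or arity > 1) else key[0]
        out.append((first, *groups[key]))
    return out


A = [(1, "a"), (2, "b")]
B = [(1, "x"), (1, "y")]

one_col = cogroup([A, B], [[0], [0]])
wrapped = cogroup([A, B], [[0], [0]], always_wrap=True)
two_col = cogroup([A, B], [[0, 1], [0, 1]])
```

In this sketch `one_col` has atoms in its first column, while `wrapped` and `two_col` have tuples there, which is exactly the inconsistency in question. Always wrapping gives one rule, at the cost of nested (non-flat) output for the single-column case.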


On Fri, May 16, 2008 at 2:30 PM, Utkarsh Srivastava <utkarsh@yahoo-inc.com>
wrote:

> >
> > The first output column will have to be wrapped in a tuple, whereas if we
> > group by only one column we don't have to wrap. Is that the right logic?
> >
>
> Yes, that is how it has worked so far.
>
> However, I am not a big fan of this logic since it is confusing at
> times and leads to several special cases in the code. I think it will be
> cleaner to always wrap in a tuple. But that has 2 disadvantages:
>
> i) Will break backward compatibility
> ii) Will lead to non-flat tuples which users won't be able to store
> using default storage functions.
>
> Utkarsh
>
>
>
> > Pi
> >
> > On 5/16/08, Alan Gates <gates@yahoo-inc.com> wrote:
> > >
> > > > There really isn't any meaning to cogrouping with one field on one
> > > > relation and two fields on another.  Given our definition of tuple,
> > > > there will never be any tuples that match.  I believe Santhosh has
> > > > changed this to be a syntax error.
> > >
> > > Alan.
> > >
> > > pi song wrote:
> > >
> > >> Normally we do COGroup like this:-
> > >>
> > >> X = COGroup A By $0, B By $0 ;
> > >>
> > > >> The first column of the output will be a data atom.
> > >>
> > >> But if we do:-
> > >>
> > >> X = COGroup A By $0, B By $0, $1 ;
> > >>
> > > >> What is the first column then? I assume the B grouping will be
> > > >> wrapped in a tuple and treated as an atom. Am I right?
> > >>
> > >> Pi
> > >>
> > >>
> > >>
> > >
>
