hadoop-pig-dev mailing list archives

From "Utkarsh Srivastava" <utka...@yahoo-inc.com>
Subject RE: Confused about COGroup semantic
Date Fri, 16 May 2008 04:30:53 GMT
> 
> The first output column will have to be wrapped in Tuple whereas if we
> group by only one column we don't have to wrap. Is that the right logic?
> 

Yes, that is how it has worked so far. 

However, I am not a big fan of this logic, since it is confusing at
times and leads to several special cases in the code. I think it would be
cleaner to always wrap in a tuple. But that has two disadvantages:

i) Will break backward compatibility
ii) Will lead to non-flat tuples which users won't be able to store
using default storage functions.

Utkarsh
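
The wrapping rule discussed in this message can be sketched outside of Pig.
The Python below is only a toy model, not Pig's implementation: the names
cogroup, group_key and always_wrap are hypothetical, relations are plain
lists of tuples, and bags are plain lists. It assumes the behaviour described
in the thread: a single grouping column yields a bare atom as the group key,
several columns yield a tuple, and mismatched column counts are rejected.

```python
def group_key(row, cols, always_wrap=False):
    """Build the group key for one row from the given column indexes."""
    values = tuple(row[c] for c in cols)
    # Current behaviour per the thread: one column -> bare atom, no wrapping.
    if len(values) == 1 and not always_wrap:
        return values[0]
    # Several columns (or always_wrap=True) -> key is a tuple.
    return values


def cogroup(a, a_cols, b, b_cols, always_wrap=False):
    """Toy COGROUP of two relations; returns (key, bag_a, bag_b) rows."""
    # Keys of different arity can never match, so reject them up front.
    if len(a_cols) != len(b_cols):
        raise ValueError("both relations must group by the same number of columns")
    groups = {}
    for idx, (relation, cols) in enumerate(((a, a_cols), (b, b_cols))):
        for row in relation:
            key = group_key(row, cols, always_wrap)
            groups.setdefault(key, ([], []))[idx].append(row)
    return [(key, bags[0], bags[1]) for key, bags in sorted(groups.items())]


A = [(1, "x"), (2, "y")]
B = [(1, "p"), (1, "q")]

# Like: X = COGroup A By $0, B By $0 ;  -> first column is an atom
atom_keys = [row[0] for row in cogroup(A, [0], B, [0])]

# The "always wrap" alternative -> first column is a one-field tuple
wrapped_keys = [row[0] for row in cogroup(A, [0], B, [0], always_wrap=True)]
```

With always_wrap=True every output row starts with a nested tuple, which is
the non-flat shape mentioned in disadvantage ii) above.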



> Pi
> 
> On 5/16/08, Alan Gates <gates@yahoo-inc.com> wrote:
> >
> > There really isn't any meaning to cogrouping with one field on one
> > relation and two fields on another.  Given our definition of tuple,
> > there will never be any tuples that match.  I believe Santhosh has
> > changed this to be a syntax error.
> >
> > Alan.
> >
> > pi song wrote:
> >
> >> Normally we do COGroup like this:-
> >>
> >> X = COGroup A By $0, B By $0 ;
> >>
> >> This first column of the output will be data atom.
> >>
> >> But if we do:-
> >>
> >> X = COGroup A By $0, B By $0, $1 ;
> >>
> >> What is the first column then? I assume the B grouping will be
> >> wrapped to tuple and treated as atom. Am I right?
> >>
> >> Pi
> >>
> >>
> >>
> >
