pig-user mailing list archives

From Santhosh Srinivasan <...@yahoo-inc.com>
Subject RE: Easy question...difference between this::form and this.form?
Date Tue, 07 Dec 2010 18:11:15 GMT
> The sql way to deal with this issue is essentially to keep the name of the parent relation
> around during parsing, and require that you explicitly provide the desired parent if column
> names are ambiguous. That's probably something that could be implemented now that we have
> the required metadata in the operators (I believe it wasn't there when the disambiguation
> design was implemented).

Isn't that true today? Unambiguous columns can be referenced without the :: operator.

Santhosh
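The lookup rule Santhosh describes (an unqualified name is fine as long as it is unambiguous) can be illustrated with a small Python toy model. This is a hypothetical sketch, not Pig's parser: the function `resolve` is made up, and a schema is modelled as a plain list of field-name strings, using the schema from Alan's example further down the thread.

```python
# Hypothetical toy model of the lookup rule discussed in this thread; it is not
# Pig's parser. A schema is just a list of field-name strings.

def resolve(schema, name):
    """Return the position of the field that `name` refers to."""
    if name in schema:
        # Exact match, e.g. a fully qualified "B::z".
        return schema.index(name)
    # Unqualified name: compare against the part after the last "::".
    matches = [i for i, field in enumerate(schema)
               if field.rsplit("::", 1)[-1] == name]
    if not matches:
        raise ValueError(f"no field named {name!r}")
    if len(matches) > 1:
        options = ", ".join(schema[i] for i in matches)
        raise ValueError(f"{name!r} is ambiguous; qualify it as one of: {options}")
    return matches[0]


# Schema of C = join A by x, B by x; with A(x, y, z) and B(w, x, y, z).
C = ["A::x", "A::y", "A::z", "B::w", "B::x", "B::y", "B::z"]

print(resolve(C, "w"))     # 3: only B has w, so no prefix is needed
print(resolve(C, "B::z"))  # 6: the prefix names the parent relation
try:
    resolve(C, "z")
except ValueError as err:
    print(err)             # 'z' is ambiguous; qualify it as one of: A::z, B::z
```

In this model the "::" prefix plays the role of the explicit parent relation that the SQL approach asks for, and it is only required when the bare name matches more than one field.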

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com] 
Sent: Tuesday, December 07, 2010 9:49 AM
To: user@pig.apache.org
Subject: Re: Easy question...difference between this::form and this.form?

Consider self-joins, with regard to the meaningful name problem...

The sql way to deal with this issue is essentially to keep the name of the parent relation
around during parsing, and require that you explicitly provide the desired parent if column
names are ambiguous. That's probably something that could be implemented now that we have
the required metadata in the operators (I believe it wasn't there when the disambiguation
design was implemented).

As for the difference between "::" and ".": the double colon is just a string with no special
meaning; it's simply part of the field name. The period is essentially a projection operator:
you are saying "the thing to the left of the period is a tuple, and the thing to the right
is a field in that tuple". It works for bags as well, in which case it means "the thing to the
left of the period is a bag of tuples, and the thing to the right is a field in every tuple
in the bag".

-Dmitriy.
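The distinction Dmitriy draws can be made concrete with a Python toy model of the script from the original question. This is a hypothetical sketch, not Pig's data model: tuples are modelled as dicts keyed by field name, bags as lists of such dicts, `project` is a made-up stand-in for the "." operator, and the sample months and counts are invented.

```python
# Hypothetical toy model, not Pig's implementation: a tuple is a dict keyed by
# field name and a bag is a list of such dicts.

def project(value, field):
    """Model of the "." projection operator on a tuple or a bag."""
    if isinstance(value, dict):
        # Tuple: return the named field.
        return value[field]
    if isinstance(value, list):
        # Bag: project the field out of every tuple, giving a bag of 1-field tuples.
        return [{field: t[field]} for t in value]
    raise TypeError("left side of '.' must be a tuple or a bag")


# After GROUP data ALL, the record holds a bag named "data" (sample values are made up).
grouped = {"group": "all",
           "data": [{"month": "2010-01", "count": 7},
                    {"month": "2010-02", "count": 10}]}
print(project(project(grouped, "data"), "count"))  # [{'count': 7}, {'count': 10}]

# After FLATTEN(data), the bag's fields become top-level fields whose names carry
# the "data::" prefix; "::" is just part of the name, so lookup is plain access.
flat = {"data::month": "2010-01", "data::count": 7, "max_count": 10}
print(flat["data::count"])  # 7
```

So `data.count` walks into a structure (here, a bag), while `data::count` is simply the literal name of a flat field.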

2010/12/7 Anze <anzenews@volja.net>

>
> If one uses meaningful names then Pig would never use '::' anyway. The 
> problem is when you use multiple joins in sequence, then '::' names 
> get very annoying.
> But that's just my opinion. :)
>
> Anze
>
>
> On Tuesday 07 December 2010, Jonathan Coveney wrote:
> > Would that even be much better? It seems like it'd be better to have it be
> > consistent in appending the whatever::, so that at least you have to be
> > cognizant of it when you do the join. If it starts being too clever, then
> > it's up to you to figure out when it does and doesn't do it, which might be
> > annoying.
> >
> > 2010/12/7 Anze <anzenews@volja.net>
> >
> > > I understand the reason for this; it just seems like a drastic solution.
> > > :)
> > >
> > > Ideally, Pig should be clever enough to detect ambiguity and deal 
> > > with it, and leave the non-conflicting names intact. For instance:
> > >
> > > A = load 'foo' as (x, y, z);
> > > B = load 'bar' as (x, a, b, c);
> > > C = join A by x, B by x;
> > > DESCRIBE C;
> > > C: {A::x, y, z, B::x, a, b, c}
> > >
> > > or even:
> > > C: {x, y, z, B::x, a, b, c}
> > >
> > > or even a step further, in case of JOIN:
> > > C: {x, y, z, a, b, c}
> > > (since join *joins* by x, why would there be two? This doesn't 
> > > always work for other operations, of course)
> > >
> > > Reasoning: at least in my cases the names are descriptive from the start,
> > > so there are almost no name conflicts. In the rare cases where there are,
> > > Pig can detect that and use the old syntax with "::", and then let me deal
> > > with it.
> > >
> > > I know this is a backwards-incompatible change and is not likely to
> > > be accepted, but still... :)
> > >
> > > Anze
> > >
> > > On Monday 06 December 2010, Alan Gates wrote:
> > > > The reason it's needed is that ambiguities would result otherwise.
> > > >
> > > > A = load 'foo' as (x, y, z);
> > > > B = load 'bar' as (w, x, y, z);
> > > > C = join A by x, B by x;
> > > > D = filter C by z > 0;  -- which z?
> > > >
> > > > As long as the name is not ambiguous, the :: is not required.  
> > > > So in the above example it would be perfectly legal to say
> > > >
> > > > D = filter C by w > 0;
> > > >
> > > > Out of curiosity, why do you want to remove the :: names?
> > > >
> > > > Alan.
> > > >
> > > > On Dec 6, 2010, at 1:05 PM, Jonathan Coveney wrote:
> > > > > Hijack away. I would be curious as to the reason we need this 
> > > > > as well.
> > > > >
> > > > > 2010/12/6 Anze <anzenews@volja.net>
> > > > >
> > > > >> Sorry to hijack your question, Jonathan, but while we are at it...
> > > > >> :)
> > > > >>
> > > > >> Is there a way to tell Pig NOT to add "base_alias::"? Almost half my
> > > > >> code consists of FOREACH ... GENERATE statements that just remove
> > > > >> these prefixes.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Anze
> > > > >>
> > > > >> On Monday 06 December 2010, Daniel Dai wrote:
> > > > >>> After join, cross, and foreach ... flatten, Pig automatically adds
> > > > >>> the "base_alias::" prefix. All other cases use ".".
> > > > >>>
> > > > >>> Daniel
> > > > >>>
> > > > >>> Jonathan Coveney wrote:
> > > > >>>> It's very hard to search for this among the docs because it's so
> > > > >>>> generic, so I thought I'd ask... I'm sure the answer is painfully easy.
> > > > >>>>
> > > > >>>> Taking a look at this code that I found online, for example
> > > > >>>>
> > > > >>>> --
> > > > >>>> -- Read in a bag of tuples (timeseries for this example) and divide the
> > > > >>>> -- numeric column by its maximum.
> > > > >>>> --
> > > > >>>> %default DATABAG 'data/timeseries.tsv'
> > > > >>>>
> > > > >>>> data       = LOAD '$DATABAG' AS (month:chararray, count:int);
> > > > >>>> accumulate = GROUP data ALL;
> > > > >>>> calc_max   = FOREACH accumulate GENERATE FLATTEN(data), MAX(data.count) AS max_count;
> > > > >>>> normalize  = FOREACH calc_max GENERATE data::month AS month, data::count AS count,
> > > > >>>>              (float)data::count / (float)max_count AS normed_count;
> > > > >>>> DUMP normalize;
> > > > >>>>
> > > > >>>> What purpose does data::month serve versus data.count?
> > > > >>>>
> > > > >>>> Thanks
>
>
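Daniel's rule (every field gets a "base_alias::" prefix after a join) and Anze's first alternative (prefix only the conflicting names) can be compared with a small Python sketch. The function `join_schema` and its `prefix_all` flag are hypothetical; the sketch models only how the output fields are named, not the join itself, and uses the schemas from Anze's example.

```python
from collections import Counter


def join_schema(*inputs, prefix_all=True):
    """Build the output schema of a join from (alias, field_names) pairs.

    prefix_all=True mimics what the thread describes Pig doing today: every
    field is renamed to "alias::field". prefix_all=False mimics Anze's proposal,
    which is NOT Pig behaviour: only names shared by several inputs get a prefix.
    """
    # How many inputs contribute each bare field name.
    counts = Counter(f for _, fields in inputs for f in fields)
    schema = []
    for alias, fields in inputs:
        for f in fields:
            if prefix_all or counts[f] > 1:
                schema.append(f"{alias}::{f}")
            else:
                schema.append(f)
    return schema


A = ("A", ["x", "y", "z"])
B = ("B", ["x", "a", "b", "c"])
print(join_schema(A, B))
# ['A::x', 'A::y', 'A::z', 'B::x', 'B::a', 'B::b', 'B::c']
print(join_schema(A, B, prefix_all=False))
# ['A::x', 'y', 'z', 'B::x', 'a', 'b', 'c']
```

In both modes the shared name `x` keeps its prefixes, so a bare `x` would still need disambiguation either way; the proposal only changes how the non-conflicting fields are named.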
