pig-user mailing list archives

From Anze <anzen...@volja.net>
Subject Re: Easy question...difference between this::form and this.form?
Date Tue, 07 Dec 2010 17:16:45 GMT

If one uses meaningful names then Pig would never use '::' anyway. The problem 
is when you use multiple joins in sequence, then '::' names get very annoying. 
But that's just my opinion. :)
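
To illustrate the sequential-join case, here is a minimal Pig Latin sketch. The relation names, file paths, and fields (users, orders, products) are hypothetical and not taken from this thread; it only shows the renaming pattern under discussion.

```pig
-- Hypothetical inputs: only the join keys collide, all other names are unique.
users    = LOAD 'users'    AS (user_id:int, user_name:chararray);
orders   = LOAD 'orders'   AS (order_id:int, user_id:int, product_id:int);
products = LOAD 'products' AS (product_id:int, product_name:chararray);

-- First join: every field in uo is now known as users::... or orders::...
uo = JOIN users BY user_id, orders BY user_id;

-- Strip the prefixes before joining again. Without this step the second
-- join would stack another prefix on top (e.g. uo::orders::product_id).
uo_flat = FOREACH uo GENERATE
    users::user_id     AS user_id,
    users::user_name   AS user_name,
    orders::order_id   AS order_id,
    orders::product_id AS product_id;

-- Second join on the cleaned-up relation.
uop = JOIN uo_flat BY product_id, products BY product_id;
DESCRIBE uop;
```

The FOREACH ... GENERATE in the middle is the kind of boilerplate I mean: it does no real work, it only renames fields.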

Anze


On Tuesday 07 December 2010, Jonathan Coveney wrote:
> Would that even be much better? It seems like it'd be better to have it be
> consistent in appending the whatever::, so that at least you have to be
> cognizant of it when you do the join. If it starts being too clever, then
> it's up to you to figure out when it does and doesn't do it which might be
> annoying.
> 
> 2010/12/7 Anze <anzenews@volja.net>
> 
> > I understand the reason for this, it just seems like a drastic solution.
> > :)
> > 
> > > Ideally, Pig should be clever enough to detect ambiguity and deal with
> > > it, and leave the non-conflicting names intact. For instance:
> > 
> > A = load 'foo' as (x, y, z);
> > B = load 'bar' as (x, a, b, c);
> > C = join A by x, B by x;
> > DESCRIBE C;
> > C: {A::x, y, z, B::x, a, b, c}
> > 
> > or even:
> > C: {x, y, z, B::x, a, b, c}
> > 
> > or even a step further, in case of JOIN:
> > C: {x, y, z, a, b, c}
> > > (since join *joins* by x, why would there be two? This doesn't always
> > > work for other operations, of course)
> > 
> > > Reasoning: at least in my cases the names are descriptive from the start,
> > > therefore there are almost no name conflicts. In the rare cases where
> > > there are, Pig can determine that and use the old syntax with "::", then
> > > let me deal with it.
> > 
> > > I know this is a backwards-incompatible change and is not likely to be
> > > accepted, but still... :)
> > 
> > Anze
> > 
> > On Monday 06 December 2010, Alan Gates wrote:
> > > The reason it's needed is that ambiguities would result otherwise.
> > > 
> > > A = load 'foo' as (x, y, z);
> > > B = load 'bar' as (w, x, y, z);
> > > C = join A by x, B by x;
> > > D = filter C by z > 0;  -- which z?
> > > 
> > > As long as the name is not ambiguous, the :: is not required.  So in
> > > the above example it would be perfectly legal to say
> > > 
> > > D = filter C by w > 0;
> > > 
> > > Out of curiosity, why do you want to remove the :: names?
> > > 
> > > Alan.
> > > 
> > > On Dec 6, 2010, at 1:05 PM, Jonathan Coveney wrote:
> > > > Hijack away. I would be curious as to the reason we need this as
> > > > well.
> > > > 
> > > > 2010/12/6 Anze <anzenews@volja.net>
> > > > 
> > > >> Sorry to hijack your question, Jonathan, but while we are at it...
> > > >> :)
> > > >> 
> > > > >> Is there a way to tell Pig NOT to add "base_alias::"? Almost half
> > > > >> my code consists of FOREACH ... GENERATE statements that just
> > > > >> remove these prefixes.
> > > >> 
> > > >> Thanks,
> > > >> 
> > > >> Anze
> > > >> 
> > > >> On Monday 06 December 2010, Daniel Dai wrote:
> > > > >>> After JOIN, CROSS, or FOREACH ... FLATTEN, Pig automatically adds
> > > > >>> the "base_alias::" prefix. All other cases use "."
> > > >>> 
> > > >>> Daniel
> > > >>> 
> > > >>> Jonathan Coveney wrote:
> > > > >>>> It's very hard to search for this among the docs because it's
> > > > >>>> so generic,
> > > >>>> so I thought I'd ask... I'm sure the answer is painfully easy.
> > > >>>> 
> > > >>>> Taking a look at this code that I found online, for example
> > > >>>> 
> > > >>>> --
> > > > >>>> -- Read in a bag of tuples (timeseries for this example) and
> > > > >>>> -- divide the numeric column by its maximum.
> > > >>>> --
> > > >>>> %default DATABAG 'data/timeseries.tsv'
> > > >>>> 
> > > >>>> data       = LOAD '$DATABAG' AS (month:chararray, count:int);
> > > >>>> accumulate = GROUP data ALL;
> > > >>>> calc_max   = FOREACH accumulate GENERATE FLATTEN(data),
> > > >>>> MAX(data.count) AS max_count;
> > > >>>> normalize  = FOREACH calc_max GENERATE data::month AS month,
> > > > >>>> data::count AS count,
> > > > >>>> (float)data::count / (float)max_count AS normed_count;
> > > >>>> DUMP normalize;
> > > >>>> 
> > > >>>> What purpose does data::month serve versus data.count?
> > > >>>> 
> > > >>>> Thanks
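
For the original question at the bottom of the thread, the following annotated Pig Latin sketch may help. It reuses the script quoted above; the alias names grouped, with_max and normed are my own, and the comments reflect my reading of Daniel's answer rather than an authoritative description of Pig internals.

```pig
data    = LOAD 'data/timeseries.tsv' AS (month:chararray, count:int);
grouped = GROUP data ALL;

-- Inside grouped, `data` is a bag field. The "." projects one column out of
-- that bag, so data.count is a bag of counts that MAX can aggregate.
with_max = FOREACH grouped GENERATE
    FLATTEN(data),
    MAX(data.count) AS max_count;

-- FLATTEN turned the bag's columns into top-level fields, which Pig names
-- data::month and data::count. The "::" is part of the field name here,
-- recording which alias the field came from; it is not a projection.
normed = FOREACH with_max GENERATE
    data::month AS month,
    (float)data::count / (float)max_count AS normed_count;

DUMP normed;
```

In short, "." reaches into a bag or tuple, while "::" appears in the names Pig generates after JOIN, CROSS, or FLATTEN.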

