pig-user mailing list archives

From Anze <anzen...@volja.net>
Subject Re: Easy question...difference between this::form and this.form?
Date Tue, 07 Dec 2010 08:44:40 GMT

I understand the reason for this, it just seems like a drastic solution. :)

Ideally, Pig should be clever enough to detect ambiguity and deal with it, and 
leave the non-conflicting names intact. For instance:

A = load 'foo' as (x, y, z);
B = load 'bar' as (x, a, b, c);
C = join A by x, B by x;
DESCRIBE C;
C: {A::x, y, z, B::x, a, b, c}

or even:
C: {x, y, z, B::x, a, b, c}

or even a step further, in case of JOIN: 
C: {x, y, z, a, b, c}
(since join *joins* by x, why would there be two? This doesn't always work for 
other operations, of course)
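
For comparison, a minimal sketch of what it takes today to get that last schema. It reuses the relations above; the alias D is only illustrative, and putting an explicit AS on every field is a cautious assumption about how the output names come out, not a statement of what Pig strictly requires:

```pig
A = LOAD 'foo' AS (x, y, z);
B = LOAD 'bar' AS (x, a, b, c);
C = JOIN A BY x, B BY x;

-- Project each field once and rename it with AS to drop the "A::"/"B::" prefix.
-- B::x is left out because, for this inner join, it always equals A::x.
D = FOREACH C GENERATE A::x AS x, A::y AS y, A::z AS z,
                       B::a AS a, B::b AS b, B::c AS c;
DESCRIBE D;
```

One such FOREACH after each join is the boilerplate in question.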

Reasoning: at least in my case the names are descriptive from the start, 
so there are almost no name conflicts. In the rare cases where there are, 
Pig can detect that and use the old syntax with "::", then let me deal with it.

I know this is a backwards-incompatible change and is not likely to be accepted, 
but still... :)

Anze


On Monday 06 December 2010, Alan Gates wrote:
> The reason it's needed is that ambiguities would result otherwise.
> 
> A = load 'foo' as (x, y, z);
> B = load 'bar' as (w, x, y, z);
> C = join A by x, B by x;
> D = filter C by z > 0;  -- which z?
> 
> As long as the name is not ambiguous, the :: is not required.  So in
> the above example it would be perfectly legal to say
> 
> D = filter C by w > 0;
> 
> Out of curiosity, why do you want to remove the :: names?
> 
> Alan.
> 
> On Dec 6, 2010, at 1:05 PM, Jonathan Coveney wrote:
> > Hijack away. I would be curious as to the reason we need this as well.
> > 
> > 2010/12/6 Anze <anzenews@volja.net>
> > 
> >> Sorry to hijack your question, Jonathan, but while we are at it... :)
> >> 
> >> Is there a way to tell Pig NOT to add "base_alias::"? Almost half my code
> >> consists of FOREACH... GENERATE that just remove these prefixes.
> >> 
> >> Thanks,
> >> 
> >> Anze
> >> 
> >> On Monday 06 December 2010, Daniel Dai wrote:
> >>> After join, cross, foreach flatten, Pig will automatically add
> >>> "base_alias::" prefix. All other cases use "."
> >>> 
> >>> Daniel
> >>> 
> >>> Jonathan Coveney wrote:
> >>>> It's very hard to search for this among the docs because it's so generic,
> >>>> so I thought I'd ask... I'm sure the answer is painfully easy.
> >>>> 
> >>>> Taking a look at this code that I found online, for example
> >>>> 
> >>>> --
> >>>> -- Read in a bag of tuples (timeseries for this example) and divide the
> >>>> -- numeric column by its maximum.
> >>>> --
> >>>> %default DATABAG 'data/timeseries.tsv'
> >>>> 
> >>>> data       = LOAD '$DATABAG' AS (month:chararray, count:int);
> >>>> accumulate = GROUP data ALL;
> >>>> calc_max   = FOREACH accumulate GENERATE FLATTEN(data),
> >>>> MAX(data.count) AS max_count;
> >>>> normalize  = FOREACH calc_max GENERATE data::month AS month,
> >>>> data::count AS count, (float)data::count / (float)max_count AS
> >>>> normed_count;
> >>>> DUMP normalize;
> >>>> 
> >>>> What purpose does data::month serve versus data.count?
> >>>> 
> >>>> Thanks

