pig-user mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Re: Easy question...difference between this::form and this.form?
Date Tue, 07 Dec 2010 15:10:14 GMT
Would that even be much better? It seems better for Pig to be consistent
about prepending the alias:: prefix, so that at least you have to be
cognizant of it when you do the join. If it starts being too clever, then
it's up to you to figure out when it does and doesn't add the prefix, which
might be annoying.
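
For reference, here is a minimal sketch of how the current behaviour plays out, reusing Alan's example quoted below. The paths 'foo' and 'bar' come from that example; the output names a_y, a_z, b_y and b_z are made up for illustration, and I have not run this against any particular Pig version.

```pig
-- Same inputs as in Alan's example ('foo' and 'bar' are placeholder paths).
A = LOAD 'foo' AS (x, y, z);
B = LOAD 'bar' AS (w, x, y, z);
C = JOIN A BY x, B BY x;

-- w exists only in B, so the bare name is unambiguous and needs no prefix.
D = FILTER C BY w > 0;

-- z exists in both inputs, so it has to be qualified with the source alias.
E = FILTER C BY A::z > 0;

-- A single projection right after the join renames every field once,
-- so later statements can use plain names (a_y, a_z, b_y, b_z are hypothetical).
F = FOREACH C GENERATE A::x AS x, A::y AS a_y, A::z AS a_z,
                       B::w AS w, B::y AS b_y, B::z AS b_z;
DESCRIBE F;
```

Doing the renaming in one FOREACH directly after the join keeps the prefixes visible at the point where the conflict arises, without carrying them through the rest of the script.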

2010/12/7 Anze <anzenews@volja.net>

>
> I understand the reason for this, it just seems like a drastic solution. :)
>
> Ideally, Pig should be clever enough to detect ambiguity and deal with it,
> and leave the non-conflicting names intact. For instance:
>
> A = load 'foo' as (x, y, z);
> B = load 'bar' as (x, a, b, c);
> C = join A by x, B by x;
> DESCRIBE C;
> C: {A::x, y, z, B::x, a, b, c}
>
> or even:
> C: {x, y, z, B::x, a, b, c}
>
> or even a step further, in case of JOIN:
> C: {x, y, z, a, b, c}
> (since join *joins* by x, why would there be two? This doesn't always work
> for other operations, of course)
>
> Reasoning: at least in my cases the names are descriptive from the start,
> so there are almost no name conflicts. In the rare cases where there are,
> Pig can detect that and use the old "::" syntax, then let me deal with it.
>
> I know this is a backwards-incompatible change and is not likely to be
> accepted, but still... :)
>
> Anze
>
>
> On Monday 06 December 2010, Alan Gates wrote:
> > The reason it's needed is that ambiguities would result otherwise.
> >
> > A = load 'foo' as (x, y, z);
> > B = load 'bar' as (w, x, y, z);
> > C = join A by x, B by x;
> > D = filter C by z > 0;  -- which z?
> >
> > As long as the name is not ambiguous, the :: is not required.  So in
> > the above example it would be perfectly legal to say
> >
> > D = filter C by w > 0;
> >
> > Out of curiosity, why do you want to remove the :: names?
> >
> > Alan.
> >
> > On Dec 6, 2010, at 1:05 PM, Jonathan Coveney wrote:
> > > Hijack away. I would be curious as to the reason we need this as well.
> > >
> > > 2010/12/6 Anze <anzenews@volja.net>
> > >
> > >> Sorry to hijack your question, Jonathan, but while we are at it... :)
> > >>
> > >> Is there a way to tell Pig NOT to add "base_alias::"? Almost half
> > >> my code consists of FOREACH... GENERATE that just remove these prefixes.
> > >>
> > >> Thanks,
> > >>
> > >> Anze
> > >>
> > >> On Monday 06 December 2010, Daniel Dai wrote:
> > >>> After join, cross, foreach flatten, Pig will automatically add
> > >>> "base_alias::" prefix. All other cases use "."
> > >>>
> > >>> Daniel
> > >>>
> > >>> Jonathan Coveney wrote:
> > >>>> It's very hard to search for this among the docs because it's so
> > >>>> generic, so I thought I'd ask... I'm sure the answer is painfully easy.
> > >>>>
> > >>>> Taking a look at this code that I found online, for example
> > >>>>
> > >>>> --
> > >>>> -- Read in a bag of tuples (timeseries for this example) and divide the
> > >>>> -- numeric column by its maximum.
> > >>>> --
> > >>>> %default DATABAG 'data/timeseries.tsv'
> > >>>>
> > >>>> data       = LOAD '$DATABAG' AS (month:chararray, count:int);
> > >>>> accumulate = GROUP data ALL;
> > >>>> calc_max   = FOREACH accumulate GENERATE FLATTEN(data),
> > >>>> MAX(data.count) AS max_count;
> > >>>> normalize  = FOREACH calc_max GENERATE data::month AS month,
> > >>>> data::count AS count, (float)data::count / (float)max_count AS
> > >>>> normed_count;
> > >>>> DUMP normalize;
> > >>>>
> > >>>> What purpose does data::month serve versus data.count?
> > >>>>
> > >>>> Thanks
>
>
