pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Scalar problem
Date Mon, 09 Apr 2012 17:21:09 GMT
Alan, which idea are you +1 on? I think (int) D is the current syntax.

There are a few problems that people hit in the current scalar
implementation, all of which I think can be fixed without introducing
new syntax:

1) Require the cast, don't do it implicitly. This was actually in the
design doc but didn't get implemented for some reason.

2) Throw an error on the frontend if the scalar relation is the
relation being iterated on. Meaning:

foreach foo generate (int) foo.id; -- this will cause the second "foo"
to be interpreted as a scalar invocation, although clearly it's just a
bug, and the programmer meant to say "generate (int) id".

We can just detect this error case and throw during compilation.

3) Improve MR-side logging to make it clear that a relation is being
loaded from the side, what the relation is, etc.

I believe we have JIRAs open for all of these.
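
To make the check in (2) concrete, here is a rough sketch in Python. It is not Pig's actual frontend code: the function name check_foreach, the SelfScalarError class, and the regex-based matching are all assumptions made up for illustration. A real implementation would walk the logical plan instead of matching strings.

```python
import re

# Hypothetical pattern for a scalar-style reference such as "foo.id".
SCALAR_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


class SelfScalarError(ValueError):
    """Raised when a FOREACH uses its own input relation as a scalar."""


def check_foreach(input_alias, generate_exprs):
    """Reject expressions that reference input_alias as a scalar relation.

    input_alias: name of the relation being iterated on (e.g. "foo").
    generate_exprs: expression strings taken from the GENERATE clause.
    """
    for expr in generate_exprs:
        for alias, field in SCALAR_REF.findall(expr):
            if alias == input_alias:
                raise SelfScalarError(
                    f"'{alias}.{field}' uses the iterated relation "
                    f"'{alias}' as a scalar; did you mean '{field}'?"
                )


# A scalar reference to a different relation passes the check.
check_foreach("foo", ["(int) C.count", "id"])

# The buggy case from (2) is rejected before any job is launched.
try:
    check_foreach("foo", ["(int) foo.id"])
except SelfScalarError as err:
    print(err)
```

The point is only that the error is cheap to detect at compile time: the alias on the left of the dot is compared against the relation the foreach is iterating over.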


On Mon, Apr 9, 2012 at 10:15 AM, Alan Gates <gates@hortonworks.com> wrote:
> I'm +1 on this idea, since it's been a problem since the beginning.  Why not use regular
> casting notation though, rather than develop another notation?  That's what we discussed
> originally when we were deciding whether to require casting or do it silently.  So instead
> of D->a or SCALAR(D) it would be (int)D.
> Alan.
> On Apr 8, 2012, at 7:42 AM, Jonathan Coveney wrote:
>> I like this idea, and I think we should deprecate the old syntax, and we
>> can discuss later when it'd get deleted (and when that would be worth it...
>> if we have a new syntax, it seems pretty painless to have the other one
>> float around for backwards compatibility, and if anyone uses it it's a sort
>> of "caveat emptor").
>> 2012/4/8 Aniket Mokashi <aniket486@gmail.com>
>>> Hi,
>>> I have noticed early users of pig often hit issues because of confusing
>>> syntax between scalars and projections. I think scalar syntax should be
>>> made more explicit for users to use in order to avoid these problems. For
>>> example: D = foreach C generate B->count; etc.
>>> I am sure we might break some backward compatibility but we can at least
>>> deprecate the syntax for a few versions and eventually move to new syntax.
>>> Thoughts?
>>> Thanks,
>>> Aniket
