hadoop-pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: SIZE() of relation
Date Tue, 15 Jun 2010 16:14:38 GMT
There have been several requests for this.  I'm not a fan of it,
because it makes it too easy to forget that you're forcing a
single-reducer MR job to accomplish this.  But I'm open to persuasion
if everyone else disagrees.

Alan.

On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:

> This would be great.  Save us from GROUP ALL/FOREACH, which is  
> awkward.
>
> On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy <dvryaboy@gmail.com>  
> wrote:
>
>> It would be cool to just treat relations as bags in the general case.
>> They kind of are, and kind of are not. Causes lots of user confusion.
>> There are obvious users-doing-dumb-stuff scenarios that arise though.
>> I guess the Pig philosophy is that the user is the optimizer, though,
>> so maybe it's ok.
>>
>> -D
>>
>> On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <russell.jurney@gmail.com>
>> wrote:
>>
>>> Would it be possible, and not a ton of work, to make the builtin
>>> SIZE() work on a relation?  Reason being, I frequently do this:
>>>
>>> B = GROUP A ALL;
>>> C = FOREACH B GENERATE SIZE(A) AS total;
>>> DUMP C;
>>>
>>> And I would rather do this:
>>>
>>> DUMP SIZE(A);
>>>
>>> Russ
>>>
>>
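
The single-reducer concern raised in this thread can be sketched with a toy model. This is not Pig's implementation: the `shuffle` helper, the sample records, and the reducer count of 4 are all made up for illustration. It only assumes that GROUP ... ALL gives every record the same constant group key, so a hash-partitioned shuffle sends every record to one reducer.

```python
from collections import defaultdict


def shuffle(records, key_fn, num_reducers):
    """Toy shuffle: route each record to a reducer by hashing its group key."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[hash(key_fn(rec)) % num_reducers].append(rec)
    return partitions


# Hypothetical contents of relation A.
records = [("alice", 3), ("bob", 5), ("carol", 8), ("dave", 1)]

# B = GROUP A ALL;  every record shares the constant key "all".
grouped_all = shuffle(records, lambda rec: "all", num_reducers=4)

# Only one of the four reducers receives any input.
busy_reducers = len(grouped_all)

# C = FOREACH B GENERATE SIZE(A);  that one reducer counts the whole bag.
total = sum(len(bag) for bag in grouped_all.values())

print(busy_reducers, total)
```

However many reducers are configured, the constant key leaves all but one of them idle, which is the cost a one-line DUMP SIZE(A) would hide.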

