hadoop-pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Pig performance
Date Wed, 31 Dec 2008 21:13:55 GMT
This will definitely be done after the merge of types to trunk.  As  
for PIG-273, the changes we need to make are larger than just that.   
Consider, for example:

A = load ...
B = filter ...
store B into 'bla';
C = group B by $0;
...

There's no explicit split in there, but Pig should be able to tee
the data at the 'store B' and keep going.  So PIG-273 is part of it,
but I imagine when we start working on it there'll be another JIRA to
track all the changes, of which PIG-273 will become a sub-task.

Alan.

On Dec 30, 2008, at 12:48 AM, Kevin Weil wrote:

> Hi Olga,
>
> I am eagerly awaiting not having to re-read all data each time I store
> part of a split!  As far as timelines go, I imagine this will be a
> larger fix that will come in after the merge from types -> trunk?  And
> is PIG-273 <https://issues.apache.org/jira/browse/PIG-273> the proper
> bug for tracking this issue?
>
> Thanks,
> Kevin
>
> On Mon, Dec 22, 2008 at 10:22 AM, Olga Natkovich <olgan@yahoo-inc.com> wrote:
>
>> The reason trunk does not contain the latest code is that Pig has
>> undergone a complete redesign that we could not do incrementally on
>> the trunk without jeopardizing its stability. The decision was made to
>> do the work on a branch and then merge the branch code to the trunk
>> when it is stable.
>>
>> The merge will happen in early January.
>>
>> The second comment that Alan made is that we are about to start work
>> on cross-query optimization: the ability to combine computations
>> across multiple stores.
>>
>> Olga
>>
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>>> Sent: Saturday, December 20, 2008 10:33 AM
>>> To: pig-dev@hadoop.apache.org
>>> Cc: pig-dev@hadoop.apache.org
>>> Subject: Re: Pig performance
>>>
>>>
>>> I think the key points that Alan brought up in his blog
>>> comment were that trunk pig is paradoxically not the most
>>> current and that storing intermediate results can decrease
>>> the scope of optimizations.
>>>
>>> On Dec 20, 2008, at 10:16, Alan Gates <gates@yahoo-inc.com> wrote:
>>>
>>>> I left a comment on the blog addressing some of the issues he
>>>> brought up.
>>>>
>>>> Alan.
>>>>
>>>> On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote:
>>>>
>>>>> Hey Pig team,
>>>>>
>>>>> Did anyone check out the recent claims about Pig's poor performance
>>>>> versus Cascading? Though I haven't worked extensively with either
>>>>> system, I found the statements made fairly bold and am curious to
>>>>> hear more about their validity from the Pig development team:
>>>>> http://www.manamplified.org/archives/2008/12/cascading-and-pig-planners.html
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>
>>>
>>

