hadoop-pig-dev mailing list archives

From "Olga Natkovich" <ol...@yahoo-inc.com>
Subject RE: Proposal for handling "GC overhead limit" errors
Date Mon, 09 Jun 2008 22:01:17 GMT
Pradeep,

Have you tested this? If so,

(1) Did the problem go away for the queries you tested?
(2) What effect did it have on the performance of the queries that run
successfully and spill?

Thanks,

Olga
 

> -----Original Message-----
> From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
> Sent: Monday, June 09, 2008 2:32 PM
> To: pig-dev@incubator.apache.org
> Subject: Proposal for handling "GC overhead limit" errors
> 
> Hi,
> 
>  
> 
> Currently in org.apache.pig.impl.util.SpillableMemoryManager:
> 
>  
> 
> 1) We use the MemoryManagement interface to get notified when 
> heap usage after a collection exceeds the "collection 
> threshold" (we set this to biggest_heap*0.5). With this in 
> place we are still seeing "GC overhead limit" issues when 
> trying large dataset operations. Observing some runs, it looks 
> like the notification is neither frequent enough nor early 
> enough to prevent memory issues, possibly because it only 
> occurs after a GC.
> 
>  
> 
> 2) We only attempt to free up to:
> 
> long toFree = info.getUsage().getUsed() - 
> (long)(info.getUsage().getMax()*.5);
> 
> This is only the excess over the threshold that caused the 
> notification, so freeing it is not enough to keep the 
> notification from firing again soon.
> 
>  
> 
> 3) While iterating over spillables, if the current spillable's 
> memory size is > gcActivationSize, we try to invoke System.gc()
> 
>  
> 
> 4) We *always* invoke System.gc() after iterating over spillables
> 
>  
> 
> Proposed changes are:
> 
> =================
> 
> 1) In addition to "collection threshold" of biggest_heap*0.5, 
> a "usage threshold" of biggest_heap*0.7 will be used so we 
> get notified early and often irrespective of whether garbage 
> collection has occurred.
> 
>  
> 
> 2) We will attempt to free 
> 
> toFree = info.getUsage().getUsed() - threshold + 
> (long)(threshold * 0.5); where threshold is 
> (info.getUsage().getMax() * 0.7) if the
> handleNotification() method is handling a "usage threshold exceeded"
> notification and (info.getUsage().getMax() * 0.5) otherwise 
> ("collection threshold exceeded" case)
> 
>  
> 
> 3) While iterating over spillables, if the *memory freed thus 
> far* is > gcActivationSize OR if we have freed sufficient 
> memory (based on 2) above), then we set a flag to invoke 
> System.gc when we exit the loop.  
> 
>  
> 
> 4) We will invoke System.gc() only if the flag is set in 3) above
> 
>  
> 
> Please provide thoughts/comments.
> 
>  
> 
> Thanks,
> 
> Pradeep
> 
> 
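Proposed change 1) could be wired up roughly as follows with the standard java.lang.management API. This is only a sketch, not the Pig code: the class name ThresholdSketch and the helpers biggestHeapPool() and classify() are made up for illustration, and "biggest heap" is assumed to mean the heap pool with the largest maximum size that supports both thresholds.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical sketch; names are illustrative, not taken from Pig.
class ThresholdSketch {
    static final double COLLECTION_FRACTION = 0.5; // existing threshold
    static final double USAGE_FRACTION = 0.7;      // proposed extra threshold

    /** Largest heap pool that supports both threshold kinds, or null if none. */
    static MemoryPoolMXBean biggestHeapPool() {
        MemoryPoolMXBean biggest = null;
        long biggestMax = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            if (!pool.isUsageThresholdSupported()) continue;
            if (!pool.isCollectionUsageThresholdSupported()) continue;
            long max = pool.getUsage().getMax(); // -1 when undefined
            if (max > biggestMax) {
                biggestMax = max;
                biggest = pool;
            }
        }
        return biggest;
    }

    /** Maps a JMX notification type to the kind of threshold that fired. */
    static String classify(String notificationType) {
        if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notificationType)) {
            return "usage";
        }
        if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(notificationType)) {
            return "collection";
        }
        return "other";
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = biggestHeapPool();
        if (pool == null) {
            System.out.println("no heap pool supports both thresholds");
            return;
        }
        long max = pool.getUsage().getMax();
        // Checked after GC only: the notification we already rely on.
        pool.setCollectionUsageThreshold((long) (max * COLLECTION_FRACTION));
        // Checked on ordinary usage: fires without waiting for a collection.
        pool.setUsageThreshold((long) (max * USAGE_FRACTION));

        NotificationListener listener = (notification, handback) ->
            System.out.println("threshold exceeded: " + classify(notification.getType()));
        // The platform MemoryMXBean is the emitter for both notification types.
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
            .addNotificationListener(listener, null, null);
        System.out.println("thresholds set on pool: " + pool.getName());
    }
}
```

The listener can tell the two cases apart from the notification type, which is what change 2) needs in order to pick the 0.7 or the 0.5 threshold.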

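To make changes 2) through 4) concrete, here is a small sketch that compares the current and proposed toFree targets and shows the deferred System.gc() flag. It rests on several assumptions: the Spillable interface below is a stand-in and not Pig's own, the threshold is truncated to a long before use, and the loop is assumed to stop once enough memory has been freed, which the proposal does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative, not taken from Pig.
class SpillTargetSketch {
    /** Stand-in for a spillable object (assumed shape). */
    interface Spillable {
        long getMemorySize();
        void spill();
    }

    /** Current behaviour: free only the excess over the 0.5 threshold. */
    static long currentToFree(long used, long max) {
        return used - (long) (max * 0.5);
    }

    /** Proposed: free the excess plus half of the threshold that fired. */
    static long proposedToFree(long used, long max, boolean usageThresholdExceeded) {
        long threshold = (long) (max * (usageThresholdExceeded ? 0.7 : 0.5));
        return used - threshold + (long) (threshold * 0.5);
    }

    /**
     * Spills in order until toFree is reached. Returns true if the caller
     * should invoke System.gc() once, after the loop (changes 3 and 4).
     */
    static boolean spillUntil(List<Spillable> spillables, long toFree, long gcActivationSize) {
        long freed = 0;
        boolean invokeGc = false;
        for (Spillable s : spillables) {
            long size = s.getMemorySize();
            s.spill();
            freed += size;
            if (freed > gcActivationSize) {
                invokeGc = true;      // enough freed to make a GC worthwhile
            }
            if (freed >= toFree) {
                invokeGc = true;      // target met
                break;                // assumption: stop spilling here
            }
        }
        return invokeGc;
    }

    public static void main(String[] args) {
        long max = 1000, used = 750;  // arbitrary units, for illustration only
        System.out.println("current target:               " + currentToFree(used, max));
        System.out.println("proposed (usage threshold):   " + proposedToFree(used, max, true));
        System.out.println("proposed (collection thresh): " + proposedToFree(used, max, false));

        List<Spillable> bags = new ArrayList<>();
        for (long size : new long[] {100, 200, 300}) {
            bags.add(new Spillable() {
                public long getMemorySize() { return size; }
                public void spill() { System.out.println("spilled " + size); }
            });
        }
        if (spillUntil(bags, proposedToFree(used, max, true), 1000)) {
            System.gc();              // single call, after the loop
        }
    }
}
```

With max = 1000 and used = 750, the current formula asks for 250 to be freed, while the proposed one asks for 400 on a usage notification and 500 on a collection notification, so each notification buys more headroom before the next one.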