hadoop-pig-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-40) Memory management in BigDataBag is probably wrong
Date Fri, 30 Nov 2007 20:52:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547283 ]

Benjamin Reed commented on PIG-40:
----------------------------------

We aren't really doing memory management. We just need to decide when to spill a bag to disk.
We can't just count the elements of a bag since elements can be of different sizes. We also
need to spill earlier if memory is constrained.

Using freeMemory may cause us to spill before we need to, but the important thing is that
we make sure to spill when memory is constrained. There doesn't seem to be a better way to
do it.
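
To make the trade-off concrete, here is a minimal sketch of a spill check driven by the JVM's memory figures. It is not the actual BigDataBag code: the class name SpillPolicy, the method names, and the minFreeFraction threshold are all hypothetical. It assumes the bag asks "should I spill?" as it grows, and it counts heap the JVM has not yet claimed (maxMemory() minus totalMemory()) as available, which makes a bare freeMemory() reading less likely to trigger a spill too early.

```java
// Hypothetical sketch; names and threshold are assumptions, not Pig's API.
class SpillPolicy {
    // Spill once available heap drops below this fraction of the maximum heap.
    private final double minFreeFraction;

    SpillPolicy(double minFreeFraction) {
        this.minFreeFraction = minFreeFraction;
    }

    // Pure decision function, kept separate from Runtime so it can be tested.
    boolean shouldSpill(long freeBytes, long totalBytes, long maxBytes) {
        // Available heap = free space in the current heap + room the heap can still grow.
        // freeBytes <= totalBytes, so this cannot overflow even if maxBytes is Long.MAX_VALUE.
        long available = freeBytes + (maxBytes - totalBytes);
        return available < (long) (maxBytes * minFreeFraction);
    }

    // Reads the live JVM figures. The result is only a snapshot: uncollected
    // garbage is not counted as free, so this can report pressure too early.
    boolean shouldSpillNow() {
        Runtime rt = Runtime.getRuntime();
        return shouldSpill(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
    }
}
```

A check like this errs toward spilling early rather than late, which matches the priority stated above: spilling unnecessarily costs some I/O, while failing to spill risks running out of memory.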

> Memory management in BigDataBag is probably wrong
> -------------------------------------------------
>
>                 Key: PIG-40
>                 URL: https://issues.apache.org/jira/browse/PIG-40
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Sam Pullara
>
> src/org/apache/pig/data/BigDataBag.java
> 1) You should not use finalizers for things other than external resources -- using them
> here is very dangerous and could inadvertently lead to deadlocks and object resurrection, and
> it just decreases performance without any advantage.
> 2) Using .freeMemory() the way it is used in this class is broken.  freeMemory() is going
> to return a mostly random number between 0 and the real amount.  Adding gc() in here is a
> terrible performance burden.  If you really want to do something like this you should be using
> soft references and finalization queues.
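
For readers unfamiliar with the alternative suggested in item 2, the following is a rough sketch of holding data through java.lang.ref.SoftReference objects registered with a ReferenceQueue. The class SoftChunkStore and its methods are hypothetical and are not part of Pig. The sketch assumes each chunk has already been written to disk before it is handed over, because the garbage collector may discard a softly reachable chunk at any time under memory pressure.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the Pig code base.
class SoftChunkStore {
    // The collector enqueues a reference here after clearing it.
    private final ReferenceQueue<byte[]> cleared = new ReferenceQueue<>();
    private final List<SoftReference<byte[]>> chunks = new ArrayList<>();

    // Registers a chunk and returns its index. The caller is assumed to
    // have a copy on disk already, since the in-memory copy may vanish.
    int add(byte[] chunk) {
        chunks.add(new SoftReference<>(chunk, cleared));
        return chunks.size() - 1;
    }

    // Returns the in-memory chunk, or null if the collector reclaimed it
    // (in which case the caller would reload it from disk).
    byte[] get(int index) {
        return chunks.get(index).get();
    }

    // Counts references the collector has cleared since the last call.
    int drainCleared() {
        int n = 0;
        while (cleared.poll() != null) {
            n++;
        }
        return n;
    }
}
```

With this approach the JVM, not the application, decides when memory is tight, so no freeMemory() polling or explicit gc() call is needed; the cost is that the data must be recoverable from disk whenever get() returns null.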

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

