hadoop-pig-dev mailing list archives

From "Sam Pullara (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-40) Memory management in BigDataBag is probably wrong
Date Mon, 03 Dec 2007 17:47:43 GMT

    [ https://issues.apache.org/jira/browse/PIG-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547917 ]

Sam Pullara commented on PIG-40:
--------------------------------

The particular notification that I use in my example is evaluated right after a garbage collection
occurs, so it should be very accurate.
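
For readers without the attachments, here is a minimal sketch of that kind of post-collection notification, built on the standard java.lang.management API. It is not the attached MemoryUsage.java. The class name, the arm() helper, the fraction parameter and the onLowMemory callback are all hypothetical, and the lambda assumes Java 8 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical helper, not the code attached to PIG-40.
class CollectionThresholdSketch {

    /**
     * Sets a collection usage threshold (checked by the VM after a GC) on every
     * heap pool that supports one, and registers a listener that runs the
     * callback when a pool is still above the threshold after collection.
     * Returns the names of the pools that were armed.
     */
    static List<String> arm(double fraction, Runnable onLowMemory) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        List<String> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool invalid, or its maximum is undefined (-1)
            }
            pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
            armed.add(pool.getName());
        }

        // The MemoryMXBean is the emitter for memory threshold notifications.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (Notification n, Object handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(n.getType())) {
                onLowMemory.run();
            }
        };
        emitter.addNotificationListener(listener, null, null);
        return armed;
    }
}
```

A caller might arm it once at startup, for example arm(0.8, callback), where the callback spills data to disk. The 0.8 value is only an illustration.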

1) If you want to keep the old code as a fallback for a VM that doesn't support this, that
seems fine to me.  That said, it worked on Mac (Sun & SoyLatte) and Linux (Sun & JRockit).
I haven't tested it on Windows, but I can't imagine Sun failed to implement it there.
2) Rechecking periodically doesn't seem that bad to me.  In fact, the javadocs describe a polling
version of this (https://java.sun.com/j2se/1.5.0/docs/api/java/lang/management/MemoryPoolMXBean.html),
though I would always use the 'Collected' thresholds so you don't get false positives.
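
As a rough illustration of the polling approach in point 2, the sketch below checks usage as of the most recent collection rather than current usage. It is a variation on the javadoc pattern: it reads getCollectionUsage() directly instead of calling isCollectionUsageThresholdExceeded(). The class name, method name and fraction parameter are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper illustrating a "Collected" check by polling.
class CollectedUsagePollSketch {

    /**
     * Returns true if any heap pool was still using more than the given
     * fraction of its maximum right after its most recent collection.
     */
    static boolean lowAfterLastGc(double fraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            // Usage measured after the last GC of this pool; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null || afterGc.getMax() <= 0) {
                continue; // unsupported, or maximum undefined (-1)
            }
            if (afterGc.getUsed() > (long) (afterGc.getMax() * fraction)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the figure comes from after a collection, garbage that has not been collected yet does not count against the limit, which is the false positive that a check on current usage can produce.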

I was actually pleasantly surprised to find such great support for doing this stuff in the
VM -- somehow I had missed it in the release notes or at least forgot to check it out.

> Memory management in BigDataBag is probably wrong
> -------------------------------------------------
>
>                 Key: PIG-40
>                 URL: https://issues.apache.org/jira/browse/PIG-40
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Sam Pullara
>         Attachments: BigDataBag.java, MemoryUsage.java
>
>
> src/org/apache/pig/data/BigDataBag.java
> 1) You should not use finalizers for things other than external resources -- using them
> here is very dangerous and could inadvertently lead to deadlocks and object resurrection, and
> it just decreases performance without any advantage.
> 2) Using .freeMemory() the way it is used in this class is broken.  freeMemory() is going
> to return a mostly random number between 0 and the real amount.  Adding gc() in here is a
> terrible performance burden.  If you really want to do something like this, you should be using
> soft references and finalization queues.
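
One possible reading of the soft reference suggestion in point 2 is sketched below. It assumes "finalization queues" means java.lang.ref.ReferenceQueue, which is an interpretation and not something the issue states. The SoftCacheSketch class and its methods are hypothetical and are not taken from BigDataBag.java.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: the GC may clear values under memory pressure,
// and cleared entries are removed via the reference queue.
class SoftCacheSketch<K, V> {

    /** A soft reference that remembers its key so the map entry can be removed. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    void put(K key, V value) {
        drain();
        map.put(key, new Entry<>(key, value, queue));
    }

    /** Returns the value, or null if it was never stored or has been reclaimed. */
    V get(K key) {
        drain();
        Entry<K, V> entry = map.get(key);
        return entry == null ? null : entry.get();
    }

    int size() {
        drain();
        return map.size();
    }

    /** Removes map entries whose values the collector has already cleared. */
    private void drain() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked")
            Entry<K, V> cleared = (Entry<K, V>) ref;
            // Remove only if the key still maps to this exact cleared entry.
            map.remove(cleared.key, cleared);
        }
    }
}
```

With this shape the collector decides when memory is tight, so the code needs no finalizer, no freeMemory() call and no explicit gc(). A caller has to treat a null from get() as "reload or recompute".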


