hadoop-pig-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-167) Experiment : A proper bag memory manager.
Date Mon, 24 Mar 2008 21:51:25 GMT

    [ https://issues.apache.org/jira/browse/PIG-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581689#action_12581689 ]

Benjamin Reed commented on PIG-167:
-----------------------------------

I think you've identified a MAJOR bug. The SpillManager should spill the biggest bags first.
Did you try running your tests again with that change?
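
A minimal sketch of "biggest bags first", not the actual Pig code. SizedSpillable, FixedSizeBag and LargestFirst are hypothetical names for illustration; only the idea of ordering by memory size comes from the discussion. It takes strong references before sorting so that a reference cleared by the garbage collector mid-sort cannot confuse the comparator.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spillable bag; only the size query matters here.
interface SizedSpillable {
    long getMemorySize();
}

// Trivial implementation used for illustration only.
class FixedSizeBag implements SizedSpillable {
    private final long size;

    FixedSizeBag(long size) {
        this.size = size;
    }

    public long getMemorySize() {
        return size;
    }
}

class LargestFirst {
    // Returns the still-live spillables ordered from largest to smallest.
    static List<SizedSpillable> liveLargestFirst(List<WeakReference<SizedSpillable>> refs) {
        List<SizedSpillable> live = new ArrayList<>();
        for (WeakReference<SizedSpillable> ref : refs) {
            SizedSpillable s = ref.get();
            if (s != null) {
                live.add(s); // skip references the collector has already cleared
            }
        }
        // Descending order: compare y against x instead of x against y.
        live.sort((x, y) -> Long.compare(y.getMemorySize(), x.getMemorySize()));
        return live;
    }
}
```

Spilling would then walk the returned list from the front and could stop as soon as enough memory has been freed. The sketch assumes sizes do not change while the sort runs.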

I think you want the lock contention. In a low-memory condition you don't want allocations
to continue, because you might run out of memory. Waiting for a lock is much better than getting
an out-of-memory error.
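
To illustrate the point, here is a rough sketch in which registration and spilling share one lock, so a thread that tries to register a new bag waits until the spill has finished. BlockingRegistry and its methods are made-up names, and the spill work is modelled as a plain Runnable; this is not the SpillableMemoryManager code.

```java
import java.util.ArrayList;
import java.util.List;

class BlockingRegistry {
    private final Object lock = new Object();
    private final List<Runnable> spillActions = new ArrayList<>();

    // Called when a new bag is created. Blocks while spillAll() holds the lock.
    void register(Runnable spillAction) {
        synchronized (lock) {
            spillActions.add(spillAction);
        }
    }

    // Called on a low-memory notification. Holding the lock for the whole
    // (possibly slow) spill is deliberate: it stalls new registrations.
    int spillAll() {
        synchronized (lock) {
            int spilled = 0;
            for (Runnable action : spillActions) {
                action.run();
                spilled++;
            }
            spillActions.clear();
            return spilled;
        }
    }
}
```

The cost is that bag creation stalls for the duration of the spill; the benefit is that nothing new gets registered while memory is tight.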

I'm also wondering about trying to spill eden first. My intuition would be that recently created
bags are more likely to be used than old bags, but I have no measurements to show that :)

Alan's assumption holds between spills. By cleaning from the head, Alan can do some
between-spill housekeeping. (The memory manager cleans up during a spill.)
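
A small sketch of what cleaning from the head could look like, assuming the registry is a LinkedList of WeakReferences as described in the issue. HeadCleaner and cleanFromHead are hypothetical names, not the real implementation.

```java
import java.lang.ref.WeakReference;
import java.util.LinkedList;

class HeadCleaner {
    // Removes cleared references from the front of the list and stops at the
    // first one that is still live. Returns how many entries were removed.
    static <T> int cleanFromHead(LinkedList<WeakReference<T>> refs) {
        int removed = 0;
        while (!refs.isEmpty() && refs.getFirst().get() == null) {
            refs.removeFirst();
            removed++;
        }
        return removed;
    }
}
```

This is cheap because it never scans past the first live entry, which is also why a long-lived bag at the head would block the cleanup of stale entries behind it.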

Our efforts in this area are aimed at jobs completing successfully, not so much at making them
perform better. (Both would be great, but slow success is much better than quick failure.)

> Experiment : A proper bag memory manager.
> -----------------------------------------
>
>                 Key: PIG-167
>                 URL: https://issues.apache.org/jira/browse/PIG-167
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>         Attachments: diagram.gif, MemManager0.patch
>
>
> According to PIG-164, I think we still have room for improvement:-
> 1) Alan said
> {quote}
> "It rests on the assumption that data bags generally live about the same amount of time,
> thus there won't be a long lived databag at the head of the list blocking the cleaning of
> many stale references later in the list."
> {quote}
> By looking at a line of code in SpillableMemoryManager
> {noformat}
> Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
> {noformat}
> - Alan's assumption might no longer hold once the memory manager sorts the list to spill it.
> - I don't understand why the list has to be sorted so that spilling starts from the smallest
> bags first. Most file systems are not good at handling small files (especially ext2/ext3).
> 2) We use a LinkedList to maintain the WeakReferences. A linked list normally consumes twice
> as much memory as an array would (for the pointers). Would it be better to change the LinkedList
> to an array or ArrayList?
> 3) In SpillableMemoryManager, handleNotification, which does an I/O-intensive job,
> shares the same lock with registerSpillable. This doesn't seem efficient.
> 4) Sometimes I noticed that a bag currently in use got spilled and read back over
> and over again. The memory manager should consider spilling bags that are not currently
> in use first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

