hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-167) Experiment : A proper bag memory manager.
Date Tue, 25 Mar 2008 14:25:24 GMT

    [ https://issues.apache.org/jira/browse/PIG-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581954#action_12581954 ]

Pi Song commented on PIG-167:
-----------------------------

I would like to introduce a simpler (but more efficient) design.

Here I've got a LinkedList of ArrayLists holding WeakReferences:-
{noformat}
Reclaim memory here first
    V
[ArrayList] -> [ArrayList] -> [ArrayList] -> [ArrayList]    <=== Register new spillables here
{noformat}

- N is the number of spillables per ArrayList (node).
- The LinkedList grows at the tail.
- Reclaiming starts at the head and cleans the whole node. Spillables whose references are not yet null are migrated to the tail of the LinkedList, and then the whole ArrayList (node) is thrown away.
- Reclaiming continues with the next node if all refs in the current node are null.
- Reclaiming can be triggered in two ways: 1) by the MXBean, 2) when the register counter hits a threshold (we maintain this counter; it is reset once we reclaim).
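
The bookkeeping described above can be sketched roughly as follows. This is not the code in MemManager0.patch: the class name NodeListMemoryManager, the methods reclaim(), nodeCount() and liveInOrder(), and the empty Spillable stand-in are all assumptions made for illustration. The sketch covers only reference management and leaves out spilling and the MXBean notification wiring.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for Pig's Spillable; the real interface has methods. */
interface Spillable {
}

/** Illustrative sketch of a LinkedList of ArrayList nodes holding WeakReferences. */
class NodeListMemoryManager {
    private final int nodeCapacity;      // N: spillables per ArrayList node
    private final int registerThreshold; // registrations between automatic reclaims
    private final LinkedList<ArrayList<WeakReference<Spillable>>> nodes = new LinkedList<>();
    private int registerCounter = 0;

    NodeListMemoryManager(int nodeCapacity, int registerThreshold) {
        if (nodeCapacity < 1 || registerThreshold < 1) {
            throw new IllegalArgumentException("capacity and threshold must be >= 1");
        }
        this.nodeCapacity = nodeCapacity;
        this.registerThreshold = registerThreshold;
    }

    /** Registers at the tail; reclaims when the register counter hits the threshold. */
    synchronized void registerSpillable(Spillable s) {
        append(new WeakReference<>(s));
        if (++registerCounter >= registerThreshold) {
            reclaim();
        }
    }

    /** Adds a reference to the tail node, starting a new node when the tail is full. */
    private void append(WeakReference<Spillable> ref) {
        if (nodes.isEmpty() || nodes.getLast().size() >= nodeCapacity) {
            nodes.addLast(new ArrayList<>(nodeCapacity));
        }
        nodes.getLast().add(ref);
    }

    /**
     * Cleans the head node: live references migrate to the tail and the node is
     * dropped. Moves on to the next node only while every reference seen so far
     * was already cleared. A low-memory (MXBean) listener would also call this.
     */
    synchronized void reclaim() {
        registerCounter = 0;
        int nodesToVisit = nodes.size(); // never revisit nodes created by migration
        boolean allCleared = true;
        while (allCleared && nodesToVisit-- > 0) {
            ArrayList<WeakReference<Spillable>> head = nodes.removeFirst();
            for (WeakReference<Spillable> ref : head) {
                if (ref.get() != null) {
                    allCleared = false;
                    append(ref); // survivor moves to the tail
                }
            }
        }
    }

    /** Number of ArrayList nodes currently in the list (for inspection only). */
    synchronized int nodeCount() {
        return nodes.size();
    }

    /** Still-live spillables from head to tail (for inspection only). */
    synchronized List<Spillable> liveInOrder() {
        List<Spillable> live = new ArrayList<>();
        for (ArrayList<WeakReference<Spillable>> node : nodes) {
            for (WeakReference<Spillable> ref : node) {
                Spillable s = ref.get();
                if (s != null) {
                    live.add(s);
                }
            }
        }
        return live;
    }
}
```

For example, with N = 2 and five live spillables a..e registered, the list is [a,b] -> [c,d] -> [e]. One reclaim() pass drops the head node and migrates its survivors, leaving [c,d] -> [e,a] -> [b]. The pass stops there because the head node still held live references.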

Pros:-
- Guarantees that old bags are cleaned up first.
- Halves the memory used for maintaining references compared to the existing design.
- Less overhead in maintaining the reference list (no clean-up on every register, no non-linear operation such as sort, always one pass over the list).
- In initial tests it is slightly faster than Alan's fix (I haven't tried Ben's new fix).

Important Facts:-
- ArrayList is as good as LinkedList for .add() in Java (even after taking the allocation of new backing arrays into consideration).

> Experiment : A proper bag memory manager.
> -----------------------------------------
>
>                 Key: PIG-167
>                 URL: https://issues.apache.org/jira/browse/PIG-167
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>         Attachments: MemManager0.patch
>
>
> According to PIG-164, I think we still have room for improvement:-
> 1) Alan said
> {quote}
> "It rests on the assumption that data bags generally live about the same amount of time, thus there won't be a long lived databag at the head of the list blocking the cleaning of many stale references later in the list."
> {quote}
> By looking at a line of code in SpillableMemoryManager
> {noformat}
> Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
> {noformat}
> - Alan's assumption may no longer hold after the memory manager tries to spill the list.
> - I don't understand why the list has to be sorted so that spilling starts from the smallest bags. Most file systems are not good at handling small files (especially ext2/ext3).
> 2) We use a LinkedList to maintain the WeakReferences. A LinkedList normally consumes twice as much memory as an array would (for pointers). Would it be better to change the LinkedList to an array or ArrayList?
> 3) In SpillableMemoryManager, handleNotification, which does I/O-intensive work, shares the same lock as registerSpillable. This doesn't seem efficient.
> 4) I have sometimes noticed that the bag currently in use gets spilled and read back over and over again. The memory manager should instead spill bags that are not currently in use first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

