hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-167) Experiment : A proper bag memory manager.
Date Fri, 28 Mar 2008 16:04:24 GMT

     [ https://issues.apache.org/jira/browse/PIG-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-167:
------------------------

    Attachment: memoryManagerV3.patch

This version is generational, collects by copying, and spills big objects first.
In theory it should be very suitable for MapReduce.
I still have to find proof of that!
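
For readers without the patch at hand, the "spill big objects first" part can be sketched roughly as below. This is only an illustration and not the code in memoryManagerV3.patch: SpillableSketch, LargestFirstSpiller and spillLargestFirst are hypothetical names, and the interface is a cut-down stand-in for Pig's Spillable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Pig's Spillable; only what this sketch needs. */
interface SpillableSketch {
    long getMemorySize(); // estimated bytes currently held in memory
    long spill();         // write contents to disk, return bytes freed
}

final class LargestFirstSpiller {
    /**
     * Spills live bags from largest to smallest until at least
     * bytesToFree have been released. Returns the bytes actually freed.
     */
    static long spillLargestFirst(List<WeakReference<SpillableSketch>> refs,
                                  long bytesToFree) {
        // Resolve each weak reference once, so a bag cannot be collected
        // while we sort, and skip references whose bag is already gone.
        List<SpillableSketch> live = new ArrayList<>();
        for (WeakReference<SpillableSketch> ref : refs) {
            SpillableSketch bag = ref.get();
            if (bag != null) {
                live.add(bag);
            }
        }
        // Descending by size: a few large files instead of many small ones.
        live.sort((a, b) -> Long.compare(b.getMemorySize(), a.getMemorySize()));

        long freed = 0;
        for (SpillableSketch bag : live) {
            if (freed >= bytesToFree) {
                break; // enough memory released, leave smaller bags alone
            }
            freed += bag.spill();
        }
        return freed;
    }
}
```

With bags of 10, 300 and 50 bytes and a target of 320 bytes, this sketch spills the 300-byte bag and then the 50-byte bag, and never touches the 10-byte one.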

> Experiment : A proper bag memory manager.
> -----------------------------------------
>
>                 Key: PIG-167
>                 URL: https://issues.apache.org/jira/browse/PIG-167
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>         Attachments: MemManager0.patch, MemManager1.patch, memoryManagerV3.patch
>
>
> According to PIG-164, I think we still have room for improvement:-
> 1) Alan said
> {quote}
> "It rests on the assumption that data bags generally live about the same amount of time, thus there won't be a long lived databag at the head of the list blocking the cleaning of many stale references later in the list."
> {quote}
> By looking at a line of code in SpillableMemoryManager
> {noformat}
> Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
> {noformat}
> - Alan's assumption might no longer hold after the memory manager tries to spill the list, because the sort changes the order of the list.
> - I don't understand why the list has to be sorted so that spilling starts from the smallest bags first. Most file systems are not good at handling small files (especially ext2/ext3).
> 2) We use a LinkedList to maintain the WeakReferences. Normally a linked list consumes twice as much memory as an array would (for the pointers). Would it be better to change LinkedList to an array or ArrayList?
> 3) In SpillableMemoryManager, handleNotification, which does an I/O-intensive job, shares the same lock with registerSpillable. This doesn't seem to be efficient.
> 4) Sometimes I noticed that a bag currently in use got spilled and read back over and over again. Essentially, the memory manager should consider spilling bags that are not currently in use first.
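
Points 2) and 3) above could be addressed along the following lines. This is a minimal sketch under assumptions, not the actual SpillableMemoryManager code: SnapshotSpillManager, SpillableBag, handleLowMemory and registeredCount are hypothetical names. The idea is to keep the references in an ArrayList and to hold the lock only while copying the list, so that the I/O of spilling does not block registration.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for Pig's Spillable. */
interface SpillableBag {
    long spill(); // write contents to disk, return bytes freed
}

final class SnapshotSpillManager {
    private final Object lock = new Object();
    // ArrayList rather than LinkedList: one array slot per reference,
    // no separate node object with next/prev pointers.
    private final List<WeakReference<SpillableBag>> spillables = new ArrayList<>();

    /** Holds the lock only long enough to append one reference. */
    void registerSpillable(SpillableBag bag) {
        synchronized (lock) {
            spillables.add(new WeakReference<>(bag));
        }
    }

    /** Low-memory handler: snapshot under the lock, spill outside it. */
    long handleLowMemory() {
        List<SpillableBag> snapshot = new ArrayList<>();
        synchronized (lock) {
            Iterator<WeakReference<SpillableBag>> it = spillables.iterator();
            while (it.hasNext()) {
                SpillableBag bag = it.next().get();
                if (bag == null) {
                    it.remove();       // stale reference, bag was collected
                } else {
                    snapshot.add(bag); // strong reference for the spill pass
                }
            }
        }
        // The slow I/O happens here, with the lock released, so callers
        // of registerSpillable are not blocked while bags are written out.
        long freed = 0;
        for (SpillableBag bag : snapshot) {
            freed += bag.spill();
        }
        return freed;
    }

    /** Number of references currently tracked, including stale ones. */
    int registeredCount() {
        synchronized (lock) {
            return spillables.size();
        }
    }
}
```

The trade-off is that a bag registered while a spill pass is running is not part of that pass; it is only considered on the next low-memory notification.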

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

