hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-167) Experiment : A proper bag memory manager.
Date Mon, 24 Mar 2008 20:53:24 GMT
Experiment : A proper bag memory manager.
-----------------------------------------

                 Key: PIG-167
                 URL: https://issues.apache.org/jira/browse/PIG-167
             Project: Pig
          Issue Type: Improvement
            Reporter: Pi Song


Following up on PIG-164, I think we still have room for improvement:
1) Alan said
{quote}
"It rests on the assumption that data bags generally live about the same amount of time, thus
there won't be a long lived databag at the head of the list blocking the cleaning of many
stale references later in the list."
{quote}

Looking at this line of code in SpillableMemoryManager:
{noformat}
Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
{noformat}

- Alan's assumption might no longer hold once the memory manager has tried to spill the list,
since sorting reorders it.
- I don't understand why the list has to be sorted so that spilling starts from the smallest
bags. Most file systems are not good at handling small files (especially ext2/ext3).
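
As a rough illustration of the alternative (not the current Pig code), the sketch below drops
cleared references and then orders the live bags largest first, so that a spill pass would
write fewer, larger files. SizedSpillable and SpillOrder are hypothetical names used only for
this example; SizedSpillable stands in for Pig's Spillable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's Spillable; only the size accessor matters here.
interface SizedSpillable {
    long getMemorySize();
}

final class SpillOrder {
    // Returns the live spillables, largest first. Cleared references are skipped.
    static List<SizedSpillable> largestFirst(List<WeakReference<SizedSpillable>> refs) {
        List<SizedSpillable> live = new ArrayList<>();
        for (WeakReference<SizedSpillable> ref : refs) {
            SizedSpillable s = ref.get();   // strong reference held while we sort
            if (s != null) {
                live.add(s);
            }
        }
        // Descending by memory size: note the swapped arguments.
        live.sort((a, b) -> Long.compare(b.getMemorySize(), a.getMemorySize()));
        return live;
    }
}
```

Whether largest-first is actually better would need to be measured; this only shows that the
ordering is a one-line change in the comparator.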

2) We use a LinkedList to maintain the WeakReferences. A LinkedList normally consumes about
twice as much memory as an array would (for the node pointers). Would it be better to change
the LinkedList to an array or ArrayList?
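
For illustration, an ArrayList-backed registry could look like the sketch below. Each
LinkedList node is a separate object holding the element plus next and previous links, while
an ArrayList keeps one reference per slot in a single backing array. SpillableRegistry is a
hypothetical name, not a class in Pig.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry that keeps weak references in an ArrayList.
final class SpillableRegistry<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();

    void register(T item) {
        refs.add(new WeakReference<>(item));
    }

    // Removes every cleared reference in a single pass and returns how many were dropped.
    int prune() {
        int before = refs.size();
        refs.removeIf(ref -> ref.get() == null);
        return before - refs.size();
    }

    int size() {
        return refs.size();
    }
}
```

Pruning with removeIf also does not depend on stale references sitting at the head of the
list, which is the concern raised in point 1.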

3) In SpillableMemoryManager, handleNotification, which does I/O-intensive work, shares the
same lock with registerSpillable. This doesn't seem efficient.
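
One possible way to decouple the two, sketched under the assumption that each bag's spill()
is safe to call without the manager's lock: take a snapshot of the live bags while holding
the lock, then do the slow I/O after releasing it. SpillTarget, SnapshotSpiller and
handleLowMemory are hypothetical names for this example only.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's Spillable; spill() returns the bytes it freed.
interface SpillTarget {
    long spill();
}

final class SnapshotSpiller {
    private final Object lock = new Object();
    private final List<WeakReference<SpillTarget>> spillables = new ArrayList<>();

    // Cheap: only appends to the list, so the lock is held very briefly.
    void registerSpillable(SpillTarget s) {
        synchronized (lock) {
            spillables.add(new WeakReference<>(s));
        }
    }

    // Copies the live bags under the lock, then spills them with the lock released.
    long handleLowMemory() {
        List<SpillTarget> snapshot = new ArrayList<>();
        synchronized (lock) {
            spillables.removeIf(ref -> ref.get() == null);
            for (WeakReference<SpillTarget> ref : spillables) {
                SpillTarget s = ref.get();
                if (s != null) {
                    snapshot.add(s);
                }
            }
        }
        long freed = 0;
        for (SpillTarget s : snapshot) {
            freed += s.spill();   // slow I/O happens here, outside the lock
        }
        return freed;
    }
}
```

With this shape, registerSpillable callers wait only for the list copy, not for the disk
writes.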

4) I have sometimes noticed that the bag currently in use gets spilled and read back over and
over again. Ideally, the memory manager should first consider spilling bags that are not
currently in use.
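
A minimal sketch of one way to approximate "not in use", assuming each bag records a logical
clock tick whenever it is read or written: spill candidates are ordered by least recent
access, so the bag being worked on comes last. TrackedBag and LruSpillOrder are hypothetical
names; nothing here reflects how Pig's bags are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bag wrapper that remembers when it was last touched.
final class TrackedBag {
    final String name;
    private long lastAccess;   // logical clock tick of the most recent use

    TrackedBag(String name) {
        this.name = name;
    }

    void touch(long tick) {
        lastAccess = tick;
    }

    long getLastAccess() {
        return lastAccess;
    }
}

final class LruSpillOrder {
    // Returns a copy of the input ordered from least to most recently used.
    static List<TrackedBag> coldestFirst(List<TrackedBag> bags) {
        List<TrackedBag> ordered = new ArrayList<>(bags);
        ordered.sort((x, y) -> Long.compare(x.getLastAccess(), y.getLastAccess()));
        return ordered;
    }
}
```

The memory manager would then spill from the front of the returned list and stop once enough
memory has been freed.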

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

