hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3366) Shuffle/Merge improvements
Date Tue, 13 May 2008 12:56:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596385#action_12596385 ]

Devaraj Das commented on HADOOP-3366:
-------------------------------------

I agree with 1 through 3.

bq. 4. Throw away RamFS, implement a simple manager who returns byte-arrays of a given size
(i.e. decompressed shuffle split) until it runs out of the amount of memory available.

I am not sure this is justified. Instead, I'd propose:

1) Make the InMemoryFileSystem independent of the ChecksumFileSystem.
2) Implement a special DataOutputBuffer/ValueBytes for the ramfs. The DataOutputBuffer gives
us a nice abstraction for looking at data, whether it comes from files or from memory. I think
we should retain that abstraction and handle the ramfs as a special case.
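
For reference, the "simple manager" from item 4 might look roughly like the sketch below. This is only an illustration of the idea under discussion: the class name ShuffleMemoryManager and its reserve/release/available methods are hypothetical and are not taken from Hadoop or from any patch on this issue.

```java
// Hypothetical sketch of the byte-array manager described in item 4.
// Nothing here is an existing Hadoop API.
final class ShuffleMemoryManager {
    private final long capacityBytes; // total memory budget for shuffle data
    private long usedBytes = 0;       // bytes currently handed out

    ShuffleMemoryManager(long capacityBytes) {
        if (capacityBytes < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacityBytes = capacityBytes;
    }

    // Returns a byte array of exactly 'size' bytes, or null when the request
    // does not fit in the remaining budget (the caller would then fall back
    // to disk).
    synchronized byte[] reserve(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (usedBytes + size > capacityBytes) {
            return null;
        }
        usedBytes += size;
        return new byte[size];
    }

    // Gives a previously reserved buffer back to the budget. The caller is
    // trusted to release each buffer exactly once.
    synchronized void release(byte[] buffer) {
        usedBytes -= buffer.length;
    }

    // Bytes still available for new reservations.
    synchronized long available() {
        return capacityBytes - usedBytes;
    }
}
```

Such a manager only tracks a budget and hands out raw arrays; it does not by itself provide the file-or-memory abstraction that DataOutputBuffer offers, which is the trade-off raised in point 2.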

We already use raw comparators. Not sure what you meant by this.

I'll submit a patch with some of the above thoughts implemented in a bit.


> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>
> This is intended to be a meta-issue to track various improvements to shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

