hadoop-mapreduce-issues mailing list archives

From "Mariappan Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
Date Thu, 05 May 2011 14:53:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029349#comment-13029349 ]

Mariappan Asokan commented on MAPREDUCE-2454:

Hi Steve,
    Thank you very much for your comments.  I will try to make the sorting on both the Map and
Reduce sides pluggable.  The default implementation will be whatever is available in the
framework.  It is easy to separate the sorting process on the Map side (currently all the code
is in the class MapOutputBuffer, which lives in MapTask.java).  It is very hard to separate
the merge on the Reduce side because of the way it is coded.  I am working on separating that
as well.
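
A minimal sketch of the pluggable arrangement described above, in plain Java.  The names
RecordSorter, DefaultRecordSorter and SorterFactory are hypothetical; they are not the
interfaces attached to this issue, and the default here only stands in for the framework's
existing sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plugin contract (an assumption, not the attached MapOutputSorter.java).
interface RecordSorter<K> {
    List<K> sort(List<K> keys, Comparator<? super K> comparator);
}

// Stand-in for "whatever is available in the framework".
class DefaultRecordSorter<K> implements RecordSorter<K> {
    @Override
    public List<K> sort(List<K> keys, Comparator<? super K> comparator) {
        List<K> copy = new ArrayList<>(keys); // leave the caller's list untouched
        copy.sort(comparator);
        return copy;
    }
}

class SorterFactory {
    // Use the plugin when one is supplied, otherwise fall back to the default.
    static <K> RecordSorter<K> choose(RecordSorter<K> plugin) {
        return plugin != null ? plugin : new DefaultRecordSorter<K>();
    }
}
```

With this shape, a job that configures no plugin gets the default behaviour unchanged, which
is the property the paragraph above asks for.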

Regarding the GNU sort plugin, I am making the external sort command name configurable.  It
can be the POSIX sort command as well.  Since most Hadoop installations are Linux based, GNU
sort is available there as the POSIX sort implementation.  Other UNIX installations can use
their POSIX sort command as the external sorter.  There is no GPL issue.  Perhaps I can remove
the word GNU and just call it UNIX.
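
As an illustration only, here is one way a configurable sort command could be invoked from
Java.  The class ExternalLineSorter is hypothetical, it sorts plain text lines rather than
real map output, and it assumes the command is a single executable name that reads standard
input and writes sorted lines to standard output, as POSIX sort does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a configurable external command to sort text lines.
class ExternalLineSorter {
    private final String command; // e.g. "sort"; supplied by configuration

    ExternalLineSorter(String command) {
        this.command = command;
    }

    List<String> sort(List<String> lines) throws IOException, InterruptedException {
        // Feed the input from a temp file so the child cannot block on a full pipe.
        Path input = Files.createTempFile("sorter-input", ".txt");
        try {
            Files.write(input, lines, StandardCharsets.UTF_8);
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.environment().put("LC_ALL", "C"); // byte-wise ordering, independent of locale
            pb.redirectInput(input.toFile());
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process process = pb.start();

            List<String> sorted = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sorted.add(line);
                }
            }
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException(command + " exited with status " + exit);
            }
            return sorted;
        } finally {
            Files.deleteIfExists(input);
        }
    }
}
```

Because the command is only a name passed to the constructor, nothing in the sketch is tied
to GNU sort specifically.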

Regarding class loader related exceptions: I will look at the framework's code to see what it
does when it loads a Mapper or Reducer class and follow the same approach, since the scenario
is very similar.  All the issues you have raised w.r.t. class loading apply there as well.

An explanation of UnsupportedOperationException:  If the external sorter uses a UNIX command
like sort, it may not be able to handle a custom key type the user has defined, since the key
comparator may be written in Java.  In such a case a message will be logged in syslog and the
framework's sorter will be used.  I think this is fair enough.  Please let me know if you think
otherwise.
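
The fallback described above can be sketched as follows.  FallbackSorter, KeySorter and
textOnlyExternalSorter are hypothetical names used only for illustration, and
java.util.logging stands in for whatever logging the framework actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.logging.Logger;

class FallbackSorter {
    private static final Logger LOG = Logger.getLogger(FallbackSorter.class.getName());

    // Hypothetical sorter contract used only by this sketch.
    interface KeySorter<K> {
        List<K> sort(List<K> keys, Comparator<? super K> comparator);
    }

    // Models an external sorter that cannot apply a comparator written in Java.
    static <K> KeySorter<K> textOnlyExternalSorter() {
        return (keys, comparator) -> {
            throw new UnsupportedOperationException("custom key comparator not supported");
        };
    }

    // Try the external sorter first; if it declines, log and use the built-in sort.
    static <K> List<K> sortWithFallback(KeySorter<K> external, List<K> keys,
                                        Comparator<? super K> comparator) {
        try {
            return external.sort(keys, comparator);
        } catch (UnsupportedOperationException e) {
            LOG.warning("External sorter declined (" + e.getMessage()
                    + "); using the built-in sorter");
            List<K> copy = new ArrayList<>(keys);
            copy.sort(comparator);
            return copy;
        }
    }
}
```

The job still completes with correctly ordered keys; the only visible effect of the
unsupported key type is the logged warning.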

When I am done with the implementation (on top of MAPREDUCE-279) and testing, I will post a
patch file for review.  Would you be interested in working with me as a committer?

Thank you.

> Allow external sorter plugin for MR
> -----------------------------------
>                 Key: MAPREDUCE-2454
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mariappan Asokan
>            Priority: Minor
>         Attachments: KeyValueIterator.java, MapOutputSorter.java, MapOutputSorterAbstract.java,
> Define interfaces and some abstract classes in the Hadoop framework to facilitate external
> sorter plugins both on the Map and Reduce sides.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
