hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-346) Generic 'Sort' Infrastructure for Map-Reduce framework.
Date Thu, 06 Jul 2006 05:57:31 GMT
Generic 'Sort' Infrastructure for Map-Reduce framework.
-------------------------------------------------------

         Key: HADOOP-346
         URL: http://issues.apache.org/jira/browse/HADOOP-346
     Project: Hadoop
        Type: New Feature

  Components: mapred  
    Reporter: Arun C Murthy
 Assigned to: Arun C Murthy 


It would be useful to add a generic *sort* infrastructure to the Map-Reduce framework to ease
usage.
Specifically, the idea is to add a fairly generic and powerful *comparator* which can be configured
by the user to meet his specific needs.

Spec:
--------
 
  The proposal is to model a generic (uber) comparator along the lines of the standard unix
*sort* command. The comparator provides the following (configurable) functionality:

  a) Separator for breaking up the data (stream) into 'columns'.
  b) Multiple key ranges for specifying priorities of 'columns' (a la the --key/-k option of
unix sort, i.e. -k 2,3 -k 1,4 etc.).
  c) A variant of a) to let the user specify byte-range boundaries for 'columns' without using
a separator.
  d) Option to sort 'reverse'.
  e) Option to do a 'stable' sort, i.e. don't do a last-ditch comparison of all bytes if all
key ranges match.
  f) Option to do 'numeric' comparisons instead of lexicographical comparisons?

  Of course, all of these are optional, with the default behaviour remaining as it is today.
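
  To make the spec above a bit more concrete, here is a rough sketch of what such a comparator
might look like for items a), b), d), e) and f). It is only an illustration, not an existing
Hadoop class: the names KeyFieldComparator and KeySpec are hypothetical, it works on Strings
rather than raw bytes, it does not cover the byte-range variant c), and it assumes a 'numeric'
key spans a single column (anything unparsable is treated as 0).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed comparator; not part of Hadoop. */
class KeyFieldComparator implements Comparator<String> {

    /** One "-k start,end" style key: 1-based inclusive column range plus flags. */
    static final class KeySpec {
        final int start, end;
        final boolean numeric, reverse;

        KeySpec(int start, int end, boolean numeric, boolean reverse) {
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("bad key range " + start + "," + end);
            }
            this.start = start;
            this.end = end;
            this.numeric = numeric;
            this.reverse = reverse;
        }
    }

    private final String separator;   // a) column separator, taken literally
    private final List<KeySpec> keys; // b) key ranges in priority order
    private final boolean stable;     // e) skip the last-ditch whole-record compare

    KeyFieldComparator(String separator, List<KeySpec> keys, boolean stable) {
        this.separator = separator;
        this.keys = keys;
        this.stable = stable;
    }

    /** Returns columns start..end of the record, re-joined with the separator. */
    private String extract(String record, KeySpec k) {
        String[] cols = record.split(Pattern.quote(separator), -1);
        int from = Math.min(k.start - 1, cols.length);
        int to = Math.min(k.end, cols.length);
        return String.join(separator, Arrays.copyOfRange(cols, from, to));
    }

    /** Parses a numeric key; unparsable input counts as 0 (an assumption of this sketch). */
    private static double toNumber(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    @Override
    public int compare(String a, String b) {
        for (KeySpec k : keys) {
            String ka = extract(a, k);
            String kb = extract(b, k);
            int c = k.numeric
                    ? Double.compare(toNumber(ka), toNumber(kb)) // f) numeric
                    : ka.compareTo(kb);                          // lexicographic
            if (c != 0) {
                return k.reverse ? -c : c;                       // d) reverse
            }
        }
        // All key ranges matched: either stop here or compare whole records.
        return stable ? 0 : a.compareTo(b);
    }
}
```

  For example, a tab separator with the keys (1,1) and (2,2, numeric, reverse) would order
records roughly the way `sort -t <TAB> -k1,1 -k2,2nr` does.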

     - * - * -

 Anything more/less?

thanks,
Arun


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

