hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-531) Need to sort on more than the primary key
Date Wed, 13 Sep 2006 21:55:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-531?page=comments#action_12434552 ] 
Doug Cutting commented on HADOOP-531:

The two ideas that have been discussed related to this are:

1. Making reduce value iterators cloneable.
2. Permitting one to specify a different comparator for sorting than for dividing keys when

Are there issues in Jira for either of these yet?
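
As an illustration of idea 2, here is a minimal sketch in plain Python (not Hadoop code) of sorting with one comparator and grouping with another. The record layout `(primary, secondary, value)`, the function names, and the sample data are assumptions made for this example. Records are ordered on the composite key but grouped on the primary key alone, so the single record from the first source (tagged with secondary key 0) reaches the reducer before the many records from the second source, and the values can be streamed rather than held in memory. The in-memory `sorted` call stands in for whatever sort the framework would perform.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort_groups(records):
    """Yield (primary_key, value_iterator) pairs.

    records: iterable of (primary, secondary, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # "Sort comparator": order by the full composite key (primary, secondary).
    ordered = sorted(records, key=itemgetter(0, 1))
    # "Grouping comparator": split into groups by the primary key only.
    for primary, group in groupby(ordered, key=itemgetter(0)):
        # Values are produced lazily, so the consumer can stream them.
        yield primary, (value for _, _, value in group)


def join(records):
    """Attach the single first-source record to each second-source record."""
    joined = []
    for key, values in secondary_sort_groups(records):
        head = next(values)  # secondary key 0 sorts first within the group
        for value in values:  # remaining values are consumed one at a time
            joined.append((key, head, value))
    return joined


# Hypothetical sample data: secondary key 0 marks the first source,
# secondary key 1 marks the second source.
records = [
    ("u2", 1, "click-a"),
    ("u1", 1, "click-b"),
    ("u1", 0, "profile-u1"),
    ("u2", 0, "profile-u2"),
    ("u1", 1, "click-c"),
]

for row in join(records):
    print(row)
```

Only the first value of each group is held while the rest are iterated, which is the property the issue description asks for when a key has millions of records.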

> Need to sort on more than the primary key
> -----------------------------------------
>                 Key: HADOOP-531
>                 URL: http://issues.apache.org/jira/browse/HADOOP-531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>    Affects Versions: 0.5.0
>            Reporter: Richard Kasperski
> There are many tasks where I need finer control over the ordering in the reduce
than a sort on a single key provides. Most of these situations arise when I merge two sources
of data and attach a single instance of one source to multiple instances of a second
source. I know that I can read all the records with a single key. However, there
might be many millions of these, making memory demands that cannot be satisfied.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

