hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-765) Hadoop Streaming should (optionally) sort on secondary key
Date Tue, 13 Nov 2007 22:53:43 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-765.

       Resolution: Duplicate
    Fix Version/s: 0.13.0
         Assignee:     (was: Sanjay Dahiya)

This was fixed by HADOOP-1284.

> Hadoop Streaming should (optionally) sort on secondary key
> ----------------------------------------------------------
>                 Key: HADOOP-765
>                 URL: https://issues.apache.org/jira/browse/HADOOP-765
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>             Fix For: 0.13.0
> This is related to HADOOP-485.
> As described in HADOOP-485 and HADOOP-686, many algorithms need the values to arrive in a specific order.
> (The most prominent is JOIN: in a MapReduce implementation of JOIN, the value has to indicate which "table" the record comes from. It is very useful to have the records from the smaller "table" come first.)
> (a) Once HADOOP-485 is implemented, it should be propagated to Streaming so that sorting by secondary key is done without writing any code, just by specifying a parameter.
> (b) Alternatively, as Hadoop Streaming records are lines of text with key(s) separated from the value by a tab, a simple hack of running a sort on the MERGED input of reduce will work fine. This may be a quite efficient and easy way to implement this important feature without relying on HADOOP-485.
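
The idea in option (b), together with the JOIN use case, can be sketched roughly as follows. This is a minimal illustration only, not the change that resolved the issue. It assumes a hypothetical record layout of `primary<TAB>secondary<TAB>value`, where the secondary key is a table tag and the tag "A" marks the smaller table; the function names are invented for the example.

```python
from itertools import groupby


def sort_by_secondary_key(lines, sep="\t"):
    """Order records by primary key, then by secondary key.

    Assumed (hypothetical) layout: primary<sep>secondary<sep>value.
    """
    def keys(line):
        parts = line.rstrip("\n").split(sep, 2)
        primary = parts[0]
        secondary = parts[1] if len(parts) > 1 else ""
        return (primary, secondary)

    return sorted(lines, key=keys)


def join_records(sorted_lines, sep="\t"):
    """Reduce-side join over input already sorted by (primary, secondary).

    Because tag "A" (the smaller table, by assumption) sorts first, its
    values can be buffered and each later record joined against them.
    """
    joined = []
    for primary, group in groupby(sorted_lines, key=lambda l: l.split(sep, 1)[0]):
        small = []  # values from the smaller table for this key
        for line in group:
            _, tag, value = line.rstrip("\n").split(sep, 2)
            if tag == "A":
                small.append(value)
            else:
                for s in small:
                    joined.append((primary, s, value))
    return joined


if __name__ == "__main__":
    merged = ["k2\tB\tx", "k1\tB\tb1", "k1\tA\ta1", "k2\tA\ty"]
    for row in join_records(sort_by_secondary_key(merged)):
        print("\t".join(row))
```

Only the records from the smaller table are held in memory per key, which is why having them arrive first matters.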

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
