hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (HADOOP-475) The value iterator to reduce function should be clonable
Date Mon, 16 Feb 2009 11:03:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan reassigned HADOOP-475:

    Assignee: Jothi Padmanabhan  (was: Vivek Ratan)

> The value iterator to reduce function should be clonable
> --------------------------------------------------------
>                 Key: HADOOP-475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-475
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
> In the current framework, when the user implements the reduce method of the Reducer class,
> the user can only iterate through the value iterator once.
> This makes it hard for the user to perform join-like operations within the reduce method.
> To address this problem, one approach is to make the input value iterator clonable. Then the
> user can iterate over the values in different ways.
> If the iterator can also be reset, then the user can perform nested iterations over the data
> and carry out join-like operations.
> The user code in the reduce method would be something like:
>                  iterator1 = values.clone();
>                  iterator2 = values.clone();
>                  while (iterator1.hasNext()) {
>                       val1 = iterator1.next();
>                       iterator2.reset();  // rewind the inner iterator for each outer value
>                       while (iterator2.hasNext()) {
>                            val2 = iterator2.next();
>                            // do something based on val1 and val2
>                            // ...
>                       }
>                  }
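
The nested iteration above can be sketched in self-contained Java. This is only an
illustration under stated assumptions: ResettableIterator, BufferedValues and
NestedIterationSketch are hypothetical names, not part of the Hadoop API, and the sketch
buffers all values for a key in memory, which the real framework may not be able to do.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical interface, not part of Hadoop: an iterator that can be rewound and cloned.
interface ResettableIterator<T> extends Iterator<T>, Cloneable {
    void reset();                   // rewind to the first value
    ResettableIterator<T> clone();  // new cursor over the same values, positioned at the start
}

// Hypothetical implementation. Assumes every value for the key fits in memory.
final class BufferedValues<T> implements ResettableIterator<T> {
    private final List<T> buffer;   // shared between clones, never modified after construction
    private int pos = 0;            // per-cursor read position

    BufferedValues(Iterator<T> source) {
        buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
    }

    private BufferedValues(List<T> shared) {
        buffer = shared;
    }

    @Override public boolean hasNext() { return pos < buffer.size(); }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.get(pos++);
    }

    @Override public void reset() { pos = 0; }

    @Override public BufferedValues<T> clone() { return new BufferedValues<>(buffer); }
}

final class NestedIterationSketch {
    // Builds every "val1,val2" pair, mirroring the nested loop in the description.
    static List<String> pairs(ResettableIterator<String> values) {
        ResettableIterator<String> iterator1 = values.clone();
        ResettableIterator<String> iterator2 = values.clone();
        List<String> out = new ArrayList<>();
        while (iterator1.hasNext()) {
            String val1 = iterator1.next();
            iterator2.reset();
            while (iterator2.hasNext()) {
                String val2 = iterator2.next();
                out.add(val1 + "," + val2);
            }
        }
        return out;
    }
}
```

Sharing one buffer between clones keeps clone() cheap, at the cost of holding all values
for a key in memory; a framework-level implementation would presumably need to spill to disk.
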
> One possible optimization is that if the values are sorted based on a secondary key,
> the reset function can take a secondary key as an argument and reset the iterator to
> the beginning position of that secondary key. It would be very helpful if there were a
> utility that returns a list of iterators, one per secondary key value, from the given iterator:
>                           TreeMap getIteratorsBasedOnSecondaryKey(iterator);
> Each entry in the returned map object is a pair of <secondary key, iterator for the
> values with the same secondary key>.
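
A minimal sketch of such a utility follows, under stated assumptions. The description
gives only the name getIteratorsBasedOnSecondaryKey and a TreeMap return type; the generic
signature, the secondaryKeyOf extractor parameter and the SecondaryKeyGrouping class are
hypothetical additions for illustration. The sketch also buffers the values in memory and
does not require them to be pre-sorted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical helper class, not part of Hadoop.
final class SecondaryKeyGrouping {
    // Groups values by secondary key and returns one iterator per key, ordered by key.
    // secondaryKeyOf is an assumed parameter: the description does not say how the
    // secondary key is obtained from a value.
    static <K extends Comparable<K>, V> TreeMap<K, Iterator<V>> getIteratorsBasedOnSecondaryKey(
            Iterator<V> values, Function<V, K> secondaryKeyOf) {
        // First pass: bucket each value under its secondary key, keeping arrival order.
        TreeMap<K, List<V>> groups = new TreeMap<>();
        while (values.hasNext()) {
            V value = values.next();
            groups.computeIfAbsent(secondaryKeyOf.apply(value), k -> new ArrayList<>()).add(value);
        }
        // Second pass: expose each bucket as an iterator.
        TreeMap<K, Iterator<V>> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : groups.entrySet()) {
            result.put(entry.getKey(), entry.getValue().iterator());
        }
        return result;
    }
}
```

If the values really do arrive sorted by secondary key, the buckets could instead be
recorded as start offsets into a single buffer, which is closer to the reset-to-key idea
in the description.
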

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
