hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1750) Make #rows avail. to reducers as environment variable
Date Mon, 03 May 2010 05:11:56 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863234#action_12863234 ]

Owen O'Malley commented on MAPREDUCE-1750:
------------------------------------------

Is this for streaming? I don't remember how lazy streaming is about waiting for the input
before starting the process. If the process starts too early, it will be a difficult change.
In any case, it will be easier to start by making it available to Java first.

I assume you mean the number of values for this reduce? The number of keys isn't known until
the reduce is almost done. You also can't know the number of keys or values for other reduces
without a lot of extra traffic from the JobTracker.
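
To illustrate the distinction drawn above, here is a minimal Python sketch. It is not Hadoop code; it only assumes that a reduce's input arrives as several sorted runs of (key, value) pairs. The function names total_values and count_keys are hypothetical. The number of values is just the sum of the run lengths, so it can be known before the reduce starts, while the number of distinct keys needs a full pass over the merged stream, which is the same pass the reduce itself makes.

```python
import heapq
from itertools import groupby


def total_values(sorted_runs):
    """Count the values for this reduce: the sum of the run lengths.

    No merging is needed, so this is available before the reduce begins.
    """
    return sum(len(run) for run in sorted_runs)


def count_keys(sorted_runs):
    """Count distinct keys by walking the fully merged stream.

    Each run is assumed to be sorted by key already. The answer is only
    known once the last record has been seen.
    """
    merged = heapq.merge(*sorted_runs, key=lambda kv: kv[0])
    return sum(1 for _key, _group in groupby(merged, key=lambda kv: kv[0]))


if __name__ == "__main__":
    runs = [
        [("a", 1), ("c", 2)],
        [("a", 3), ("b", 4)],
    ]
    print(total_values(runs), count_keys(runs))
```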

> Make #rows avail. to reducers as environment variable
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-1750
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1750
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>            Reporter: Adam Kramer
>            Priority: Minor
>
> Given that there is a sort phase between the copy phase and the reduce phase, it seems
> like there is a chance for counting during sort.
> It would be nice if my reducers could have access to an environment variable, say,
> mapred.reduce.rows, that contained the number of rows present for this reducer (as
> counted during the sort step).
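
As a rough sketch of what the request would look like from a streaming reducer's side: the property mapred.reduce.rows is only the name proposed in this issue and does not exist in Hadoop, and the helper rows_for_this_reducer is hypothetical. The sketch assumes the value would reach the process the way streaming exposes other job configuration values, as an environment variable with the dots replaced by underscores.

```python
import os


def rows_for_this_reducer(environ=os.environ):
    """Return the row count for this reducer, or None if it is not set.

    "mapred_reduce_rows" is the hypothetical environment form of the
    proposed mapred.reduce.rows property; nothing sets it today.
    """
    raw = environ.get("mapred_reduce_rows")
    return int(raw) if raw is not None else None


if __name__ == "__main__":
    rows = rows_for_this_reducer()
    if rows is None:
        print("row count not provided")
    else:
        print("expecting %d rows" % rows)
```

A reducer written this way still works when the variable is absent, which matters because the value may not be known when the process starts.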

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

