tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-338) Determine reduce task parallelism
Date Mon, 19 Aug 2013 22:57:48 GMT

    [ https://issues.apache.org/jira/browse/TEZ-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744406#comment-13744406 ]

Bikas Saha commented on TEZ-338:

I assumed the tez engine, being Hadoop, would be configured via configuration. We can change
it once we clean up the tez engine.

Yes. It's for efficiency purposes, so that we don't keep creating copies of all event objects
for every downstream task. I was even thinking of doing it just once overall instead of once
per getTaskCompletions() call, but that might break with retries.
I didn't quite get the bug. The first event in every getTaskCompletions() will have the payload,
and so shuffle.setReduceRange() will be called once on every getTaskCompletions(). The code
in Shuffle.setReduceRange() already handles multiple sets with the same value, so it's taken
care of. Later, if we allow changing the range on the fly, then we will need to remove that
check (among other things).
I am going to remove Shuffle.setReduceRange() and pass the payload to Shuffle directly so
that it can do what it wants. Cleaner IMO.
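
A minimal sketch of the "multiple sets with the same value" handling mentioned above, assuming
the range is a pair of ints. The class name ReduceRangeHolder and its methods are hypothetical
and are not the actual Shuffle code. Rejecting a different value is also an assumption about
what the check does:

```java
// Hypothetical illustration only; not the actual Tez Shuffle implementation.
class ReduceRangeHolder {
    private int rangeStart = -1;
    private int rangeEnd = -1;
    private boolean rangeSet = false;

    /**
     * Records the reduce range. Repeated calls with the same value are
     * no-ops, so calling this once per getTaskCompletions() is harmless.
     * A call with a different value is rejected (assumed behaviour).
     */
    public synchronized void setReduceRange(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("Invalid range: " + start + "-" + end);
        }
        if (rangeSet) {
            if (start == rangeStart && end == rangeEnd) {
                return; // same value set again: nothing to do
            }
            throw new IllegalStateException("Reduce range already set to "
                + rangeStart + "-" + rangeEnd);
        }
        rangeStart = start;
        rangeEnd = end;
        rangeSet = true;
    }

    public synchronized boolean isRangeSet() {
        return rangeSet;
    }

    public synchronized int getRangeStart() {
        return rangeStart;
    }

    public synchronized int getRangeEnd() {
        return rangeEnd;
    }
}
```

Allowing the range to change on the fly would mean dropping the IllegalStateException branch
and deciding what happens to work already started for the old range.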

be configurable. Current workaround is @private

MRHelpers is not visible in tez-engine. tez-mapreduce depends on tez-engine.

If you think TEZ-375 is invalid then I will remove the comment. Please close the bug.

> Determine reduce task parallelism
> ---------------------------------
>                 Key: TEZ-338
>                 URL: https://issues.apache.org/jira/browse/TEZ-338
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>              Labels: TEZ-0.2.0
>         Attachments: TEZ-338.1.patch, TEZ-338.2.patch, TEZ-338.3.patch, TEZ-338.4.patch,
> Determine the parallelism of reduce tasks at runtime. This is important because it's difficult
> to determine this accurately before the job actually runs, due to unknown data reduction ratios
> in the intermediate stages.

