hadoop-pig-dev mailing list archives

From "Mridul Muralidharan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-539) unable to control parallelism of Map tasks
Date Thu, 20 Nov 2008 19:52:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649457#action_12649457 ]

Mridul Muralidharan commented on PIG-539:

Hi Chris,

  We had a similar requirement: the number of map tasks was high (because a large
number of small files was created by the earlier part of the pipeline), and we wanted a small,
fixed number of map tasks (equal to the number of mappers in the cluster).

The only way I found to control the behavior in this case was something extremely heavyweight:
LOAD, GROUP (with PARALLEL), FOREACH/FLATTEN, and then the rest of the pipeline.

Apparently, there is currently no other way to do this in Pig.
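
A minimal sketch of that workaround in Pig Latin, for illustration only. The paths, alias names, single-column schema, grouping key, and the PARALLEL value of 10 are all assumptions, not taken from the actual script:

```pig
-- Hypothetical input: a directory holding many small files.
raw = LOAD 'input/small_files' AS (line:chararray);

-- GROUP forces a reduce phase; PARALLEL sets the number of reduce tasks.
-- The grouping key here is an assumption; it determines how evenly the
-- records are spread across the reducers.
grouped = GROUP raw BY line PARALLEL 10;

-- FLATTEN unpacks each group's bag, recovering the original records.
flat = FOREACH grouped GENERATE FLATTEN(raw);

-- The rest of the pipeline goes here; operators that can run in the same
-- reduce stage inherit its fixed task count.
STORE flat INTO 'output/regrouped';
```

The cost is a full shuffle of the input just to fix the task count, which is why this is heavyweight.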

> unable to control parallelism of Map tasks
> ------------------------------------------
>                 Key: PIG-539
>                 URL: https://issues.apache.org/jira/browse/PIG-539
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>         Environment: local execution + hadoop execution
>            Reporter: Christopher Olston
> I put "PARALLEL 1" following *every* statement in my pig script, and it still executes
> maps with more than 1 parallel task. This is a major problem because for one of my operations
> I need to have a serialized (non-parallel) map.
> Probably the semantics of parallelism should be as follows:
>  1. group pig operators into map/reduce stages
>  2. for each stage, take the minimum of the "Parallel" directives given by the user for
>     statements executed as part of that stage
> (We'll have to decide on a rule for statements that use the combiner, which execute partially
> on the map side and partially on the reduce side ...)
