hadoop-pig-dev mailing list archives

From "Christopher Olston (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-539) unable to control parallelism of Map tasks
Date Thu, 20 Nov 2008 18:06:44 GMT
unable to control parallelism of Map tasks

                 Key: PIG-539
                 URL: https://issues.apache.org/jira/browse/PIG-539
             Project: Pig
          Issue Type: Bug
          Components: impl
         Environment: local execution + hadoop execution
            Reporter: Christopher Olston

I put "PARALLEL 1" after *every* statement in my Pig script, and the map phase still runs
with more than one parallel task. This is a major problem because one of my operations
needs a serialized (non-parallel) map.

Probably the semantics of parallelism should be as follows:
 1. group pig operators into map/reduce stages
 2. for each stage, take the minimum of the "PARALLEL" directives given by the user for
    statements executed as part of that stage

(We'll have to decide on a rule for statements that use the combiner, which execute partially
on the map side and partially on the reduce side ...)
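
The proposed rule can be sketched roughly as follows. This is only an illustration of the
"minimum per stage" idea, not Pig's implementation: the function name, the statement names
and the stage grouping are all made up for the example, the grouping of statements into
stages (step 1) is assumed to be already done, and the combiner case is not handled.

```python
from typing import Dict, List, Optional


def stage_parallelism(
    stages: List[List[str]],
    directives: Dict[str, int],
) -> List[Optional[int]]:
    """Return, for each stage, the minimum PARALLEL value among its statements.

    `stages` lists the statements in each stage (hypothetical grouping).
    `directives` maps a statement to the PARALLEL value the user gave it.
    A stage with no directive at all yields None (left to the system default).
    """
    result: List[Optional[int]] = []
    for stage in stages:
        # Collect only the directives the user actually wrote for this stage.
        values = [directives[stmt] for stmt in stage if stmt in directives]
        result.append(min(values) if values else None)
    return result


# Hypothetical script: two stages, with a PARALLEL 1 on one statement of the first.
stages = [["load", "filter", "group"], ["foreach", "store"]]
directives = {"load": 4, "filter": 1, "group": 8, "foreach": 2}
print(stage_parallelism(stages, directives))  # [1, 2]
```

Under this rule a single "PARALLEL 1" anywhere in a stage would force that whole stage to
run serially, which is the behaviour asked for above.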

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
