hadoop-pig-dev mailing list archives

From "Christopher Olston (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-539) unable to control parallelism of Map tasks
Date Thu, 20 Nov 2008 18:06:44 GMT
unable to control parallelism of Map tasks
------------------------------------------

                 Key: PIG-539
                 URL: https://issues.apache.org/jira/browse/PIG-539
             Project: Pig
          Issue Type: Bug
          Components: impl
         Environment: local execution + hadoop execution
            Reporter: Christopher Olston


I put "PARALLEL 1" after *every* statement in my Pig script, and the job still runs its map
phase with more than one parallel task. This is a major problem because one of my operations
requires a serialized (non-parallel) map.

The semantics of parallelism should probably be as follows:
 1. Group the Pig operators into map/reduce stages.
 2. For each stage, take the minimum of the PARALLEL directives the user gave for the
    statements executed as part of that stage.

(We'll have to decide on a rule for statements that use the combiner, which execute partly
on the map side and partly on the reduce side ...)
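
The proposed rule can be sketched in a few lines of Python. This is only an illustration of
the "minimum per stage" idea, not Pig's actual planner code: the function name
stage_parallelism, the stage names, and the default value are all hypothetical, and the
sketch assumes the operators have already been grouped into stages (step 1). A statement
with no PARALLEL directive is represented as None and is ignored when taking the minimum.

```python
from typing import Dict, List, Optional


def stage_parallelism(
    stages: Dict[str, List[Optional[int]]], default: int
) -> Dict[str, int]:
    """Resolve one task count per stage from per-statement PARALLEL hints.

    `stages` maps a (hypothetical) stage name to the PARALLEL value of each
    statement placed in that stage; None means the statement had no directive.
    """
    resolved = {}
    for name, hints in stages.items():
        given = [h for h in hints if h is not None]
        # Step 2 of the proposal: the most restrictive directive wins.
        # Falling back to `default` when no directive exists is an assumption.
        resolved[name] = min(given) if given else default
    return resolved


# Hypothetical plan: three stages with mixed directives.
plan = {
    "map-1": [1, None, 4],
    "reduce-1": [8, 2],
    "map-2": [None],
}
resolved = stage_parallelism(plan, default=10)
```

Under this rule a single "PARALLEL 1" anywhere in a map stage would be enough to force that
stage to run serially, which is the behaviour requested above. The sketch does not address
the combiner question, since a statement would first have to be assigned to one stage.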


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

