hadoop-pig-dev mailing list archives

From "Mridul Muralidharan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-539) unable to control parallelism of Map tasks
Date Thu, 20 Nov 2008 19:52:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649457#action_12649457 ]

Mridul Muralidharan commented on PIG-539:
-----------------------------------------


Hi Chris,

  We had a similar requirement: the number of map tasks was high (because a large
number of small files were created by the earlier part of the pipeline), and we wanted a small,
fixed number of map tasks (equal to the number of mappers in the cluster).

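The only way I found to control the behavior in this case was something extremely heavyweight:
LOAD, GROUP (with PARALLEL), FOREACH/FLATTEN, and then the rest of the pipeline. The GROUP
forces a reduce phase, and the number of reduce tasks is something PARALLEL does control.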
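
As a rough sketch of that workaround (not a tested script): the input path, the schema, the relation names, and the value 10 below are all placeholders I made up for illustration.

```pig
-- Hypothetical input: many small files produced by an earlier stage.
raw = LOAD 'pipeline/stage1/output' AS (key:chararray, value:chararray);

-- GROUP forces a reduce phase; PARALLEL sets the number of reduce tasks.
-- 10 is a placeholder for whatever fixed task count is wanted.
grouped = GROUP raw BY key PARALLEL 10;

-- FLATTEN unnests each bag, giving back the original tuples.
regrouped = FOREACH grouped GENERATE FLATTEN(raw);

-- The rest of the pipeline continues from 'regrouped'.
STORE regrouped INTO 'pipeline/stage2/input';
```

The grouping key needs enough distinct values to spread records across the reducers; grouping on a constant would send everything to a single task. The cost is a full shuffle of the data, which is why this is heavyweight.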

Apparently, there is currently no other way to do this in Pig.

> unable to control parallelism of Map tasks
> ------------------------------------------
>
>                 Key: PIG-539
>                 URL: https://issues.apache.org/jira/browse/PIG-539
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>         Environment: local execution + hadoop execution
>            Reporter: Christopher Olston
>
> I put "PARALLEL 1" following *every* statement in my pig script, and it still executes
> maps with more than 1 parallel task. This is a major problem because for one of my operations
> I need to have a serialized (non-parallel) map.
> Probably the semantics of parallelism should be as follows:
>  1. group pig operators into map/reduce stages
>  2. for each stage, take the minimum of the "Parallel" directives given by the user for
>     statements executed as part of that stage
> (We'll have to decide on a rule for statements that use the combiner, which execute partially
> on the map side and partially on the reduce side ...)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

