hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin
Date Thu, 10 Sep 2009 03:43:58 GMT
Reset parallelism to 1 for indexing job in MergeJoin
----------------------------------------------------

                 Key: PIG-951
                 URL: https://issues.apache.org/jira/browse/PIG-951
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Ashutosh Chauhan
            Assignee: Ashutosh Chauhan


After one tuple is sampled from every block, a single reducer sorts the index entries in the
reduce phase to produce the sorted index used by the actual join job. The parallelism of the
index job should therefore be explicitly set to 1. Currently, it is not.
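
As a rough illustration of the indexing step described above, here is a minimal Python sketch.
It is not Pig's implementation: the function name build_index, the block layout, and the choice
of the first tuple of each block as the sample are all assumptions made for the example. It
only shows why a single sorting step over all sampled entries yields one globally sorted index.

```python
def build_index(blocks):
    """Build a sorted index from a list of blocks (hypothetical helper).

    Each block is a list of (key, value) tuples. The sample taken per
    block is assumed to be its first tuple.
    """
    # "Map" side: emit one (key, block_id) index entry per non-empty block.
    entries = [(block[0][0], block_id)
               for block_id, block in enumerate(blocks) if block]
    # "Reduce" side: one reducer sees every entry and sorts them by key.
    return sorted(entries)


# Blocks listed in arbitrary arrival order, as map outputs might be.
blocks = [
    [("m", 1), ("p", 2)],
    [("a", 3), ("c", 4)],
    [("f", 5), ("k", 6)],
]
index = build_index(blocks)
print(index)  # [('a', 1), ('f', 2), ('m', 0)]
```

If the entries were split across several reducers, each reducer would sort only its own share,
and the outputs would still need to be merged into one index.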

This is currently a non-issue, since we don't allow any blocking operators in the pipeline
before merge-join. However, once we do allow blocking operators, the indexing job will inherit
the parallelism of the preceding blocking operator. Even then, the job will complete
successfully, because we group on only one key, "all", so all tuples go to a single reducer.
However, it will waste cluster resources by starting extra reducers that receive no data
and thus do nothing.
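
The effect of grouping on a single key can be simulated with a small Python sketch. It assumes
a simple hash-modulo partitioner; the names partition and reducer_load and the parallelism of
10 are hypothetical and chosen only for illustration, not taken from Pig or Hadoop.

```python
from collections import Counter


def partition(key, num_reducers):
    # Assumed hash-modulo partitioning: equal keys always map to the same reducer.
    return hash(key) % num_reducers


def reducer_load(keys, num_reducers):
    # Count how many tuples each reducer would receive.
    load = Counter(partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


# Every index entry is grouped under the single key "all".
keys = ["all"] * 1000
load = reducer_load(keys, 10)

busy = sum(1 for n in load if n > 0)
idle = len(load) - busy
print(busy, idle)  # 1 9
```

With the parallelism set to 1, the same single reducer does all the work and no idle reducers
are started.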

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

