hadoop-pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-895) Default parallel for Pig
Date Wed, 22 Jul 2009 00:29:15 GMT
Default parallel for Pig

                 Key: PIG-895
                 URL: https://issues.apache.org/jira/browse/PIG-895
             Project: Pig
          Issue Type: New Feature
          Components: impl
    Affects Versions: 0.3.0
            Reporter: Daniel Dai
             Fix For: 0.4.0

For Hadoop 0.20, if the user does not specify the number of reducers, Hadoop uses 1 reducer as
the default value. This differs from previous versions of Hadoop, in which the default number
of reducers was usually good. One reducer is almost certainly not what the user wants. Although
the user can use the "parallel" keyword to specify the number of reducers for each statement,
doing so is wordy. We need a convenient way for users to express a desired number of reducers.
Here is my proposal:

1. Add a property "default_parallel" to Pig. The user can set default_parallel in a script. E.g.:
   set default_parallel '10';

2. default_parallel is a hint to Pig. Pig is free to optimize the number of reducers (unlike
with the "parallel" keyword). Currently, since we do not have a mechanism to determine the
optimal number of reducers, default_parallel will always be honored unless it is overridden
by "parallel".

3. If the user puts multiple default_parallel settings in a script, the last entry is used.
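
The precedence implied by the three points above can be sketched as a small function. This
is only an illustration of the proposed rules, not Pig's implementation: the function name
resolve_reducers, its parameters, and the constant are hypothetical, and the sketch assumes
the current behaviour in which the hint is always granted because Pig has no optimizer for
the reducer count yet.

```python
from typing import Optional, Sequence

# Hadoop 0.20 fallback described in this issue (hypothetical constant name).
HADOOP_020_DEFAULT_REDUCERS = 1


def resolve_reducers(default_parallel_settings: Sequence[int],
                     statement_parallel: Optional[int] = None) -> int:
    """Return the reducer count for one statement under the proposed rules.

    default_parallel_settings: values of every "set default_parallel" entry,
        in the order they appear in the script.
    statement_parallel: value of the statement's "parallel" clause, if any.
    """
    # An explicit "parallel" clause on the statement overrides the hint (point 2).
    if statement_parallel is not None:
        return statement_parallel
    # With several default_parallel entries, the last one is taken (point 3).
    if default_parallel_settings:
        return default_parallel_settings[-1]
    # Nothing specified: Hadoop 0.20 falls back to a single reducer.
    return HADOOP_020_DEFAULT_REDUCERS
```

For example, a script that sets default_parallel to 10 and later to 20 would resolve to 20
reducers for a statement without a "parallel" clause, while a statement written with
"parallel 5" would still get 5.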

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
