hadoop-pig-dev mailing list archives

From "Amr Awadallah (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-856) PERFORMANCE: reduce number of replicas
Date Wed, 24 Jun 2009 03:09:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723412#action_12723412 ]

Amr Awadallah commented on PIG-856:

Please keep in mind that when running on a loaded system (i.e. with many concurrent jobs),
the fair scheduler has a better chance of allocating mappers with local data to process
your job if you have more replicas (not sure if the capacity scheduler also does that). So,
while setting replicas to fewer than 3 might improve performance when yours is the only job
running on the system, it will hurt performance when you are sharing the cluster with many others.

Not to mention that this also affects speculative execution, etc.
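
To put a rough number on the locality argument, here is a minimal sketch under a deliberately simple model that is my assumption, not something measured on a real cluster: replicas of a block sit on distinct nodes chosen uniformly at random, and at the moment the scheduler looks for a slot, some number of nodes (also uniformly random) have one free. Rack awareness, multiple slots per node, and delay scheduling are all ignored. The function name p_local and its parameters are hypothetical.

```python
from math import comb


def p_local(nodes: int, replicas: int, free_nodes: int) -> float:
    """Chance that at least one node with a free slot holds a replica.

    Assumes replicas are on distinct, uniformly random nodes and that the
    free nodes are an independent uniformly random subset of the cluster.
    """
    if not (0 <= replicas <= nodes and 0 <= free_nodes <= nodes):
        raise ValueError("replicas and free_nodes must be in [0, nodes]")
    # Probability that every free node is one of the (nodes - replicas)
    # nodes WITHOUT a copy; math.comb returns 0 when that is impossible.
    p_miss = comb(nodes - replicas, free_nodes) / comb(nodes, free_nodes)
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical busy cluster: 100 nodes, only 5 of them with a free slot.
    for r in (1, 2, 3):
        print(f"replicas={r}: P(local slot) = {p_local(100, r, 5):.3f}")
```

Under these assumptions, going from 2 replicas to 3 raises the chance of finding a data-local slot for a given block from roughly 10% to roughly 14% when only 5 of 100 nodes are free. With most of the cluster idle the gap closes, which matches the single-job case above.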

-- amr

> PERFORMANCE: reduce number of replicas
> --------------------------------------
>                 Key: PIG-856
>                 URL: https://issues.apache.org/jira/browse/PIG-856
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Olga Natkovich
> Currently Pig uses the default number of replicas between MR jobs, which is 3. Given the
> temp nature of the data, we should never need more than 2 and should explicitly set it to
> improve performance and to be nicer to the name node.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
