pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4775) Better default values for shuffle bytes per reducer
Date Mon, 11 Jan 2016 02:44:39 GMT

     [ https://issues.apache.org/jira/browse/PIG-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4775:
------------------------------------
    Attachment: PIG-4775-2.patch

Added the constants. intermediateTaskInputSize is not a constant; it is determined in the
constructor based on the filesystem's default block size.

> Better default values for shuffle bytes per reducer
> ---------------------------------------------------
>
>                 Key: PIG-4775
>                 URL: https://issues.apache.org/jira/browse/PIG-4775
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4775-1.patch, PIG-4775-2.patch
>
>
> Currently the code does not set TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE if
> BYTES_PER_REDUCER_PARAM is not set or is equal to DEFAULT_BYTES_PER_REDUCER (1G). It
> therefore defaults to TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE_DEFAULT = 1024*1024*100L
> (100MB), which is low and can produce more output files than usual. Removing that check
> and defaulting to 1G would be bad for performance: in mapreduce the value was based on
> map input size, but in Tez it is taken as map output size. The patch therefore sets 384MB
> as the default for group by, as it usually reduces the size of the output data, and keeps
> 256MB for joins, as they increase the size of the output data.
> Did not touch order by and skewed join, as the DEFAULT_BYTES_PER_REDUCER of 1G is honored
> there. Using 1G for them is similar to mapreduce, as map input and output would be the
> same for those cases.
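
The selection rule described above can be sketched roughly as follows. This is only an
illustration of the sizes named in the description, not the code in the attached patches:
the class, enum, and method names are hypothetical, and treating an explicitly configured
bytes-per-reducer value as an override for group by and join is an assumption.

```java
// Hypothetical sketch: none of these names come from the Pig patch.
final class ShuffleSizeDefaults {
    static final long MB = 1024L * 1024L;
    // 1G default for bytes per reducer, as stated in the issue description.
    static final long DEFAULT_BYTES_PER_REDUCER = 1024L * MB;
    // Sizes proposed in the issue description.
    static final long GROUP_BY_DEFAULT = 384L * MB;
    static final long JOIN_DEFAULT = 256L * MB;

    enum OperatorKind { GROUP_BY, JOIN, ORDER_BY, SKEWED_JOIN }

    /**
     * Returns the desired shuffle input size per task, in bytes.
     * configuredBytesPerReducer is null when the user did not set the parameter.
     */
    static long desiredTaskInputSize(OperatorKind kind, Long configuredBytesPerReducer) {
        // Assumption: a value that is set and differs from the 1G default is an override.
        boolean userOverride = configuredBytesPerReducer != null
                && configuredBytesPerReducer.longValue() != DEFAULT_BYTES_PER_REDUCER;
        if (userOverride) {
            return configuredBytesPerReducer.longValue();
        }
        switch (kind) {
            case GROUP_BY:
                return GROUP_BY_DEFAULT;
            case JOIN:
                return JOIN_DEFAULT;
            default:
                // Order by and skewed join honor the 1G default.
                return DEFAULT_BYTES_PER_REDUCER;
        }
    }

    public static void main(String[] args) {
        for (OperatorKind kind : OperatorKind.values()) {
            System.out.println(kind + " -> " + desiredTaskInputSize(kind, null) / MB + "MB");
        }
    }
}
```

With no user setting, the sketch yields 384MB for group by, 256MB for join, and 1G for
order by and skewed join.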



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
