pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3659) Memory management for each vertex
Date Wed, 08 Jan 2014 17:36:52 GMT

    [ https://issues.apache.org/jira/browse/PIG-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865666#comment-13865666 ]

Rohini Palaniswamy commented on PIG-3659:

Currently the code just defaults to 1G for each vertex to get things working.

We need to:
   1) Classify each vertex as a map or reduce vertex and set java.opts (mapreduce.map.java.opts
or mapreduce.reduce.java.opts), memory.mb (mapreduce.map.memory.mb or mapreduce.reduce.memory.mb)
and env (mapreduce.map.env or mapreduce.reduce.env) on the vertex accordingly. A simple approach
would be to treat all root vertices as map vertices and all intermediate and leaf vertices
as reduce vertices.
   2) Even a map vertex needs more memory if it has multiple outputs, because combine and sort
happen on each output. Similarly, a reduce vertex with multiple inputs performs shuffle and sort
on each input, so it needs more memory than a traditional map or reduce. That is, the sort
buffers (io.sort.mb) and the buffers that hold each record before it is serialized or
deserialized take up memory for every input or output. For example, with 3 inputs or outputs,
three times the buffer memory is allocated, leading to an OOM. Increasing the memory for a
vertex based on its number of inputs or outputs might not fully solve the problem. We will
have to discuss this with the Tez developers to see how it can be solved effectively.
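
A minimal sketch of the classification heuristic in point 1, assuming a vertex is described only by its number of incoming edges. The class VertexMemoryConfig, its methods, and the short output keys ("java.opts", "memory.mb", "env") are hypothetical and are not Pig's actual Tez code; only the mapreduce.* property names come from the comment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating point 1; not Pig's actual Tez implementation.
final class VertexMemoryConfig {

    enum VertexKind { MAP, REDUCE }

    // Heuristic from the comment: a root vertex (no incoming edges) is treated
    // as a map vertex; intermediate and leaf vertices are treated as reduce vertices.
    static VertexKind classify(int numIncomingEdges) {
        return numIncomingEdges == 0 ? VertexKind.MAP : VertexKind.REDUCE;
    }

    // Picks java.opts, memory.mb and env from the MapReduce-style job settings
    // matching the vertex kind. The short output keys are illustrative only.
    static Map<String, String> settingsFor(VertexKind kind, Map<String, String> jobConf) {
        String prefix = (kind == VertexKind.MAP) ? "mapreduce.map." : "mapreduce.reduce.";
        Map<String, String> vertexConf = new HashMap<>();
        for (String suffix : List.of("java.opts", "memory.mb", "env")) {
            String value = jobConf.get(prefix + suffix);
            if (value != null) { // unset keys are left to the framework default
                vertexConf.put(suffix, value);
            }
        }
        return vertexConf;
    }

    public static void main(String[] args) {
        // Example job settings; the values are arbitrary.
        Map<String, String> jobConf = Map.of(
            "mapreduce.map.memory.mb", "1024",
            "mapreduce.map.java.opts", "-Xmx800m",
            "mapreduce.reduce.memory.mb", "2048",
            "mapreduce.reduce.java.opts", "-Xmx1600m");
        VertexKind root = classify(0);
        VertexKind leaf = classify(2);
        System.out.println(root + " -> " + settingsFor(root, jobConf).get("memory.mb"));
        System.out.println(leaf + " -> " + settingsFor(leaf, jobConf).get("memory.mb"));
    }
}
```

Run as-is, this prints "MAP -> 1024" and "REDUCE -> 2048". The heuristic is intentionally crude: a vertex that reads from several predecessors is still just "reduce", which is exactly the case point 2 says needs extra memory.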

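The arithmetic behind point 2 can be sketched as a back-of-the-envelope check, assuming each sorted input or output allocates its own io.sort.mb buffer and treating the 1G default as the heap size. The 256 MB io.sort.mb value, the "half the heap" threshold, and the SortBufferEstimate class are illustrative assumptions; the per-record serialization buffers are not modelled.

```java
// Hypothetical estimate for point 2; the sizes are assumptions, not Pig or Tez defaults.
final class SortBufferEstimate {

    // Assumes every sorted input (reduce side) or output (map side) allocates
    // its own io.sort.mb buffer, so the total grows linearly with the count.
    static long sortBufferMb(int sortedEdges, long ioSortMb) {
        return sortedEdges * ioSortMb;
    }

    // True if the sort buffers fit within the given fraction of the heap.
    static boolean fitsInHeap(int sortedEdges, long ioSortMb, long heapMb, double maxHeapFraction) {
        return sortBufferMb(sortedEdges, ioSortMb) <= heapMb * maxHeapFraction;
    }

    public static void main(String[] args) {
        long heapMb = 1024;  // treating the 1G per-vertex default as the heap size
        long ioSortMb = 256; // assumed io.sort.mb value
        for (int edges = 1; edges <= 3; edges++) {
            System.out.printf("%d edge(s): %d MB of sort buffers, fits in half the heap: %b%n",
                edges, sortBufferMb(edges, ioSortMb), fitsInHeap(edges, ioSortMb, heapMb, 0.5));
        }
    }
}
```

With these numbers, one or two edges need 256 MB or 512 MB and still fit, while three edges need 768 MB and do not. Scaling memory.mb linearly with the edge count would cover this particular term, but as the comment notes, that alone might not solve the problem.
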
> Memory management for each vertex
> ---------------------------------
>                 Key: PIG-3659
>                 URL: https://issues.apache.org/jira/browse/PIG-3659
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch
> We need to configure appropriate memory options for each vertex.
