hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-841) Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects
Date Sat, 08 Aug 2009 08:23:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740853#action_12740853 ]

Hong Tang commented on MAPREDUCE-841:
-------------------------------------

For input split objects, we could write the serialized bytes of each user input split
to a separate file instead of holding them inside the RawSplit object. That is, change RawSplit from:
{code}
  static class RawSplit implements Writable {
    private String splitClass;
    private BytesWritable bytes = new BytesWritable();
    private String[] locations;
    long dataLength;
    ...
{code}

To:
{code}
  static class RawSplit implements Writable {
    private String splitClass;
    private BytesWritable bytes = null;
    private long offset; // offset of the serialized bytes for the user input split
    private long length; // the length of the serialized bytes for the user input split
    private String[] locations;
    long dataLength;
    ...
{code}

Here the field "bytes" is loaded from the external file just before we send the object to
the TaskTracker (TT), and the reference is reset to null afterwards so that the JobTracker
does not keep the serialized split in memory.
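
A minimal, standalone Java sketch of the load-then-release pattern described above. It is not the actual Hadoop code: the class name LazyRawSplit and the methods load, release, isLoaded and demo are hypothetical, a plain byte[] stands in for BytesWritable, and "length" is an int here (rather than the long proposed above) only because Java arrays are int-indexed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for RawSplit: keeps only (offset, length) resident
// and pulls the serialized bytes from a side file on demand.
class LazyRawSplit {
    private final long offset; // start of the serialized split in the side file
    private final int length;  // number of serialized bytes (int for this sketch)
    private byte[] bytes;      // null except while the split is being handed out

    LazyRawSplit(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    // Read exactly `length` bytes starting at `offset` from the side file.
    void load(Path splitFile) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(splitFile.toFile(), "r")) {
            byte[] buf = new byte[length];
            in.seek(offset);
            in.readFully(buf);
            bytes = buf;
        }
    }

    byte[] getBytes() { return bytes; }

    boolean isLoaded() { return bytes != null; }

    // Drop the reference so the buffer can be garbage collected.
    void release() { bytes = null; }

    // Demo: two splits stored back to back in one file; load only the second,
    // read it, then release it again.
    static String demo() {
        try {
            Path file = Files.createTempFile("splits", ".bin");
            try {
                byte[] a = "split-A".getBytes(StandardCharsets.UTF_8);
                byte[] b = "split-B-payload".getBytes(StandardCharsets.UTF_8);
                ByteArrayOutputStream all = new ByteArrayOutputStream();
                all.write(a);
                all.write(b);
                Files.write(file, all.toByteArray());

                LazyRawSplit second = new LazyRawSplit(a.length, b.length);
                second.load(file);
                String payload = new String(second.getBytes(), StandardCharsets.UTF_8);
                second.release();
                return payload + "|" + second.isLoaded();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "split-B-payload|false"
    }
}
```

With this shape, the resident cost per split is two numeric fields plus the locations, regardless of how large the user's serialized split is; the buffer exists only between load and release.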

> Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Hong Tang
>             Fix For: 0.21.0
>
>
> JobTracker only needs to examine a subset of the information contained in InputSplit or
> JobConf objects, but it currently loads the complete user-defined InputSplit and JobConf
> objects into memory. This design leaves JobTracker susceptible to memory exhaustion,
> particularly when bugs in user code produce very large input splits or job conf objects
> (e.g. PIG-901).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

