hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4018) limit memory usage in jobtracker
Date Wed, 03 Sep 2008 01:05:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4018:
-------------------------------------

    Attachment: maxSplits4.patch

Hi Amar, thanks for your comments.

>1. If the job fails on init(), JobTracker invokes JobInProgress.kill(). So ideally you should simply throw an exception if the limit is crossed

Can you please explain which portion of code you are referring to here?

>2. The api totalNumTasks() is not used anywhere and can be removed.
This API is used by JobInProgress.initTasks. This method computes the number of tasks that
are needed by this job.

Regarding 3 and 4, I agree with you that it would be better to check these limits in the constructor
of JobInProgress. But the number of splits for the current job is not yet available when
the constructor is invoked. That's the reason I do these checks in initTasks. Does it make
sense?
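The check described above, performed in initTasks once the split count is known, can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: the method and exception names here are hypothetical, and the real patch operates on JobInProgress internals not shown.

```java
/**
 * Minimal sketch of a per-job task-limit check, done at init time
 * (not in the constructor) because the number of splits is only
 * known once the job's input has been read. All identifiers here
 * are illustrative, not Hadoop's actual names.
 */
public class TaskLimitSketch {
    static final int UNLIMITED = -1; // assumed sentinel for "no limit"

    /** Rejects the job if its total task count exceeds the configured limit. */
    static void checkTaskLimit(int numMapTasks, int numReduceTasks, int maxTasksPerJob) {
        int totalTasks = numMapTasks + numReduceTasks;
        if (maxTasksPerJob != UNLIMITED && totalTasks > maxTasksPerJob) {
            // Failing here lets the JobTracker reject the job before
            // allocating per-task state that would bloat its heap.
            throw new IllegalStateException("Job has " + totalTasks
                + " tasks, exceeding limit " + maxTasksPerJob);
        }
    }

    public static void main(String[] args) {
        checkTaskLimit(100, 10, 1000); // within limit: no exception
        try {
            checkTaskLimit(50000, 100, 1000); // exceeds limit
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing from init (rather than silently truncating the job) matches comment 1 above: a job over the limit should fail fast and be killed, rather than partially initialize.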

Regarding point 5, my latest patch has this fix.


> limit memory usage in jobtracker
> --------------------------------
>
>                 Key: HADOOP-4018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4018
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: maxSplits.patch, maxSplits2.patch, maxSplits3.patch, maxSplits4.patch
>
>
> We have seen instances when a user submitted a job with many thousands of mappers. The
> JobTracker was running with a 3GB heap, but that was still not enough to prevent memory thrashing
> from garbage collection; effectively the JobTracker was not able to serve jobs and had to
> be restarted.
> One simple proposal would be to limit the maximum number of tasks per job. This can be
> a configurable parameter. Are there other things that eat huge globs of memory in the JobTracker?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

