hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete
Date Tue, 20 Jan 2009 17:55:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665484#action_12665484 ]

Devaraj Das commented on HADOOP-4766:
-------------------------------------

The thing that worries me about the existing patch is that it is not at all predictable how many jobs/tasks will be in memory at any point. In my experiments with this patch, and with a standalone program simulating the behavior the patch is trying to achieve, I saw that even after purging all the jobs, the memory usage as reported by Runtime.totalMemory() - Runtime.freeMemory() didn't come down for quite a while, and the thread kept trying to free up memory needlessly (note that things like whether incremental GC is in use would also influence this behavior). The approach of keeping at most 'n' completed tasks in memory at least leads to much more predictability. True, we don't know the exact memory consumed by a TIP, but we can make a good estimate there and tweak the value of the max tasks in memory if need be. Also, in the current patch, the memory usage threshold configuration is equally dependent on estimation. I am not sure what the threshold should be: 0.75, 0.8, or 0.9?
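
To make the comparison concrete, here is a rough sketch of the kind of memory-based check I am talking about (the class, method, and default names are made up for illustration; this is not code from the patch):

{code:java}
// Illustrative sketch only -- names and the threshold value are hypothetical,
// not taken from the patch under discussion.
public class MemoryBasedPurgeCheck {

  private final double usageThreshold; // 0.75? 0.8? 0.9? -- picking this is guesswork

  public MemoryBasedPurgeCheck(double usageThreshold) {
    this.usageThreshold = usageThreshold;
  }

  /** Returns true if retained job/task data should be purged. */
  public boolean shouldPurge() {
    Runtime rt = Runtime.getRuntime();
    // Used heap as the JVM reports it right now. This lags the live set:
    // even after all retained jobs are dropped, it stays high until the
    // collector actually runs (and incremental GC changes when that is),
    // so a purge thread keyed off this number can keep firing needlessly.
    long used = rt.totalMemory() - rt.freeMemory();
    long max = rt.maxMemory();
    return ((double) used / max) > usageThreshold;
  }
}
{code}

Both the threshold and the reading itself are estimates, which is exactly the unpredictability I am worried about.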
Why do you say it is overkill? I thought basing things on estimating total memory usage is the trickier approach. Basing it on the number of completed tasks seems very similar to the "number of completed jobs" limit that we currently have; it's just that we are stepping one level below and specifying a value for something whose base size is always going to remain under control. Also, completed jobs should be treated as one unit w.r.t. removal. For example, if the configured max tasks is 1000 and we have a job with 1100 tasks, the entire job should be removed (as opposed to removing only 1000 of its tasks), keeping the whole thing really simple.
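
To spell out what I have in mind (class and method names below are hypothetical; the real JobTracker bookkeeping is more involved than this):

{code:java}
import java.util.LinkedList;

// Illustrative sketch only -- names are hypothetical, not from any patch.
public class CompletedJobCache {

  /** One retained completed job and its task count. */
  static final class RetainedJob {
    final String jobId;
    final int numTasks;
    RetainedJob(String jobId, int numTasks) {
      this.jobId = jobId;
      this.numTasks = numTasks;
    }
  }

  private final int maxRetainedTasks; // e.g. 1000, tuned from an estimate of TIP size
  private final LinkedList<RetainedJob> jobs = new LinkedList<RetainedJob>(); // oldest first
  private int retainedTasks = 0;

  public CompletedJobCache(int maxRetainedTasks) {
    this.maxRetainedTasks = maxRetainedTasks;
  }

  /** Called when a job completes; evicts oldest jobs until we are under the cap. */
  public synchronized void jobCompleted(String jobId, int numTasks) {
    jobs.addLast(new RetainedJob(jobId, numTasks));
    retainedTasks += numTasks;
    // Evict whole jobs, never part of a job: a 1100-task job against a
    // 1000-task cap gets dropped entirely, which keeps the bookkeeping simple.
    while (retainedTasks > maxRetainedTasks && !jobs.isEmpty()) {
      RetainedJob evicted = jobs.removeFirst();
      retainedTasks -= evicted.numTasks;
      // ... drop evicted.jobId's in-memory task data here ...
    }
  }

  public synchronized int getRetainedTaskCount() {
    return retainedTasks;
  }
}
{code}

With this, the number of retained tasks never exceeds the configured cap once jobCompleted returns, which is the kind of bound I mean by predictability.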
Again, this is a short-term fix until we move to the model of having a separate History server process.

> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4766
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.2, 0.19.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: HADOOP-4766-v1.patch, HADOOP-4766-v2.10.patch, HADOOP-4766-v2.4.patch,
> HADOOP-4766-v2.6.patch, HADOOP-4766-v2.7-0.18.patch, HADOOP-4766-v2.7-0.19.patch, HADOOP-4766-v2.7.patch,
> HADOOP-4766-v2.8-0.18.patch, HADOOP-4766-v2.8-0.19.patch, HADOOP-4766-v2.8.patch, map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk, the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster; they completed in 43 minutes.
> When I ran them the third time, it took (almost) forever: the job tracker became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one CPU busy all the time. It looks like this was due to GC.
> I believe releases 0.18 and 0.19 have similar behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

