hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete
Date Fri, 09 Jan 2009 10:07:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662297#action_12662297 ]

Amar Kamat commented on HADOOP-4766:
------------------------------------

I had a discussion with Vivek on this and we both feel that cleaning off everything makes
more sense. Here is everything else we could do:
- Clean up some _X%_ of jobs and then check if the memory is under control. With this we will
avoid immediate (memory) cleanups and also avoid frequent calls to _GC_. _X_ can be 25%.
- Sort the jobs on _num-tasks_ instead of _finish-time_, as the chances of freeing up memory
will definitely be higher.

We can do all these things as we have no contract regarding the web-ui display of completed
jobs. The only drawback with these approaches is that we would have to manually invoke _GC_,
which might be problematic. There is no point in cleaning _X%_ if there is no way
to know whether the memory usage is under control (which requires doing a _GC_). So for now
I think it makes more sense to clean up everything and be sure that after a single _GC_ the
JobTracker will be safe. We can later improve on that if needed. Thoughts?
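
For illustration only, the batch alternative discussed above (remove _X%_ of completed jobs ordered by _num-tasks_, then check memory) might look roughly like the sketch below. It is not taken from any of the attached patches: `CompletedJobCleanupSketch`, `CompletedJob`, `selectForCleanup`, `heapUsage`, `cleanupUntilUnderControl` and the `threshold` parameter are all hypothetical names, and the real JobTracker data structures are not modeled. Only standard `java.lang.Runtime` and `System.gc()` calls are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names exist in Hadoop.
class CompletedJobCleanupSketch {

    /** Stand-in for a completed job retained in memory (assumed fields). */
    static final class CompletedJob {
        final String id;
        final int numTasks;
        final long finishTime; // the ordering this proposal would replace

        CompletedJob(String id, int numTasks, long finishTime) {
            this.id = id;
            this.numTasks = numTasks;
            this.finishTime = finishTime;
        }
    }

    /** Returns the given fraction of jobs, largest num-tasks first. */
    static List<CompletedJob> selectForCleanup(List<CompletedJob> completed,
                                               double fraction) {
        List<CompletedJob> sorted = new ArrayList<>(completed);
        sorted.sort(Comparator.comparingInt((CompletedJob j) -> j.numTasks).reversed());
        // Take at least one job (when any exist) so every pass makes progress.
        int wanted = Math.max(1, (int) Math.ceil(sorted.size() * fraction));
        int count = Math.min(sorted.size(), wanted);
        return new ArrayList<>(sorted.subList(0, count));
    }

    /** Used heap as a fraction of the maximum heap. */
    static double heapUsage() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }

    /**
     * Removes jobs in batches until heap usage drops below the threshold
     * or no completed jobs remain. Returns the number of jobs removed.
     */
    static int cleanupUntilUnderControl(List<CompletedJob> completed,
                                        double fraction, double threshold) {
        int removed = 0;
        while (!completed.isEmpty()) {
            List<CompletedJob> batch = selectForCleanup(completed, fraction);
            completed.removeAll(batch);
            removed += batch.size();
            // System.gc() is only a hint to the JVM; this explicit call is
            // the drawback raised in the comment above.
            System.gc();
            if (heapUsage() < threshold) {
                break;
            }
        }
        return removed;
    }
}
```

The sketch also shows why the batch approach is awkward: the usage check is only meaningful after a collection, so each batch costs one explicit _GC_ request, whereas cleaning up everything needs a single one.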

> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4766
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.2, 0.19.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: HADOOP-4766-v1.patch, HADOOP-4766-v2.4.patch, HADOOP-4766-v2.6.patch,
>                      HADOOP-4766-v2.7-0.18.patch, HADOOP-4766-v2.7-0.19.patch, HADOOP-4766-v2.7.patch,
>                      HADOOP-4766-v2.8-0.18.patch, HADOOP-4766-v2.8-0.19.patch, HADOOP-4766-v2.8.patch,
>                      map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster, and they completed in 43 minutes.
> When I ran them the third time, it took (almost) forever; the job tracker became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one CPU busy all the time. It looks like this was due to GC.
> I believe releases 0.18 and 0.19 have similar behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

