hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete
Date Fri, 05 Dec 2008 05:28:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653634#action_12653634 ]

Amar Kamat commented on HADOOP-4766:
------------------------------------

{quote}
If the number of tasks kept in memory is critical for the performance of JobTracker (and thus
to the whole cluster), then we should set limit on that, instead of the
number of jobs, because the numbers of tasks of jobs can vary a lot.
{quote}
Yeah. It seems the performance depends on the number of tasks kept in memory. One way
to test this would be to start a job with 400K tasks (no-op map tasks) on a jobtracker with a
~2 GB heap. Initially the jobtracker works fine, but later it slows down and
starts losing heartbeats. I tried this on 0.17. The best I could get was roughly 200K-250K
tasks, after which I started seeing this slowdown. I never tried it with back-to-back jobs, though.
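
A task-based limit like the one suggested in the quote could look roughly like the sketch below. This is only an illustration, not JobTracker code: {{TaskBoundedJobCache}}, {{CompletedJob}} and the cap value are hypothetical names made up for this example. It retires the oldest completed jobs once the total number of retained tasks goes over the cap, regardless of how many jobs that is.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: retain completed jobs until their combined task count exceeds a cap. */
public class TaskBoundedJobCache {

    /** Minimal stand-in for a completed job; NOT Hadoop's JobInProgress. */
    public static final class CompletedJob {
        final String id;
        final int numTasks;

        public CompletedJob(String id, int numTasks) {
            this.id = id;
            this.numTasks = numTasks;
        }
    }

    private final Deque<CompletedJob> retained = new ArrayDeque<>();
    private final long maxRetainedTasks; // assumed cap on tasks, not on jobs
    private long retainedTasks = 0;

    public TaskBoundedJobCache(long maxRetainedTasks) {
        this.maxRetainedTasks = maxRetainedTasks;
    }

    /**
     * Records a completed job, then evicts the oldest jobs until the
     * retained task total fits under the cap. A single job larger than
     * the cap is therefore evicted immediately.
     */
    public void jobCompleted(CompletedJob job) {
        retained.addLast(job);
        retainedTasks += job.numTasks;
        while (retainedTasks > maxRetainedTasks && !retained.isEmpty()) {
            CompletedJob oldest = retained.removeFirst();
            retainedTasks -= oldest.numTasks;
        }
    }

    public int retainedJobs() {
        return retained.size();
    }

    public long retainedTasks() {
        return retainedTasks;
    }
}
```

With this policy, one 400K-task job and four hundred 1K-task jobs put the same pressure on the cap, which is the point of counting tasks rather than jobs.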

{quote}
Also, we need to understand how the number of tasks kept in memory impacts the performance.
{quote}
There was an effort by Dhruba (HADOOP-4018) to cap the {{JobTracker}}'s memory, but ultimately
we ended up doing something else. HADOOP-2573 should also help solve this problem.


> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4766
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.2, 0.19.0
>            Reporter: Runping Qi
>            Priority: Blocker
>             Fix For: 0.18.3, 0.19.1, 0.20.0
>
>         Attachments: map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster, and they completed in 43 minutes.
> When I ran them a third time, it took (almost) forever --- the job tracker became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one CPU busy all the time. It looks like this was due to GC.
> I believe releases 0.18 and 0.19 behave similarly.
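
One way to check the GC suspicion in the description is to read the JVM's own collector counters. The sketch below uses the standard {{java.lang.management}} beans. {{GcTimeProbe}} is a hypothetical helper name, and the code only reports on the JVM it runs in, so for a real jobtracker it would have to run inside that process or be adapted to read the same beans over remote JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports how much of this JVM's uptime went to garbage collection. */
public class GcTimeProbe {

    /** Sums the accumulated collection time (ms) over all collectors; undefined (-1) values are skipped. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate fraction of JVM uptime spent in GC so far. */
    public static double gcFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) totalGcMillis() / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.println("GC time (ms): " + totalGcMillis());
        System.out.println("GC fraction of uptime: " + gcFraction());
    }
}
```

A fraction that keeps climbing towards 1 as more jobs complete would be consistent with the heap filling up with retained job and task state.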

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

