hadoop-mapreduce-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2048) reduce overhead of sorting jobs/pools in FairScheduler heartbeat processing
Date Wed, 01 Sep 2010 00:00:57 GMT
reduce overhead of sorting jobs/pools in FairScheduler heartbeat processing

                 Key: MAPREDUCE-2048
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2048
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/fair-share
            Reporter: Joydeep Sen Sarma

On the JobTracker (JT) we are bottlenecked on the jobtracker lock. The FairScheduler sorts
jobs (and, in hadoop-trunk, pools) once per heartbeat while this lock is held. This sort shows
up as one of the places where we spend a lot of time holding the jobtracker lock.

We can avoid sorting the jobs/pools on every heartbeat and instead do the full sort in the
updateThread (which is invoked periodically). Between those sorts, the sorted set can be
maintained incrementally: as jobs/pools are scheduled in each heartbeat, the affected entries
can be deleted from the sorted set and re-inserted at their new position.
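
The incremental maintenance described above could look roughly like the following. This is a minimal sketch, not the FairScheduler's actual code: the class `IncrementalSchedulerOrder`, the `Schedulable` type, the `runningTasks` sort key and both method names are hypothetical, and the real scheduler orders jobs/pools by its own fairness criteria. It only illustrates replacing a per-heartbeat sort with a delete/insert on a `java.util.TreeSet`.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch: keeps jobs/pools ordered without re-sorting on every heartbeat.
class IncrementalSchedulerOrder {

    // Stand-in for a job or pool; not a real Hadoop type.
    static final class Schedulable {
        final String name;
        int runningTasks; // assumed sort key for this sketch

        Schedulable(String name, int runningTasks) {
            this.name = name;
            this.runningTasks = runningTasks;
        }
    }

    // Fewest running tasks first; the name breaks ties so that two distinct
    // entries never compare as equal (a TreeSet would otherwise drop one).
    private static final Comparator<Schedulable> ORDER =
        Comparator.comparingInt((Schedulable s) -> s.runningTasks)
                  .thenComparing(s -> s.name);

    private final TreeSet<Schedulable> sorted = new TreeSet<>(ORDER);

    // Periodic path (what an update thread would call): full rebuild, O(n log n).
    synchronized void rebuild(Collection<Schedulable> all) {
        sorted.clear();
        sorted.addAll(all);
    }

    // Heartbeat path: take the first entry, change its key, re-insert it.
    // Each call costs O(log n) instead of a full sort.
    synchronized Schedulable assignOneTask() {
        Schedulable head = sorted.pollFirst();
        if (head == null) {
            return null; // nothing to schedule
        }
        // The key is only modified while the element is outside the set;
        // mutating it in place would corrupt the tree ordering.
        head.runningTasks++;
        sorted.add(head);
        return head;
    }
}
```

The point to note is that an entry is removed before its sort key changes and re-inserted afterwards. Any other code path that changes a key (for example a task finishing) would need the same remove/modify/insert treatment, or would have to wait for the next periodic rebuild.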

This may be less of an issue in trunk (where we sort pools and then sort jobs within a pool)
than in hadoop-20 (where we sort all jobs). However, in our workload we have lots of pools
(one per user) and lots of jobs in some pools (production pools), so I think it's
reasonable to assume that this is worth addressing in trunk as well.

