hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects
Date Tue, 06 Apr 2010 07:31:33 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853785#action_12853785 ]

Amar Kamat commented on MAPREDUCE-1533:
---------------------------------------

How about using _StringBuilder_ instead of _String.format_? The underlying problem lies in how
scheduling info is managed. As of now it's a push model, where every change in the scheduler's
state results in an info string that gets pushed to all the jobs. Shouldn't it be a pull
model, wherein the jobs pull the data from the scheduler whenever required? Roughly 100 heartbeat
calls are made per second, and in every heartbeat the scheduler's state can potentially change,
resulting in an info string being pushed. That is, most of the time the info gets overwritten
before it is consumed, which makes the pull model a good fit for this case. But for now we can
keep it simple and solve the problem at hand by using StringBuilder. Thoughts?
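
For illustration, a minimal sketch of the StringBuilder swap. The class name, method names and message layout here are hypothetical and are not taken from CapacityTaskScheduler or the attached patch; the sketch only assumes the info string is a fixed template with a few integer counters.

```java
// Hypothetical helper: the names and the message layout are illustrative only,
// not the actual format used by CapacityTaskScheduler.
final class SchedulingInfoFormatter {

    // Baseline: String.format() parses the format string on every call.
    static String withFormat(int runningMaps, int runningReduces,
                             int waitingMaps, int waitingReduces) {
        return String.format(
            "%d running map tasks, %d running reduce tasks, "
                + "%d waiting map tasks, %d waiting reduce tasks",
            runningMaps, runningReduces, waitingMaps, waitingReduces);
    }

    // Same text built by plain appends, with no format-string parsing.
    // Note: %d is locale-sensitive while append(int) is not, so the two
    // outputs match only under a locale that uses ASCII digits.
    static String withBuilder(int runningMaps, int runningReduces,
                              int waitingMaps, int waitingReduces) {
        return new StringBuilder(96)
            .append(runningMaps).append(" running map tasks, ")
            .append(runningReduces).append(" running reduce tasks, ")
            .append(waitingMaps).append(" waiting map tasks, ")
            .append(waitingReduces).append(" waiting reduce tasks")
            .toString();
    }
}
```

This keeps the push model as it is and only makes each push cheaper.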

> reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1533
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>            Assignee: Amar Kamat
>         Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT executes the heartBeat()
> method heavily. This internally makes a call to CapacityTaskScheduler.updateQSIObjects().
>
> CapacityTaskScheduler.updateQSIObjects() internally calls String.format() to set
> the job scheduling information. Depending on the size of the "jobQueuesManager" and
> "queueInfoMap" data structures, the number of times String.format() gets executed becomes very high.
> String.format() internally does pattern matching, which turns out to be very heavy. (This was revealed
> while profiling the JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of which
> String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking JobInProgress.getSchedulingInfo?
> This might reduce the pressure on the JT while processing heartbeats.
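
The deferral asked for in the description (format only when the info is read) might look roughly like the sketch below. LazySchedulingInfo is a hypothetical stand-in, not the Hadoop JobInProgress class, and the Supplier-based API is an assumption made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical holder for a job's scheduling info; not a Hadoop class.
final class LazySchedulingInfo {

    // Latest way to produce the info string; volatile so that a reader
    // thread sees the most recent update from the scheduler thread.
    private volatile Supplier<String> source = () -> "";

    // Called on every scheduler state change: stores a reference only,
    // so no string is built here.
    void update(Supplier<String> newSource) {
        source = newSource;
    }

    // Called when the info is actually needed: the string is built now.
    String getSchedulingInfo() {
        return source.get();
    }
}
```

With this shape, updates that are overwritten before anyone reads them never pay the formatting cost; only the supplier that is current at read time runs.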

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

