hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects
Date Tue, 06 Apr 2010 08:29:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853804#action_12853804 ]

Amar Kamat commented on MAPREDUCE-1533:

Benchmark results comparing StringBuilder with String.format:
1) StringBuilder took 1.261 sec to generate 1,000,000 strings
2) String.format took 9.126 sec to generate 1,000,000 strings
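
A comparison of this kind could be sketched roughly as follows. This is not the benchmark that produced the numbers above: the class name FormatBench, the message text, and the warm-up size are all assumptions made for illustration, and the strings actually built in updateQSIObjects may look different.

```java
import java.util.Locale;
import java.util.function.IntFunction;

// Hypothetical micro-benchmark sketch; names and message text are illustrative only.
class FormatBench {
    // Accumulates result lengths so the JIT cannot discard the generated strings.
    static volatile long sink;

    static String withFormat(int running, int waiting) {
        return String.format(Locale.ROOT, "%d running map tasks, %d waiting", running, waiting);
    }

    static String withBuilder(int running, int waiting) {
        return new StringBuilder()
                .append(running).append(" running map tasks, ")
                .append(waiting).append(" waiting")
                .toString();
    }

    // Calls f for i = 0..n-1 and returns the elapsed wall-clock time in seconds.
    static double time(int n, IntFunction<String> f) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            total += f.apply(i).length();
        }
        long elapsed = System.nanoTime() - start;
        sink += total;
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up runs so both code paths are compiled before measuring.
        time(n / 10, i -> withFormat(i, i + 1));
        time(n / 10, i -> withBuilder(i, i + 1));
        System.out.printf(Locale.ROOT, "String.format: %.3f sec%n", time(n, i -> withFormat(i, i + 1)));
        System.out.printf(Locale.ROOT, "StringBuilder: %.3f sec%n", time(n, i -> withBuilder(i, i + 1)));
    }
}
```

Absolute timings depend on the JVM and hardware, so only the ratio between the two lines is meaningful.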

So, assuming that 400 heartbeat calls are made per second, we have ~2.5 ms per heartbeat.
Assuming that no more than 100 jobs are running at a given time, we have:
1) StringBuilder taking 0.1261 ms to generate 100 strings
2) String.format taking 0.9126 ms to generate 100 strings

Thus String.format takes ~36% (i.e. 0.9126/2.5) of the total heartbeat processing time,
whereas StringBuilder takes ~5% (i.e. 0.1261/2.5).
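
The arithmetic above can be reproduced with a few lines of Java. The class and method names here are hypothetical; the inputs (400 heartbeats per second, 100 jobs, and the two benchmark timings) are the ones quoted in this comment.

```java
import java.util.Locale;

// Hypothetical helper that redoes the per-heartbeat budget arithmetic.
class HeartbeatBudget {
    // Time available per heartbeat, in ms, at a given heartbeat rate.
    static double budgetMs(int heartbeatsPerSec) {
        return 1000.0 / heartbeatsPerSec;
    }

    // Scales a benchmark time (seconds for benchStrings strings) to ms for `strings` strings.
    static double costMs(double benchSeconds, int benchStrings, int strings) {
        return benchSeconds * 1000.0 / benchStrings * strings;
    }

    // Share of the per-heartbeat budget consumed, as a percentage.
    static double percent(double costMs, double budgetMs) {
        return 100.0 * costMs / budgetMs;
    }

    public static void main(String[] args) {
        double budget = budgetMs(400);
        double builder = costMs(1.261, 1_000_000, 100);
        double format = costMs(9.126, 1_000_000, 100);
        System.out.printf(Locale.ROOT, "budget per heartbeat: %.1f ms%n", budget);
        System.out.printf(Locale.ROOT, "StringBuilder: %.4f ms (%.1f%%)%n", builder, percent(builder, budget));
        System.out.printf(Locale.ROOT, "String.format: %.4f ms (%.1f%%)%n", format, percent(format, budget));
    }
}
```

With these inputs the shares come out to roughly 5.0% for StringBuilder and 36.5% for String.format.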

> reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects
> -----------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1533
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>            Assignee: Amar Kamat
>         Attachments: mapreduce-1533-v1.4.patch
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT executes the heartBeat()
> method heavily. This internally makes a call to CapacityTaskScheduler.updateQSIObjects().

> CapacityTaskScheduler.updateQSIObjects() internally calls String.format() to set
> the job scheduling information. Depending on the size of the "jobQueuesManager" and
> "queueInfoMap" data structures, the number of times String.format() gets executed becomes very high.
> String.format() internally does pattern matching, which turns out to be very heavy. (This was
> revealed while profiling the JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(),
> of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking JobInProgress.getSchedulingInfo?
> This might reduce the pressure on the JT while processing heartbeats.
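
The deferred-formatting idea in the description could look something like the sketch below. It is only an illustration of the general technique, not the attached patch or the actual Hadoop code: the class LazySchedulingInfo, its fields, and the message text are all hypothetical, and thread safety is not addressed.

```java
import java.util.Locale;

// Hypothetical holder that stores raw counters and formats them only when read.
class LazySchedulingInfo {
    private int runningMaps;
    private int runningReduces;
    private String cached;      // null means the cached text is stale
    private int formatCalls;    // counts how often String.format actually ran

    // Cheap update path, as would run on every heartbeat: no formatting here.
    void update(int runningMaps, int runningReduces) {
        this.runningMaps = runningMaps;
        this.runningReduces = runningReduces;
        cached = null;
    }

    // Expensive path, run only when someone asks for the text.
    @Override
    public String toString() {
        if (cached == null) {
            cached = String.format(Locale.ROOT,
                    "%d running map tasks, %d running reduce tasks",
                    runningMaps, runningReduces);
            formatCalls++;
        }
        return cached;
    }

    int formatCalls() {
        return formatCalls;
    }

    public static void main(String[] args) {
        LazySchedulingInfo info = new LazySchedulingInfo();
        for (int i = 0; i < 1000; i++) {
            info.update(i, i / 2);   // 1000 updates, no formatting yet
        }
        System.out.println(info);    // formats once
        System.out.println(info);    // served from the cache
        System.out.println("format calls: " + info.formatCalls());
    }
}
```

The trade-off is that the formatting cost moves to the reader of the scheduling information, which is only a win if reads are much rarer than heartbeat-driven updates.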

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
