hadoop-common-dev mailing list archives

From "Khaled Elmeleegy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5632) Jobtracker leaves tasktrackers underutilized
Date Sun, 26 Apr 2009 22:52:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702952#action_12702952 ]

Khaled Elmeleegy commented on HADOOP-5632:
------------------------------------------

I think we should mainly care about the average load. If the load is bursty,
bursts of requests will be buffered in memory, and the JT will work through
them one after the other. As long as the average load is less than the JT
capacity, the JT will be able to catch up.
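
As a rough illustration of the catch-up argument, here is a minimal sketch
that models the JT as a server handling a fixed number of requests per tick,
with everything else buffered. It is not Hadoop code: the class name, the
tick model, and the numbers are assumptions made up for this example.

```java
import java.util.Arrays;

// Hypothetical model, not Hadoop code: the JT serves at most `capacity`
// requests per tick; anything beyond that waits in an in-memory queue.
final class BurstBacklogSketch {

    /** Returns the number of buffered requests left at the end of each tick. */
    static long[] backlog(long[] arrivals, long capacity) {
        long[] out = new long[arrivals.length];
        long queued = 0;
        for (int t = 0; t < arrivals.length; t++) {
            queued += arrivals[t];                 // the burst is buffered
            queued -= Math.min(queued, capacity);  // JT serves what it can
            out[t] = queued;
        }
        return out;
    }

    public static void main(String[] args) {
        // Bursty arrivals averaging 10 requests per tick.
        long[] arrivals = {50, 0, 0, 0, 0, 50, 0, 0, 0, 0};
        // Capacity above the average: the backlog drains between bursts.
        System.out.println(Arrays.toString(backlog(arrivals, 12)));
        // Capacity below the average: the backlog keeps growing.
        System.out.println(Arrays.toString(backlog(arrivals, 8)));
    }
}
```

With a capacity of 12 the backlog returns to zero before the next burst; with
a capacity of 8 it ends each cycle higher than it started. The peak backlog is
the buffering memory cost mentioned below.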

One side effect of bursty load is the memory used for buffering RPC requests.
Also, the average RPC response time can go up, which can lead to lower
slot utilization. However, I don't see any obvious reason why we'd have huge
bursts in heartbeats, as there is enough randomness in the system to keep
heartbeats from all TTs from arriving at the JT at the same time.
So, I don't think this would be a problem in practice, but let's try it and
see.


As for getTaskCompletionEvents, each call seems to be lightweight, yet many
of these calls can also overwhelm the JT, as you pointed out. So they too
must be included in the average load formula for now.
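
The formula itself is not spelled out in this comment, so the following is
only one possible way to fold both RPC types into a single load estimate. The
class name, the per-call cost parameters, and the sample numbers are all
hypothetical; real costs would have to be measured on the JT.

```java
// Hypothetical sketch, not the formula from the attached patch.
final class AverageLoadSketch {

    /**
     * Estimated fraction of JT handler time consumed per second
     * (1.0 would mean a single handler is saturated).
     * All rates are per tracker; costs are seconds of JT time per call.
     */
    static double utilization(int trackers,
                              double heartbeatsPerSec, double heartbeatCostSec,
                              double eventPollsPerSec, double eventPollCostSec) {
        double perTracker = heartbeatsPerSec * heartbeatCostSec
                          + eventPollsPerSec * eventPollCostSec;
        return trackers * perTracker;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 100 trackers, one heartbeat and one
        // getTaskCompletionEvents poll per second each, with assumed costs.
        double u = utilization(100, 1.0, 0.002, 1.0, 0.0005);
        System.out.printf("estimated utilization = %.3f%n", u);
    }
}
```

Even with a much smaller assumed per-call cost, the polling term still adds to
the total and scales with the number of trackers, which is why it belongs in
the estimate.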

This brings me to another point. It seems to me that having the TTs poll the
JT for task completion events is not the most efficient way of doing this.
It's not scalable, as you observed. I think using asynchronous completion
notifications from the JT to the TTs could be far more efficient, but
that's another debate.
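
To make the contrast concrete, here is a small sketch of the two styles side
by side. It is an in-process toy, not Hadoop's RPC layer: the class, the
`pollSince` and `subscribe` methods, and the task IDs are hypothetical names
invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model of a JT event source, not Hadoop code.
final class CompletionNotifierSketch {
    private final List<String> events = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();
    int pollCalls = 0; // counts requests that hit the JT in polling style

    // Polling style: every call costs the JT a request, even when the
    // returned list is empty.
    List<String> pollSince(int fromIndex) {
        pollCalls++;
        int from = Math.min(Math.max(fromIndex, 0), events.size());
        return new ArrayList<>(events.subList(from, events.size()));
    }

    // Push style: a TT registers once and is called back per event.
    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    void taskCompleted(String taskId) {
        events.add(taskId);
        for (Consumer<String> listener : listeners) {
            listener.accept(taskId); // work happens only when there is news
        }
    }

    public static void main(String[] args) {
        CompletionNotifierSketch jt = new CompletionNotifierSketch();
        List<String> received = new ArrayList<>();
        jt.subscribe(received::add);

        jt.pollSince(0);            // empty poll, still a request
        jt.taskCompleted("task_1");
        jt.pollSince(0);            // returns task_1
        jt.pollSince(1);            // empty again

        System.out.println("polls=" + jt.pollCalls + " pushed=" + received);
    }
}
```

In the polling style the request count grows with the number of trackers and
the poll rate; in the push style it grows with the number of events.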

Finally, I completely agree that all this should be thoroughly tested and
stress tested under different workloads and cluster sizes before committing
it. As I said before, I tested my patch in the environment I had and it
seemed to work fine. However, we should definitely do more on that front.



> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
>                 Key: HADOOP-5632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5632
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux boxes, 100 node cluster
>            Reporter: Khaled Elmeleegy
>         Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch, jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even under heavy load.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.