hadoop-common-dev mailing list archives

From Brice Arnould <brice.arno...@gmail.com>
Subject Re: using TreeMaps in JobTracker
Date Mon, 09 Jun 2008 12:14:48 GMT
Devaraj Das wrote:
>> It might also be a good idea to make getTasksToKill() return
>> its set "killJobIDs" directly, instead of copying that set
>> into a List and returning that list. Or even to not use a Set,
>> if TaskTrackers silently drop commands to kill tasks that are
>> already dead.
> The tasktrackers wouldn't know that it has to kill something unless
> explicitly told about it (imagine that the user just fired a command to kill
> a job, or the tasktracker is running a speculative task and another attempt
> of the same just finished). I am not sure I understood you right though.
Sorry, let me try to explain it better:

The context is that JobTracker.getTasksToKill(taskTracker) goes through
the list of tasks associated with taskTracker, creates a set called
killJobIds and fills it with some of those tasks. Then it copies the
content of killJobIds into a list called killList and returns that list.
The content of killList is then copied into another list inside
JobTracker.heartbeat().

I suggest two changes:
1- Make JobTracker.getTasksToKill(taskTracker) return a Collection, and
make killJobIds that Collection, removing the need to copy its
content into killList.
2- Change the type of killJobIds from Set to ArrayList, since it
cannot contain duplicate elements anyway, because its elements are
extracted from another set.

Together, the two should reduce the number of allocations and the complexity.
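
To make the two changes concrete, here is a minimal sketch of the before and after shapes. It is not the actual JobTracker code: the class name KillListSketch, the method names withCopy and withoutCopy, the use of String task ids and the shouldKill predicate are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the logic described above, not the real JobTracker.
class KillListSketch {

    // Shape described in the message: fill a Set, then copy it into a List.
    static List<String> withCopy(Set<String> trackerTasks, Predicate<String> shouldKill) {
        Set<String> killJobIds = new HashSet<>();
        for (String task : trackerTasks) {
            if (shouldKill.test(task)) {
                killJobIds.add(task);
            }
        }
        // Second allocation plus a full copy of the elements.
        return new ArrayList<>(killJobIds);
    }

    // Suggested shape: collect straight into an ArrayList and return it as a
    // Collection. The source is already a Set, so no duplicates can appear.
    static Collection<String> withoutCopy(Set<String> trackerTasks, Predicate<String> shouldKill) {
        List<String> killJobIds = new ArrayList<>();
        for (String task : trackerTasks) {
            if (shouldKill.test(task)) {
                killJobIds.add(task);
            }
        }
        return killJobIds;
    }

    public static void main(String[] args) {
        Set<String> tasks = new LinkedHashSet<>(List.of("t1", "t2", "t3"));
        // Pretend every task except t2 has to be killed.
        System.out.println(withoutCopy(tasks, t -> !t.equals("t2")));
    }
}
```

The caller in heartbeat() would then iterate over the returned Collection directly, assuming it only needs to read the elements.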

The digression about the TaskTracker's behaviour was a question about
whether it matters that killJobIds contains no duplicates.

>> By the way, my patch in the issue HADOOP-3412 also tries to
>> improve the way containers are used. It replaces
>> jobsByPriority (which was periodically re-sorted by
>> resortPriority, and in an inefficient way) by a TreeSet. It
>> also replaces the TreeMap taskTrackers by a ConcurrentHashMap.
>> I don't know if it's feasible, but allowing the JobTracker to
>> answer more than one heartbeat at the same time (by using
>> concurrent containers to make its locking finer-grained) could be a
>> good idea. If you think it's feasible I'll try to do it ^^
> Answering more than one heartbeat at the same time is interesting. Could you
> pls elaborate on that. We sometime back were thinking of queuing up the
> heartbeats and processing them asynchronously. Are you talking about the
> same?
Yes. What I suggest is to make the "synchronized areas" smaller using 
concurrent containers and then to use a ThreadPool to answer heartbeats.
If you think that it is possible, I'll try to do it.
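
As a rough illustration of that idea, the sketch below answers heartbeats from a fixed thread pool and keeps per-tracker state in a ConcurrentHashMap instead of behind one big synchronized method. Everything in it is an assumption made for the example: the class HeartbeatPoolSketch, its methods, the "ack:" reply and the per-tracker counter do not come from the JobTracker code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the real JobTracker.
class HeartbeatPoolSketch {
    // Concurrent container: updates to different trackers do not block each other.
    private final ConcurrentHashMap<String, Integer> heartbeatCounts = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    HeartbeatPoolSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Hand one heartbeat to the pool; the per-tracker update is atomic.
    Future<String> heartbeat(String trackerName) {
        return pool.submit(() -> {
            heartbeatCounts.merge(trackerName, 1, Integer::sum);
            return "ack:" + trackerName;
        });
    }

    // Submit a batch of heartbeats and wait for all the replies.
    List<String> heartbeatAll(List<String> trackerNames) {
        List<Future<String>> pending = new ArrayList<>();
        for (String name : trackerNames) {
            pending.add(heartbeat(name));
        }
        List<String> acks = new ArrayList<>();
        try {
            for (Future<String> reply : pending) {
                acks.add(reply.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return acks;
    }

    int countFor(String trackerName) {
        return heartbeatCounts.getOrDefault(trackerName, 0);
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        HeartbeatPoolSketch tracker = new HeartbeatPoolSketch(4);
        System.out.println(tracker.heartbeatAll(List.of("tt1", "tt2", "tt1")));
        System.out.println(tracker.countFor("tt1"));
        tracker.shutdown();
    }
}
```

The real difficulty would be any state that spans several containers at once; such updates would still need a lock around them, so this only shows the simple case.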

Please forgive my English :-/ Next year I'll be studying in
Oregon, so it should be better after that ^^

Brice
