hadoop-common-dev mailing list archives

From Brice Arnould <brice.arno...@gmail.com>
Subject Re: using TreeMaps in JobTracker
Date Mon, 09 Jun 2008 09:46:20 GMT
Vivek Ratan wrote:
> The JT has a number of Map member variables, and I noticed that it uses
> TreeMaps for most, if not all, of them. I also noticed that these member
> variables are pretty much used for 'puts' and 'gets'. Given that there
> is no need for sorted iteration, and the JT doesn't even iterate over
> any of these maps, shouldn't it be better to use HashMaps? 
We might also want to turn taskidToTrackerMap into a HashMap<String,
Vector<TaskAttemptID>>. Since the maximum number of TaskAttemptIDs per
TaskTracker is very low, this may be faster and use less memory (even
though the asymptotic complexity would be worse).
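A minimal sketch of that layout, under stated assumptions: the class name TrackerTaskIndex and its methods are hypothetical, the map is assumed to be keyed by tracker name, and plain strings stand in for TaskAttemptID so the example compiles without Hadoop. It is not the JobTracker's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

/** Hypothetical stand-in for the JobTracker's task-to-tracker bookkeeping. */
class TrackerTaskIndex {
    // Assumed key: tracker name. Values: the few attempts assigned to that tracker.
    // Strings replace TaskAttemptID to keep the sketch self-contained.
    private final Map<String, Vector<String>> attemptsByTracker = new HashMap<>();

    void assign(String trackerName, String attemptId) {
        attemptsByTracker.computeIfAbsent(trackerName, k -> new Vector<>()).add(attemptId);
    }

    void unassign(String trackerName, String attemptId) {
        Vector<String> attempts = attemptsByTracker.get(trackerName);
        if (attempts == null) {
            return;
        }
        attempts.remove(attemptId);
        if (attempts.isEmpty()) {
            // Drop empty vectors so idle trackers cost nothing.
            attemptsByTracker.remove(trackerName);
        }
    }

    /**
     * Finds the tracker running an attempt by scanning every vector.
     * This is the worse asymptotic complexity: linear in the number of
     * attempts instead of a single keyed lookup.
     */
    String trackerOf(String attemptId) {
        for (Map.Entry<String, Vector<String>> entry : attemptsByTracker.entrySet()) {
            if (entry.getValue().contains(attemptId)) {
                return entry.getKey();
            }
        }
        return null;
    }

    int trackerCount() {
        return attemptsByTracker.size();
    }
}
```

The trade-off shows up in trackerOf(): there is one map entry per tracker rather than one per attempt, but finding the tracker for a given attempt now requires a scan.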

It might also be a good idea to make getTasksToKill() return its set
"killJobIDs" directly, instead of copying that set into a List and
returning that list. Or we could even drop the Set, if TaskTrackers
silently drop commands to kill tasks that are already dead.
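To illustrate the difference between copying and returning the set, here is a rough sketch. The class KillList and both method names are hypothetical, strings stand in for the real ID type, and wrapping the set in a read-only view is my own addition rather than something the JobTracker is known to do.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical holder for the IDs that should be killed. */
class KillList {
    private final Set<String> killJobIDs = new TreeSet<>();

    void markForKill(String id) {
        killJobIDs.add(id);
    }

    /** The shape described above: allocate a new List and copy on every call. */
    List<String> getTasksToKillCopy() {
        return new ArrayList<>(killJobIDs);
    }

    /**
     * Alternative: hand out the set itself, wrapped so that callers cannot
     * modify it. No elements are copied, but the view is live, so callers
     * must hold whatever lock protects killJobIDs while they iterate.
     */
    Collection<String> getTasksToKillView() {
        return Collections.unmodifiableSet(killJobIDs);
    }
}
```

The view avoids the per-call allocation, at the price of exposing later changes to the caller. Whether that is acceptable depends on how the real caller synchronizes.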

By the way, my patch in the issue HADOOP-3412 also tries to improve the way
containers are used. It replaces jobsByPriority (which was periodically
re-sorted, inefficiently, by resortPriority) with a TreeSet. It
also replaces the TreeMap taskTrackers with a ConcurrentHashMap.
I don't know if it's feasible, but allowing the JobTracker to answer
more than one heartbeat at the same time (by using concurrent containers
to make its locking finer-grained) could be a good idea. If you think it's
feasible, I'll try to do it ^^
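As a rough illustration of those two container choices (not the contents of the HADOOP-3412 patch): the class SchedulerState, the Job fields, the ordering rule, and every method name below are assumptions made for the sketch.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a job queue kept in order plus a concurrent tracker table. */
class SchedulerState {
    static final class Job {
        final String id;
        final int priority;   // assumption: smaller value means more urgent
        final long startTime;

        Job(String id, int priority, long startTime) {
            this.id = id;
            this.priority = priority;
            this.startTime = startTime;
        }
    }

    // Order by priority, then start time, then id so distinct jobs never compare equal.
    static final Comparator<Job> ORDER =
            Comparator.comparingInt((Job j) -> j.priority)
                      .thenComparingLong((Job j) -> j.startTime)
                      .thenComparing((Job j) -> j.id);

    // The TreeSet stays sorted on insert, so no periodic re-sort is needed.
    private final TreeSet<Job> jobsByPriority = new TreeSet<>(ORDER);

    // Tracker name -> time of last heartbeat; safe to update from several threads.
    private final ConcurrentHashMap<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    synchronized void addJob(Job job) {
        jobsByPriority.add(job);
    }

    /** A sort key must not change while the element is in the set: remove, then re-add. */
    synchronized Job setPriority(Job job, int newPriority) {
        jobsByPriority.remove(job);
        Job updated = new Job(job.id, newPriority, job.startTime);
        jobsByPriority.add(updated);
        return updated;
    }

    synchronized Job nextJob() {
        return jobsByPriority.isEmpty() ? null : jobsByPriority.first();
    }

    // Not synchronized: concurrent heartbeats only contend inside the map.
    void heartbeat(String trackerName, long nowMillis) {
        lastHeartbeat.put(trackerName, nowMillis);
    }

    Long lastSeen(String trackerName) {
        return lastHeartbeat.get(trackerName);
    }
}
```

Only the job queue still needs a lock here. Heartbeat bookkeeping goes through the ConcurrentHashMap, which is the kind of finer granularity the paragraph above is after. A real JobTracker has far more shared state than this sketch covers.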


PS: The usual warnings about my use of English apply here :-P
