hadoop-common-dev mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: using TreeMaps in JobTracker
Date Mon, 09 Jun 2008 11:00:12 GMT

> -----Original Message-----
> From: Brice Arnould [mailto:brice.arnould@gmail.com] 
> Sent: Monday, June 09, 2008 3:16 PM
> To: core-dev@hadoop.apache.org
> Subject: Re: using TreeMaps in JobTracker
> Vivek Ratan wrote:
> > The JT has a number of Map member variables, and I noticed that it
> > uses TreeMaps for most, if not all, of them. I also noticed that
> > these member variables are pretty much used for 'puts' and 'gets'.
> > Given that there is no need for sorted iteration, and the JT doesn't
> > even iterate over any of these maps, shouldn't it be better to use
> > HashMaps?
> We might also want to turn the taskidToTrackerMap into a 
> HashMap<String, Vector<TaskAttemptID>>. Given that the 
> maximum number of TaskAttemptIDs per TaskTracker is very low, 
> it may allow us to be faster and to use less memory (even if 
> the asymptotic complexity would be greater).

This is fair, I think.
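
To make the suggestion above concrete, here is a minimal sketch of a
tracker-keyed HashMap holding short per-tracker lists. It is not the
JobTracker's code: the class and method names are hypothetical, plain
Strings stand in for TaskAttemptID, and an ArrayList replaces the Vector
from the proposal. The reverse lookup is a linear scan, which is the
"greater asymptotic complexity" being traded for smaller maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed structure; Strings replace TaskAttemptID.
final class TrackerAttemptsSketch {
    // Tracker name -> attempt IDs currently assigned to it (expected to stay short).
    private final Map<String, List<String>> attemptsByTracker = new HashMap<>();

    void assign(String tracker, String attemptId) {
        attemptsByTracker.computeIfAbsent(tracker, k -> new ArrayList<>(4)).add(attemptId);
    }

    // Forward lookup: one hash probe, no ordering maintained.
    List<String> attemptsOn(String tracker) {
        List<String> attempts = attemptsByTracker.get(tracker);
        return attempts == null ? Collections.<String>emptyList() : attempts;
    }

    // Reverse lookup: scans every tracker's short list; returns null if not found.
    String trackerOf(String attemptId) {
        for (Map.Entry<String, List<String>> e : attemptsByTracker.entrySet()) {
            if (e.getValue().contains(attemptId)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TrackerAttemptsSketch sketch = new TrackerAttemptsSketch();
        sketch.assign("tracker_a", "attempt_1");
        sketch.assign("tracker_a", "attempt_2");
        System.out.println(sketch.attemptsOn("tracker_a"));
        System.out.println(sketch.trackerOf("attempt_2"));
    }
}
```

Whether the scan is acceptable depends on how often the reverse lookup
runs compared with the per-tracker lookup, which this sketch does not
measure.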

> It might also be a good idea to make getTasksToKill() directly 
> return its set "killJobIDs", instead of copying that set 
> into a List and returning that list. Or even to not use a Set, 
> if TaskTrackers silently drop commands to kill already 
> dead tasks.

A tasktracker wouldn't know that it has to kill something unless
explicitly told about it (imagine that the user just fired a command to kill
a job, or the tasktracker is running a speculative task and another attempt
of the same task just finished). I am not sure I understood you correctly, though.
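
For reference, the difference between copying the set into a List and
returning the set itself can be sketched as below. The class and method
names are hypothetical and Strings stand in for job IDs; this is not the
actual getTasksToKill() code. Note that a read-only view is not a
snapshot: it reflects later changes to the underlying set, so the caller
would need the same locking as the owner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; Strings stand in for job IDs.
final class KillListSketch {
    private final Set<String> killJobIds = new HashSet<>();

    void markForKill(String jobId) {
        killJobIds.add(jobId);
    }

    // Copying: O(n) work and a new list per call, but the result is a stable snapshot.
    List<String> copyAsList() {
        return new ArrayList<>(killJobIds);
    }

    // View: no copy, but it tracks later changes to the set and rejects writes.
    Set<String> readOnlyView() {
        return Collections.unmodifiableSet(killJobIds);
    }

    public static void main(String[] args) {
        KillListSketch sketch = new KillListSketch();
        sketch.markForKill("job_1");
        List<String> copy = sketch.copyAsList();
        Set<String> view = sketch.readOnlyView();
        sketch.markForKill("job_2");
        System.out.println(copy.size() + " in the copy, " + view.size() + " in the view");
    }
}
```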

> By the way, my patch in the issue HADOOP-3412 also tries to 
> improve the way containers are used. It replaces 
> jobsByPriority (which was periodically re-sorted by 
> resortPriority, and in an inefficient way) with a TreeSet. It 
> also replaces the TreeMap taskTrackers with a ConcurrentHashMap.
> I don't know if it's feasible, but allowing the JobTracker to 
> answer more than one HeartBeat at the same time (by using 
> concurrent containers to lower its granularity) could be a 
> good idea. If you think it's feasible I'll try to do it ^^
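
The two container changes described above can be sketched roughly as
follows. This is not the HADOOP-3412 patch, whose contents are not shown
in this thread: the Job class, its fields, the comparator order, and the
heartbeat() method are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; none of these names come from the actual patch.
final class ContainerSketch {
    // Minimal stand-in for a job record (assumed fields).
    static final class Job {
        final String id;
        final int priority;   // assumption: lower value is scheduled first
        final long startTime;

        Job(String id, int priority, long startTime) {
            this.id = id;
            this.priority = priority;
            this.startTime = startTime;
        }
    }

    // Order by priority, then start time, then id so distinct jobs never compare as equal.
    static final Comparator<Job> BY_PRIORITY =
        Comparator.comparingInt((Job j) -> j.priority)
                  .thenComparingLong(j -> j.startTime)
                  .thenComparing(j -> j.id);

    // Kept sorted on every insert, so no periodic re-sort pass is needed.
    final TreeSet<Job> jobsByPriority = new TreeSet<>(BY_PRIORITY);

    // Tracker name -> latest heartbeat timestamp; safe for concurrent callers.
    final ConcurrentMap<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // merge() is atomic per key, so no lock around the whole map is required.
    void heartbeat(String tracker, long timestamp) {
        lastHeartbeat.merge(tracker, timestamp, Long::max);
    }

    public static void main(String[] args) {
        ContainerSketch sketch = new ContainerSketch();
        sketch.jobsByPriority.add(new Job("job_b", 2, 10L));
        sketch.jobsByPriority.add(new Job("job_a", 0, 20L));
        sketch.heartbeat("tracker_a", 100L);
        System.out.println(sketch.jobsByPriority.first().id);
        System.out.println(sketch.lastHeartbeat.get("tracker_a"));
    }
}
```

Two caveats apply. If a job's priority can change, the job has to be
removed from the TreeSet and re-added, since a sorted set does not notice
changes to the fields its comparator reads. And a concurrent map only
makes individual map operations safe; any heartbeat logic that touches
several structures at once would still need its own coordination.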

Answering more than one heartbeat at the same time is interesting. Could you
please elaborate on that? Some time back we were thinking of queuing up the
heartbeats and processing them asynchronously. Are you talking about the

> Brice
> PS: Usual warnings about my use of English apply here :-P
