hadoop-common-dev mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: using TreeMaps in JobTracker
Date Mon, 09 Jun 2008 14:21:08 GMT

> -----Original Message-----
> From: Brice Arnould [mailto:brice.arnould@gmail.com] 
> Sent: Monday, June 09, 2008 5:45 PM
> To: core-dev@hadoop.apache.org
> Subject: Re: using TreeMaps in JobTracker
> Devaraj Das wrote:
> >> It might also be a good idea to make getTasksToKill() return
> >> its set "killJobIDs" directly, instead of copying that set into a
> >> List and returning that list. Or even not to use a Set at all, if
> >> TaskTrackers silently drop commands to kill already-dead tasks.
> > A tasktracker wouldn't know that it has to kill something unless
> > explicitly told about it (imagine that the user just fired a command
> > to kill a job, or the tasktracker is running a speculative task and
> > another attempt of the same task just finished). I am not sure I
> > understood you right though.
> Sorry, let me try to put it a better way:
> The context is that JobTracker.getTasksToKill(taskTracker) goes
> through the list of Tasks that are associated with taskTracker,
> creates a set called killJobIds, and fills it with some of those
> tasks. Then it copies the content of killJobIds into a list called
> killList and returns that List.
> The content of killList is then copied into another list inside
> JobTracker.heartbeat().
> I suggest two changes:
> 1- Make JobTracker.getTasksToKill(taskTracker) return a Collection,
> and make killJobIds that Collection, removing the need to copy its
> content into killList.
> 2- Change the type of killJobIds from Set to ArrayList, since it
> cannot contain duplicate elements anyway, because its elements are
> extracted from another set.
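The two quoted suggestions can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual JobTracker code: the class and method signatures are stand-ins, and the real method builds its set from per-tracker task state rather than taking a set as a parameter.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the copy being discussed; names are illustrative.
public class KillListSketch {

    // Current shape (paraphrased): build a Set, then copy it into a List.
    static List<String> getTasksToKillCurrent(Set<String> candidateJobIds) {
        Set<String> killJobIds = new HashSet<String>(candidateJobIds);
        List<String> killList = new ArrayList<String>(killJobIds); // extra copy
        return killList;
    }

    // Suggested shape: declare the return type as Collection and hand the
    // caller the container itself. Since the elements already come from a
    // set, an ArrayList cannot receive duplicates, so the Set is not needed.
    static Collection<String> getTasksToKillSuggested(Set<String> candidateJobIds) {
        List<String> killJobIds = new ArrayList<String>(candidateJobIds);
        return killJobIds; // no second copy
    }

    public static void main(String[] args) {
        Set<String> jobs = new HashSet<String>();
        jobs.add("job_1");
        jobs.add("job_2");
        System.out.println(getTasksToKillCurrent(jobs).size());   // 2
        System.out.println(getTasksToKillSuggested(jobs).size()); // 2
    }
}
```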

getTasksToKill returns two things in one list - the set of jobIDs whose
tasks should be killed, and the set of taskIDs that should be killed. Note
that the set of tasks is most likely just those for each of which another
attempt has completed successfully. A tasktracker might be running other
tasks of the same job, and we don't want to kill those. The tasks for the
jobs in killJobIDs are not expanded. This list is then copied to the
"actions" list that contains everything the tasktracker should consider. So
maybe the last copy can be avoided by having the JT store a reference to
that action list and directly writing that out. But there are some other
things to be taken care of, like using arrays instead of lists ....
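The distinction above between job-level and task-level kill directives might look like the following sketch. The names are hypothetical (the real JobTracker/TaskTracker protocol types differ); the point is that a job-level entry is not expanded into per-task entries on the JobTracker side, and both kinds end up in one "actions" list carried by the heartbeat response.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch with hypothetical names. A KillJob entry tells the
// tracker to drop every task of a job; a KillTask entry names one task
// attempt (e.g. because another attempt of the same task already
// succeeded) while other tasks of that job keep running.
public class KillActionsSketch {

    static abstract class KillAction { }

    static class KillJob extends KillAction {
        final String jobId;
        KillJob(String jobId) { this.jobId = jobId; }
    }

    static class KillTask extends KillAction {
        final String taskId;
        KillTask(String taskId) { this.taskId = taskId; }
    }

    // Both kinds go into one "actions" list for the heartbeat response;
    // KillJob entries are not expanded into per-task entries here.
    static List<KillAction> buildActions(List<String> killJobIds,
                                         List<String> killTaskIds) {
        List<KillAction> actions = new ArrayList<KillAction>();
        for (String jobId : killJobIds) {
            actions.add(new KillJob(jobId));
        }
        for (String taskId : killTaskIds) {
            actions.add(new KillTask(taskId));
        }
        return actions;
    }

    public static void main(String[] args) {
        List<KillAction> actions = buildActions(
                java.util.Arrays.asList("job_1"),
                java.util.Arrays.asList("attempt_A", "attempt_B"));
        System.out.println(actions.size()); // 3
    }
}
```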

> The two changes should reduce the number of allocations and the
> overall complexity.
> The digression about the taskTracker's behaviour was a question
> about whether or not it is important for killJobIds to contain no
> duplicates.
> >> By the way, my patch in the issue HADOOP-3412 also tries to
> >> improve the way containers are used. It replaces jobsByPriority
> >> (which was periodically re-sorted by resortPriority, in an
> >> inefficient way) with a TreeSet. It also replaces the TreeMap
> >> taskTrackers with a ConcurrentHashMap.
> >> I don't know if it's feasible, but allowing the JobTracker to
> >> answer more than one heartbeat at the same time (by using
> >> concurrent containers to lower its lock granularity) could be a
> >> good idea. If you think it's feasible I'll try to do it ^^
> > Answering more than one heartbeat at the same time is interesting.
> > Could you please elaborate on that? Some time back we were thinking
> > of queuing up the heartbeats and processing them asynchronously.
> > Are you talking about the same?
> Yes. What I suggest is to make the "synchronized areas" smaller by
> using concurrent containers, and then to use a thread pool to answer
> heartbeats.
> If you think that it is possible, I'll try to do it.

This is definitely useful!
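The proposal quoted above could be sketched as follows. This is a minimal, hypothetical illustration, not the actual JobTracker: the real heartbeat handling carries much more state, and the names here are made up. The idea is that shared state lives in concurrent containers so heartbeat threads do not all serialize on one JobTracker-wide lock, and each incoming heartbeat is handed to a thread pool.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of concurrent heartbeat handling; names are
// illustrative, not the real Hadoop API.
public class ConcurrentHeartbeatSketch {

    // Last-contact time per tracker; a ConcurrentHashMap lets many
    // heartbeat threads update it without a global synchronized block.
    final ConcurrentMap<String, Long> lastSeen =
            new ConcurrentHashMap<String, Long>();

    final ExecutorService pool = Executors.newFixedThreadPool(4);

    void heartbeat(final String trackerName) {
        pool.execute(new Runnable() {
            public void run() {
                lastSeen.put(trackerName, System.currentTimeMillis());
                // ... compute the response (tasks to launch, tasks to
                // kill) using only the state this tracker touches ...
            }
        });
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHeartbeatSketch jt = new ConcurrentHeartbeatSketch();
        for (int i = 0; i < 8; i++) {
            jt.heartbeat("tracker_" + (i % 4)); // 4 distinct trackers
        }
        jt.shutdown();
        System.out.println(jt.lastSeen.size()); // 4
    }
}
```

The trade-off is that any state still guarded by the JobTracker's single lock limits how much the pool can overlap, which is why shrinking the synchronized areas comes first in the suggestion.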

> Please forgive my English :-/ Next year I'll go to study in Oregon,
> so it should be better after that ^^

Don't worry so much :)

> Brice
