river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: A new implementation of TaskManager
Date Fri, 02 Jul 2010 11:25:59 GMT
On 6/30/2010 5:58 AM, Patricia Shanahan wrote:
> Peter Firmstone wrote:
>> I started thinking about a replacement, it's in
>> org.apache.river.imp.util.
> Excellent. I'll take a look, and pick up code and ideas as appropriate.
>> The name spaces are as follows:
>> net.jini.* API - must be backward compatible, or breakage minimal for
>> if there's a good reason.
>> com.sun.jini.* - implementation, subject to change.
>> com.artima.* - implementation, subject to change.
>> org.apache.river.api.* - new API under review, please feel free to
>> look for ways to improve before its frozen in November / December.
>> org.apache.river.imp.* - new implementation, subject to change,
>> eventually the com.* namespace will move there too.
>> Best bet would be to search for the classes that depend on the Task
>> interface, see how they use it and come up with something better.
> I've looked at several of them. Most runAfter implementations
> unconditionally return false. I'm tracking down the ones that don't.
>> Agreed, the Task interface is less than optimum, best to replace it.
>> TaskManager has been identified as a performance bottleneck.
> Are there any benchmarks demonstrating the bottleneck? I'll need some to
> see if my changes make matters significantly better. My guess from
> reading TaskManager code is that it is fine for small numbers of tasks,
> but does not scale well. However, measurements always beat guesses.
> My current idea is to use a long sequence number assigned when a Task is
> added to the TaskManager. That would work for a TaskManager that
> operates continuously for over 290 years given one new Task every
> nanosecond. If more is needed, I could go to BigInteger depending on
> what the benchmarks tell me.
> This requires wrapping the caller's Task object in another data
> structure, which has other uses.
> Given a sequence number, I could separate the tasks into separate data
> structures depending status. Each data structure would be designed for
> the operations TaskManager needs to do on tasks in that state. In
> particular, tasks that are waiting to be assigned a thread would live in
> a BlockingPriorityQueue whose Comparator goes by sequence number.
> I am considering changing the runAfter method to take a single Task,
> rather than a List. It moves a method call from outside to inside a
> loop, but I think it is worth it for the additional information it would
> give TaskManager. It simplifies the interface, which is usually a good
> thing to do for maintainability. It decouples the TaskManager callers
> from how TaskManager organizes its data structures, important not just
> for this round, but for any future tuning of TaskManager.

The same benefits could result, without actually scanning tasks if this 
Task has no dependencies at all, by replacing runAfter with:

/** Get the first Task on which this one depends.
@return The first Task in candidate's Iteration order that this
Task must wait for, or null if there is no such Task.
@param candidates An Iterable containing older Task's on which this one
might depend.
Task firstDependency(Iterable<Task> candidates);

A Task implementation that currently has runAfter unconditionally return 
false would use:

Task firstDependency(Iterable<Task> candidates){
  return null;

A non-trivial runAfter would become:

Task firstDependency(Iterable<Task> candidates){
   for(Task candidate: candidates){
     if(this must run after candidate){
       return candidate;
   return null;

This gets the benefit of efficient handling of Task implementations that 
never need to run after another Task, but gives just as much flexibility 
in TaskManager. An Iterable can be a slice of a SortedSet or List.

> TaskManager, given a new Task, would begin by identifying the highest
> sequence number existing task on which it depends, using a backwards
> scan. If the Task depends on nothing, it goes in the priority queue for
> dispatch to a thread. If Task Y is the youngest incomplete task on which
> Task X depends, add X to a Set associated with Y, and put X in a pending
> data structure.
> (To reduce object creation and destruction in simple cases, Y's Set
> would be created the first time an object is added to it.)
> When Task Y leaves the TaskManager, through removal or completion,
> reexamine X, looking to see if it depends on an incomplete Task with a
> lower sequence number than Y. If it does, put it in that Task's waiting
> Set and leave it pending. If not, put it in the priority queue.
> In this design, each question of the form "Does X depend on Y" would be
> asked at most once. A Task that is waiting on runAfter is kept out of
> the dispatch data structure.
> This is just a first thought, and will almost certainly change after I
> have read more and done some experiments.
> Patricia

View raw message