river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: TaskManager progress
Date Mon, 26 Jul 2010 22:43:23 GMT
Gregg Wonderly wrote:
> Patricia Shanahan wrote:
>> On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
>> ...
>>> When I write code of this nature, attempting to remove all contention, I
>>> try
>>> to list every "step" that changes the "view" of the world, and think 
>>> about
>>> how that "view" can be made atomic by using explicit ordering of 
>>> statements
>>> rather than synchronized{} blocks.  ...
>>
>> I would like to discuss how to approach performance improvement, and 
>> especially scaling improvement. We seem to have different 
>> philosophies, and I'm interested in understanding other people's 
>> approaches to programming.
>>
>> I try to first find the really big wins, which are almost always data 
>> structure and algorithm changes. That should result in code that is 
>> efficient in terms of total CPU time and memory. During that part of 
>> the process, I prefer to keep the concurrency design as simple as 
>> possible, which in Java often means using synchronization at a coarse 
>> level, such as synchronization on a TaskManager instance.
> 
> I don't think we are too far different.  The big wins are the ones to go 
> for. What I've learned over the years, debugging Java performance issues 
> in Jini applications, and elsewhere, is that "synchronized", while the 
> most "correct" choice in many cases, is also the "slowest" form of 
> concurrency control.

Yes, I think it is merely a matter of emphasis. One problem I've seen is 
a tendency to accept the times without questioning whether they need to 
be as long as they are, and focus too much too soon on concurrency, 
before making it fast, so I tend to resist that by keeping the 
concurrency simple until the data structures and algorithms are right.

> 
> In a "service" application in particular, it can, in many cases, be the 
> case that the total CPU time needed to perform the actual work, is a 
> smaller fraction of the time that "synchronized" injects as latency in 
> the execution path.  I think you understand this issue, but I want to 
> illustrate it to make sure.
> 
> File I/O and network I/O latency can create similar circumstances.  If 
> you look at this with some numbers (I put ms but any magnitude will show 
> the same behavior) such as the following:
> 
> 2ms to enter server (Security/Permission stuff)
> 1ms CPU to arrive at synchronized
> 3ms through synchronized lock
> 1ms to return result
> 
> Then if there are 30 such threads running through the server, 
> eventually, all of them will be standing in line at the synchronized 
> (monitor entry) because they can get through all the other stuff in 
> short order.  As the total number of threads increases, the 
> "synchronized" section time must be multiplied by the number of threads, 
> and so with 10 threads, it becomes 30ms of time, because each thread 
> must wait it's turn.  Thus, 30ms becomes the minimum latency
> through that part of the system, instead of the 3ms there would be with 
> 1 thread.

Suppose a better choice of algorithm and data structures could reduce 
the time with the synchronized lock from 3ms to 3us. Wouldn't it be 
better to do that first? With 10 threads, eliminating synchronization 
may, at best, produce a 10x performance improvement. Picking the right 
data structures and algorithms can often do better than that.

An ArrayList is not an efficient representation of a queue, especially 
under load conditions that may cause a significant backlog.

I feel that doing some optimizations, including a lot of the concurrency 
changes, can create complication and commitment that becomes a barrier 
to algorithm and data structure changes. On the other hand, picking good 
data structures and algorithms does nothing to bar improving concurrency 
later.

> So, eliminating synchronized as a "global" encounter is what I always 
> consider first.

I, on the other hand, prefer to minimize the work being done first, and 
then decide whether there is enough left for synchronization to be a 
potential bottleneck.

Note that there is a finite limit to the number of hardware threads that 
can be supported efficiently, at least with current technology, while 
maintaining the Java memory model. Really large clusters run as multiple 
separate SMP computers, each with one or more JVMs of its own, with 
message passing rather than shared memory communication between the SMP 
computers.

Patricia


Mime
View raw message