river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike McGrady <mmcgr...@topiatechnology.com>
Subject Re: Possible multi-threading bug
Date Tue, 30 Nov 2010 12:21:54 GMT
I just want to add that as a "real world" integrator with the highest performance expectations,
I rely more on the architecture and design of systems to solve scalability problems.  A poor
design can be made to scale but there are two scaling ceilings: not only performance but also
cost.

Sent from my iPhone

Michael McGrady
Principal investigator AF081_028 SBIR
Chief Architect
Topia Technology, Inc
Work 1.253.572.9712
Cel 1.253.720.3365

On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <pats@acm.org> wrote:

> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>> Patricia Shanahan wrote:
>>> Tom Hobbs wrote:
>>>> Yes, you're right.
>>>> 
>>>> I knew about the non-atomicity of ++, my concern was a call to reset
>>>> creeping in between the two parts of that operation.
>>> 
>>> That is a very good point.
>>> 
>>> Leaving reset unsynchronized, even with volatile, would lead to
>>> results that would not be possible with full synchronization. Suppose
>>> thread A is going to do a reset, and thread B is going to do an
>>> increment, and everybody agrees the count is currently 10.
> ....
>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>> 
>> In essence I agree, unfortunately we don't know when we need the
>> performance, because we can't test scalability. I've only got 4 threads!
> 
> I've only got 8 threads on my largest system.
> 
> I am very, very strongly opposed to attempting performance tuning without measurement.
I've seen it tried many times over several decades, and it is always a disaster. I've compared
e.g. estimates of where bottlenecks will be in an operating system prepared by the OS developers
to actual measurements, and I've yet to see the developers get it right. That includes my
own efforts at guessing bottlenecks, before I learned that it is futile.
> 
> Without measurement, you get either get focused effort in entirely the wrong places or
diffuse effort spread over the whole system. What is really needed is focused effort on a
few real bottlenecks that will go way beyond any set of rules that could reasonably be applied
throughout the system.
> 
> I believe the solution is to establish a working relationship with users of River who
have larger systems. They have a vested interest in helping us make it more scalable.
> 
>> Here are some simple tips I've found useful:
> 
> Have you measured the effectiveness of these tips on real world scalability? What sort
of gain do you see?
> 
> There may also be opportunities in the area of data structure and algorithm scalability.
For example, TaskManager uses an ArrayList to represent something that is basically a FIFO
with some reordering. That is a good idea if, and only if, the queue lengths are always very
short even on large systems. However, I have no idea whether TaskManager queue lengths tend
to increase on large systems or not.
> 
>>  1. If all mutators are atomic and don't depend on previous state, a
>>     volatile reference or field may be sufficient, but now we have
>>     concurrency utilities, why not use an atomic reference or field
>>     instead? Then if we find we later need a method based on previous
>>     state, it's easily proven correct.
> 
> Do we have the concurrency utilities? The java.util.concurrent packages are all "Since
1.5", and some of their classes are "Since 1.6". We can only use them if we are abandoning
any chance of River running with a 1.4 rt.jar.
> 
> To my mind, the advances in java.util and its sub-packages are a really strong motivation
for getting to 1.5 or, better, 1.6 ASAP.
> 
> Currently, a quick grep indicates java.util.concurrent is only used in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java,
which is part of the QA infrastructure, not the production system.
> 
> ...
> 
>> Have you got any examples of a formal proof of correctness? Just out of
>> curiosity?
> 
> Unfortunately, the proofs of correctness I've written that related to concurrency were
based on confidential data and written on the job when I was working as a performance architect
for Cray Research and Sun Microsystems.
> 
> I do have some coursework proofs from the UCSD Design of Algorithms graduate course.
I'll dig out an example to send to you directly.
> 
> Proof of correctness of large systems is a research topic. I'm talking about looking
very narrowly at a class that has been shown to be performance-critical and is being subjected
to special performance tuning that uses risky techniques such as volatile.
> 
> Patricia
> 

Mime
View raw message