river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: Possible multi-threading bug
Date Tue, 30 Nov 2010 15:14:48 GMT
Real world integrator with the highest performance expectations is 
exactly the sort of person I think we should be talking with about 
getting real world data on River bottlenecks.

Any input? Possibilities for e.g. running with performance-specific logging?


Mike McGrady wrote:
> I just want to add that as a "real world" integrator with the highest
> performance expectations, I rely more on the architecture and design
> of systems to solve scalability problems.  A poor design can be made
> to scale but there are two scaling ceilings: not only performance but
> also cost.
> Sent from my iPhone
> Michael McGrady Principal investigator AF081_028 SBIR Chief Architect
>  Topia Technology, Inc Work 1.253.572.9712 Cel 1.253.720.3365
> On Nov 30, 2010, at 3:59 AM, Patricia Shanahan <pats@acm.org> wrote:
>> On 11/30/2010 1:43 AM, Peter Firmstone wrote:
>>> Patricia Shanahan wrote:
>>>> Tom Hobbs wrote:
>>>>> Yes, you're right.
>>>>> I knew about the non-atomicity of ++, my concern was a call
>>>>> to reset creeping in between the two parts of that operation.
>>>> That is a very good point.
>>>> Leaving reset unsynchronized, even with volatile, would lead to
>>>>  results that would not be possible with full synchronization.
>>>> Suppose thread A is going to do a reset, and thread B is going
>>>> to do an increment, and everybody agrees the count is currently
>>>> 10.
>> ....
>>> Ah yes, correct, my mistake, easy to stuff up isn't it? ;)
>>> In essence I agree, unfortunately we don't know when we need the 
>>> performance, because we can't test scalability. I've only got 4
>>> threads!
>> I've only got 8 threads on my largest system.
>> I am very, very strongly opposed to attempting performance tuning
>> without measurement. I've seen it tried many times over several
>> decades, and it is always a disaster. I've compared e.g. estimates
>> of where bottlenecks will be in an operating system prepared by the
>> OS developers to actual measurements, and I've yet to see the
>> developers get it right. That includes my own efforts at guessing
>> bottlenecks, before I learned that it is futile.
>> Without measurement, you get either get focused effort in entirely
>> the wrong places or diffuse effort spread over the whole system.
>> What is really needed is focused effort on a few real bottlenecks
>> that will go way beyond any set of rules that could reasonably be
>> applied throughout the system.
>> I believe the solution is to establish a working relationship with
>> users of River who have larger systems. They have a vested interest
>> in helping us make it more scalable.
>>> Here are some simple tips I've found useful:
>> Have you measured the effectiveness of these tips on real world
>> scalability? What sort of gain do you see?
>> There may also be opportunities in the area of data structure and
>> algorithm scalability. For example, TaskManager uses an ArrayList
>> to represent something that is basically a FIFO with some
>> reordering. That is a good idea if, and only if, the queue lengths
>> are always very short even on large systems. However, I have no
>> idea whether TaskManager queue lengths tend to increase on large
>> systems or not.
>>> 1. If all mutators are atomic and don't depend on previous state,
>>> a volatile reference or field may be sufficient, but now we have 
>>> concurrency utilities, why not use an atomic reference or field 
>>> instead? Then if we find we later need a method based on previous
>>>  state, it's easily proven correct.
>> Do we have the concurrency utilities? The java.util.concurrent
>> packages are all "Since 1.5", and some of their classes are "Since
>> 1.6". We can only use them if we are abandoning any chance of River
>> running with a 1.4 rt.jar.
>> To my mind, the advances in java.util and its sub-packages are a
>> really strong motivation for getting to 1.5 or, better, 1.6 ASAP.
>> Currently, a quick grep indicates java.util.concurrent is only used
>> in ./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which
>> is part of the QA infrastructure, not the production system.
>> ...
>>> Have you got any examples of a formal proof of correctness? Just
>>> out of curiosity?
>> Unfortunately, the proofs of correctness I've written that related
>> to concurrency were based on confidential data and written on the
>> job when I was working as a performance architect for Cray Research
>> and Sun Microsystems.
>> I do have some coursework proofs from the UCSD Design of Algorithms
>> graduate course. I'll dig out an example to send to you directly.
>> Proof of correctness of large systems is a research topic. I'm
>> talking about looking very narrowly at a class that has been shown
>> to be performance-critical and is being subjected to special
>> performance tuning that uses risky techniques such as volatile.
>> Patricia

View raw message