river-dev mailing list archives

From Peter Firmstone <j...@zeus.net.au>
Subject Re: TaskManager progress
Date Thu, 22 Jul 2010 02:59:19 GMT
Actually, one problem I have is that I don't have access to the resources 
required to performance test some of these implementations under the kind 
of stress they would see in a massive cluster.

The other assumption I make is that something that performs well today 
might not tomorrow, due to the multi-core revolution we have on our hands.  
This may turn out to be a flawed assumption.

Generally, the way I like to code (and this isn't related to your 
situation) is that, where it makes sense, I write builders / factories 
for immutable objects.  The builders are not thread-safe.  I also provide 
a method on the immutable object that returns a new builder instance 
pre-set with that object's state, which I can modify before building a 
replacement immutable object.

I might use one builder to generate many immutable objects; the builder 
object is accessed by only one thread.
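
A minimal sketch of that builder arrangement, assuming a hypothetical 
Endpoint type with two fields (the names here are illustrative only and 
are not taken from River):

```java
// Hypothetical example type; none of these names come from River.
final class Endpoint {
    private final String host;
    private final int port;

    private Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    String host() { return host; }
    int port() { return port; }

    /** Returns a new builder pre-set with this object's state. */
    Builder toBuilder() {
        return new Builder().host(host).port(port);
    }

    /** Not thread-safe: confine each builder instance to a single thread. */
    static final class Builder {
        private String host = "localhost";
        private int port = 0;

        Builder host(String host) { this.host = host; return this; }
        Builder port(int port) { this.port = port; return this; }

        /** May be called repeatedly; each call yields a new immutable object. */
        Endpoint build() { return new Endpoint(host, port); }
    }
}
```

Calling a.toBuilder().port(4161).build() leaves a untouched and returns 
a replacement object, so only the builder ever holds mutable state.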

The builder might internally use a static, concurrent, weak-reference 
hash pool of immutable objects.  Because it knows the hash code generator 
the immutable object uses, it can pool the immutable objects, saving 
memory.  Alternatively, it might create an immutable object, look up its 
hash code in the pool, find a duplicate, and, if equals() agrees, discard 
the new object and return the pooled copy.  Pooling also speeds up 
equals(), since equal pooled objects are the same instance and a 
reference check can short-circuit the comparison.
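
The second variant (create, look up, discard the duplicate) could be 
sketched roughly as follows.  This is an assumption-laden illustration: 
the Version type is hypothetical, and a WeakHashMap guarded by a 
synchronized block stands in for a truly concurrent pool to keep the 
example short.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical value type, not River code.
final class Version {
    // Weak keys and weak values: an entry can be collected once no caller
    // holds the pooled instance. Guarded by synchronized (POOL) below.
    private static final Map<Version, WeakReference<Version>> POOL =
            new WeakHashMap<Version, WeakReference<Version>>();

    private final int major;
    private final int minor;

    private Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Builds a candidate, then returns the pooled duplicate if one exists. */
    static Version of(int major, int minor) {
        Version candidate = new Version(major, minor);
        synchronized (POOL) {
            WeakReference<Version> ref = POOL.get(candidate);
            Version pooled = (ref == null) ? null : ref.get();
            if (pooled != null) {
                return pooled; // the candidate is discarded
            }
            POOL.put(candidate, new WeakReference<Version>(candidate));
            return candidate;
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true; // pooled duplicates are decided here
        if (!(o instanceof Version)) return false;
        Version v = (Version) o;
        return major == v.major && minor == v.minor;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }
}
```

While a caller still holds a reference to Version.of(2, 1), a second 
call with the same arguments returns that same instance, so equals() 
succeeds on the identity check without comparing fields.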

The immutable objects then get used everywhere, without concern for 
thread synchronization.  These work well with AtomicReferences where the 
new state depends on the old.
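
As a rough sketch of the AtomicReference case, assuming a hypothetical 
immutable Stats type whose next value is derived from the previous one, 
the update can be retried until no other thread has changed the 
reference in the interim:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable state object, not River code.
final class Stats {
    final long count;
    final long total;

    Stats(long count, long total) {
        this.count = count;
        this.total = total;
    }

    /** Returns a new Stats derived from this one; this object is unchanged. */
    Stats plus(long sample) {
        return new Stats(count + 1, total + sample);
    }
}

final class StatsHolder {
    private final AtomicReference<Stats> ref =
            new AtomicReference<Stats>(new Stats(0, 0));

    /** Applies the update only if nobody else updated in the meantime. */
    Stats record(long sample) {
        while (true) {
            Stats old = ref.get();
            Stats next = old.plus(sample);
            if (ref.compareAndSet(old, next)) {
                return next;
            }
            // Another thread won the race: loop and recompute from its value.
        }
    }

    /** Readers get a consistent snapshot without any locking. */
    Stats current() {
        return ref.get();
    }
}
```

Because Stats is immutable, a reader that calls current() always sees a 
count and total that belong together, even while writers are retrying.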

The immutability of the object could easily be subverted through 
reflection, but you can't be expected to protect against that! The 
immutable object might be a container that holds some mutable objects 
that are never modified after construction, which makes them 
effectively immutable.

Because the client doesn't depend on a constructor, the immutable object 
can be represented by an interface.  You can then have any number of 
polymorphic implementations internally, all of which appear as a single 
type to the client, giving a very compact API.  Pooling offsets the 
memory consumption of immutable objects.
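
To illustrate the interface idea, here is a small sketch under assumed 
names (IntRange and IntRanges are hypothetical, not River types).  The 
client sees one interface and one factory method, while the factory 
picks among several hidden immutable implementations:

```java
// Hypothetical client-facing type; the implementations stay hidden.
interface IntRange {
    boolean contains(int value);
    int size();
}

final class IntRanges {
    private IntRanges() {}

    // One shared instance serves every empty range.
    private static final IntRange EMPTY = new IntRange() {
        public boolean contains(int value) { return false; }
        public int size() { return 0; }
    };

    /** Returns an immutable range covering lo..hi inclusive. */
    static IntRange of(int lo, int hi) {
        if (lo > hi) return EMPTY;
        if (lo == hi) return new Single(lo);
        return new Span(lo, hi);
    }

    private static final class Single implements IntRange {
        private final int value;
        Single(int value) { this.value = value; }
        public boolean contains(int v) { return v == value; }
        public int size() { return 1; }
    }

    private static final class Span implements IntRange {
        private final int lo;
        private final int hi;
        Span(int lo, int hi) { this.lo = lo; this.hi = hi; }
        public boolean contains(int v) { return v >= lo && v <= hi; }
        // Overflow for extremely wide ranges is ignored in this sketch.
        public int size() { return hi - lo + 1; }
    }
}
```

Client code only ever names IntRange, so implementations can be added 
or swapped later without changing the API.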



Peter Firmstone wrote:
> I have a similar mindset to Gregg: memory and disk are relatively 
> inexpensive these days, so if I can avoid locks by using atomic 
> operations and immutable objects or concurrent utilities, I'm happy, 
> since it's one less possible deadlock or livelock bug I haven't 
> thought about.
> If updated state doesn't depend on previous state, I'll go for an 
> immutable object with a volatile reference.  If the object is not 
> immutable and it can be defensively copied, I do that before updating 
> the volatile reference and I defensively copy it again before 
> returning it to a caller.
> If updated state depends on previous state, I might use an immutable 
> object with an AtomicReference, where the update is only made when no 
> other update was received in the interim.  If I can, I try to make 
> objects effectively immutable, with defensive copying.
> If internal accessor methods don't need to concern themselves with a 
> reference update during a routine, I copy an object's reference rather 
> than synchronize on it, the copy will still refer to the old object 
> when the volatile reference is updated.  If the routine is in a loop, 
> and I want to restart this if the reference is updated, I'll use 
> while( a == b) (or something similar), where b is a reference to the 
> object referred to by a until a is changed.
> I try to keep synchronized blocks as small as possible, not so much 
> for performance, but for bugs, not even necessarily my own bugs but 
> client code concurrency bugs.  In the synchronized block, I don't call 
> objects which may be accessible from outside the object I'm calling 
> from.  State that needs to be atomically updated, I group together 
> using the same lock, I also consider using the ReadWriteLock, if reads 
> will outnumber writes. If multiple objects must be updated atomically, 
> I might group them together into an encapsulating object with the 
> methods I need to make it atomic.  This is better than holding 
> multiple locks.
> On some occasions I find a simple class that isn't threadsafe at all 
> is the best approach, letting something else handle the concurrency or 
> ensuring it's only used by one thread.
> For me it basically comes down to avoiding bugs first, followed by scale.
> Obviously memory consumption can be an impediment to scale, so there 
> are occasions where this is the wrong approach, but it's a 
> generalisation, to be taken with a grain of salt.
> If memory is an issue, there usually isn't much concurrency to be had, 
> if that's the case then good old fashioned synchronization or none at 
> all might be the best way to go.
> In that case, I might consider an interface, and separate 
> implementations for different platforms, one for memory, the other for 
> concurrency.
> It's true that concurrency is harder, people often forget to check the 
> return value of putIfAbsent, on ConcurrentMap.
> Horses for courses I suppose, everyone has their style, you don't have 
> to adopt mine, I'm just happy to have some help.  There's plenty of 
> code in River that uses synchronized and has no issues.  You probably 
> have enough experience to avoid the locking bugs by now, I'm happy 
> with your approach.  It's probably more performant than mine;)  Some 
> concurrency utilities can chew some memory.
> Maybe it's a reflection of my debugging abilities ;)
> Cheers,
> Peter.
> Patricia Shanahan wrote:
>> On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
>> ...
>>> When I write code of this nature, attempting to remove all 
>>> contention, I try to list every "step" that changes the "view" of 
>>> the world, and think about how that "view" can be made atomic by 
>>> using explicit ordering of statements rather than synchronized{} 
>>> blocks.  ...
>> I would like to discuss how to approach performance improvement, and 
>> especially scaling improvement. We seem to have different 
>> philosophies, and I'm interested in understanding other people's 
>> approaches to programming.
>> I try to first find the really big wins, which are almost always data 
>> structure and algorithm changes. That should result in code that is 
>> efficient in terms of total CPU time and memory. During that part of 
>> the process, I prefer to keep the concurrency design as simple as 
>> possible, which in Java often means using synchronization at a coarse 
>> level, such as synchronization on a TaskManager instance.
>> Once that is done, I review the performance. If it is fast and 
>> scalable I stop there. If that is not the case, I look for the 
>> bottlenecks, and consider whether parallelism, or some other 
>> strategy, will best improve them. Any increase in concurrency 
>> complication has to be justified by a demonstrated improvement in 
>> performance.
>> My big picture objective is to find the simplest implementation that 
>> meets the performance requirements (or cannot reasonably be made 
>> significantly faster, if the requirement is just "make it fast"). I 
>> value simplicity in concurrency design over simplicity in data 
>> structures or algorithms for two reasons:
>> 1. Making the code more parallel does nothing to reduce the total 
>> resources it uses. Better algorithms, on the other hand, can 
>> significantly reduce total resources.
>> 2. Reasoning about data structures and algorithms is generally easier 
>> than reasoning about concurrency.
>> It sounds as though you are advocating almost the opposite approach - 
>> aim for maximum concurrency from the start, without analysis or 
>> measurement to see what it gains, or even having a baseline 
>> implementation for comparison. Is that accurate? If so, could you 
>> explain the thinking and objectives behind your approach? Or maybe 
>> I'm misunderstanding, and you can clarify a bit?
>> Thanks,
>> Patricia
