cayenne-dev mailing list archives

From Andrus Adamchik <and...@objectstyle.org>
Subject Re: DataContext select concurrency
Date Mon, 09 Dec 2013 07:42:00 GMT
So I did some work on this recently and did some benchmarking of various things. In a highly
concurrent environment (to the extent I can emulate one on an MBP), just turning off the locking
gives about 30% performance increase. Considering the measurements included the actual communication
with the DB, this is a lot. 

Now, to also get consistent data, I tried the queue approach. Instead of using LinkedBlockingQueue, I used LMAX Disruptor [1], which is a rather cool technology for building processing pipelines. The way the implementation worked is that multiple read threads would pass their data rows to Disruptor, which would then load them into a single ObjectContext in a single thread. When this is done, waiting threads are notified and unblocked.
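
Roughly, the shape of that pipeline, sketched with a plain LinkedBlockingQueue standing in for the Disruptor ring buffer (class and method names here are illustrative, not Cayenne or Disruptor API):

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch: multiple reader threads hand row batches to a single
// merge thread, then block until that thread has processed their batch.
// LinkedBlockingQueue stands in for the Disruptor ring buffer.
public class SingleThreadMergeSketch {

    // A batch of fetched rows plus a future the reader waits on.
    static final class MergeOp {
        final List<Map<String, Object>> rows;
        final CompletableFuture<Integer> done = new CompletableFuture<>();
        MergeOp(List<Map<String, Object>> rows) { this.rows = rows; }
    }

    private final BlockingQueue<MergeOp> queue = new LinkedBlockingQueue<>();
    private int merged; // touched only by the single merge thread

    public SingleThreadMergeSketch() {
        Thread merger = new Thread(() -> {
            try {
                while (true) {
                    MergeOp op = queue.take();
                    merged += op.rows.size(); // rows-to-objects conversion would go here
                    op.done.complete(merged); // unblock the waiting reader
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        merger.setDaemon(true);
        merger.start();
    }

    // Called from reader threads: enqueue rows, block until merged.
    public int merge(List<Map<String, Object>> rows) {
        MergeOp op = new MergeOp(rows);
        try {
            queue.put(op);
            return op.done.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SingleThreadMergeSketch sketch = new SingleThreadMergeSketch();
        List<Map<String, Object>> rows = List.of(Map.of("id", 1), Map.of("id", 2));
        System.out.println("merged rows: " + sketch.merge(rows)); // prints "merged rows: 2"
    }
}
```

The point of the single consumer is that the ObjectStore merge itself needs no locking; the cost, as the benchmark showed, is that everything funnels through one core.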

The results with Disruptor, however, were exactly equivalent to those with the current synchronized approach. This means the single-threaded, lock-free rows-to-objects converter was underperforming, running on a single core (the overall CPU load was ~25%, with this one thread doing the work and the others waiting). I guess the gains with a technology like Disruptor can only be achieved via “mechanical sympathy” approaches, which require certain CPU-friendly properties of the executing code.

Regardless of the disappointing result, I think we learned a lot and still need to dig more in this direction. I may commit the prerequisite code that changes “synchronized” to ReentrantLock, which allows locking to be turned on and off. Also, instead of this micro-benchmark, I may try it with some of the webapp load tests that simulate more realistic loads.
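
The on/off switch amounts to picking between a real ReentrantLock and a no-op Lock at construction time, so the same code path runs lock-free for read-only contexts. A generic sketch (names are illustrative, not the actual ObjectStore code):

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Generic sketch of replacing "synchronized" with a switchable lock.
public class SwitchableStore {

    // No-op lock for the "locking off" case; only lock()/unlock() matter here.
    static final Lock NO_OP = new ReentrantLock() {
        @Override public void lock() {}
        @Override public void unlock() {}
    };

    private final Lock lock;
    private int registeredDiffs;

    public SwitchableStore(boolean locking) {
        this.lock = locking ? new ReentrantLock() : NO_OP;
    }

    // Former "synchronized" method, now the lock/try/finally/unlock pattern.
    public int registerDiff() {
        lock.lock();
        try {
            return ++registeredDiffs;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        SwitchableStore store = new SwitchableStore(false);
        System.out.println(store.registerDiff()); // prints 1, taken lock-free
    }
}
```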

I also noticed a few things that we might offload to a different pipeline (e.g. updating the shared cache… the query thread should not be waiting for that), and a few things that can be done by caller threads to shorten the synchronized block (such as pre-constructing all needed ObjectIds).
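
The "shorten the synchronized block" idea is plain hoisting: do all per-row work that touches no shared state before taking the lock, leaving only the shared-map mutation inside it. Schematically (hypothetical names, not the real ObjectStore internals):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Schematic: pre-construct per-row keys (the ObjectId analogue) on the
// caller thread, so the critical section only does the map insertions.
public class HoistingSketch {

    private final Map<String, Object> objectMap = new HashMap<>();

    public void register(List<?> rows) {
        // Outside the lock: cheap per-row work, no shared state touched.
        List<String> ids = new ArrayList<>(rows.size());
        for (Object row : rows) {
            ids.add("id:" + row); // stand-in for ObjectId construction
        }

        // Inside the lock: only the part that mutates shared state.
        synchronized (objectMap) {
            for (int i = 0; i < rows.size(); i++) {
                objectMap.put(ids.get(i), rows.get(i));
            }
        }
    }

    public int size() {
        synchronized (objectMap) { return objectMap.size(); }
    }

    public static void main(String[] args) {
        HoistingSketch sketch = new HoistingSketch();
        sketch.register(List.of("a", "b", "c"));
        System.out.println(sketch.size()); // prints 3
    }
}
```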

Finally, the fact that DataObjects themselves are non-concurrent prevents certain concurrency optimizations in the ObjectStore. So the Object[] idea is fundamental to further improvements (although I may first try changing CayenneDataObject to use a ConcurrentHashMap internally).
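
The Object[] idea is essentially copy-on-write: build a fresh values array off to the side, then publish it with a single volatile write, so concurrent readers see either the old or the new state, never a half-merged object. A minimal sketch (field layout is illustrative, not Cayenne's actual object structure):

```java
import java.util.Arrays;

// Copy-on-write sketch: readers observe the values array through one
// volatile read and never see a partially merged snapshot.
public class CowDataObject {

    private volatile Object[] values;

    public CowDataObject(Object... initial) {
        this.values = initial.clone();
    }

    public Object get(int index) {
        return values[index]; // single volatile read, no lock
    }

    // Merge a snapshot by cloning, mutating the clone, then swapping.
    public void mergeSnapshot(Object[] snapshot) {
        Object[] fresh = Arrays.copyOf(values, values.length);
        for (int i = 0; i < snapshot.length && i < fresh.length; i++) {
            if (snapshot[i] != null) {
                fresh[i] = snapshot[i];
            }
        }
        values = fresh; // atomic publish via volatile write
    }

    public static void main(String[] args) {
        CowDataObject o = new CowDataObject("a", "b");
        o.mergeSnapshot(new Object[] { null, "c" });
        System.out.println(o.get(0) + "," + o.get(1)); // prints "a,c"
    }
}
```

This is the same clone/merge/swap shape Ari sketched below with dataObject.clone(); the volatile reference is what makes the swap safe to read without the ObjectStore lock.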

Anyways, lots of things we can improve and so little time :)

Andrus


[1] http://lmax-exchange.github.io/disruptor/


On Nov 5, 2013, at 9:42 AM, Andrus Adamchik <andrus@objectstyle.org> wrote:
>> Sorry if I'm being daft. I waited a bit to see if other people would ask some questions to help get my head around it. But no one took a bite, so I'm having a go.
> 
> No worries, I am glad we are talking about it :)
> 
> Actually each queue will contain result lists (lists of DataRows), not individual objects. So yeah, the two proposals ((1) CayenneDataObject internal structure change and (2) concurrent selects) are generally unrelated. However potentially some of the implementations of (2) may take advantage of (1).
> 
> I started on (2) this weekend. For now I’ve implemented in my local git repo a separate DI module that allows users to create read-only DataContext subclasses. I also replaced all “synchronized” uses in the ObjectStore with ReentrantLock [1]. This resulted in more boilerplate code (lock / try / finally / unlock), but made all synchronization easy to turn on and off in one place. I guess the next step is experimenting with result processing queues.
> 
> [1] http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html
> 
> Andrus
> 
> On Nov 4, 2013, at 1:06 PM, Aristedes Maniatis <ari@maniatis.org> wrote:
> 
>> So then queries on the same table would be queued because you don't want to return a mix of fresh and non-fresh data to the user in the same response. Is that the problem you want to solve with object-level atomicity, and just swapping out the Object[]?
>> 
>> With the queue approach, are you thinking that the queue is a list of every object which has been fetched from the database and Cayenne has already determined that the ObjectStore is out of date and needs updating? Or just a list of every object fetched from the database, with checking for freshness something that happens as objects are taken from the queue for processing?
>> 
>> I'm still getting my head around your ideas, but there appear to be two different things here:
>> 
>> 1. Swapping out the dataObject atomically to eliminate the lock on the ObjectStore. This avoids the lock held during the time it takes to update the values in the objectMap. For example, here: synchronized ObjectDiff registerDiff(Object nodeId, NodeDiff diff) {}. The code would then look like:
>> 
>> newObject = dataObject.clone();
>> DataRowUtils.forceMergeWithSnapshot(context, descriptor, newObject, snapshot);
>> dataObject = newObject;
>> 
>> Or something vaguely like that.
>> 
>> 
>> 2. Creating a queue to allow a pool of workers to convert raw DataRows into object properties, decide which records in the ObjectStore need updating, create NodeDiff objects with those changes, etc.
>> 
>> 
>> Sorry if I'm being daft. I waited a bit to see if other people would ask some questions to help get my head around it. But no one took a bite, so I'm having a go.
>> 
>> I'm not seeing how the two ideas relate to each other. They both seem helpful, but they seem to solve different bottlenecks. What chaos would (1) cause?
>> 
>> 
>> Ari
>> 
>> 
>> 
>> On 4/11/2013 6:53pm, Andrus Adamchik wrote:
>>> I am actually considering a read-only case here. So no modifications.
>>> 
>>> If the objects need to be modified, they have to be transferred to a peer ObjectContext using 'localObject'. Which sorta makes sense even now - contexts with local cache are often shared and hence de-facto have to be read-only, and contexts that track modifications are user- or request- or method-scoped.
>>> 
>>> A.
>>> 
>>> On Nov 4, 2013, at 10:42 AM, Aristedes Maniatis <ari@maniatis.org> wrote:
>>> 
>>>> On 26/10/2013 3:09am, Andrus Adamchik wrote:
>>>> 
>>>> 
>>>>> 2. Queue based approach… Place each query result merge operation in an operation queue for a given DataContext. The polling end of the queue will categorize the operations by "affinity", and assign each op to a worker thread, selected from a thread pool based on the above "affinity". Ops that may potentially update the same objects are assigned to the same worker and are processed serially. Ops that have no chance of creating a conflict between each other are assigned to separate workers and are processed in parallel.
>>>> 
>>>> This queue needs to keep both SELECT and modify operations in some sort of order? So let's imagine you get a queue like this:
>>>> 
>>>> 1. select table A
>>>> 2. select table B
>>>> 3. select table A
>>>> 4. modify table B
>>>> 5. select table B
>>>> 6. select table A
>>>> 
>>>> Is the idea here that you would dispatch 1, 2, 3, 6 to three worker threads to be executed in parallel? But then 4 would be queued behind 2, and 5 would also wait until 4 was complete.
>>>> 
>>>> Is that the idea?
>>>> 
>>>> 
>>>> I can see some situations where this would result in worse behaviour than we have now. If operations 1 and 3 were the same query, then today we get to take advantage of a query cache.
>>>> 
>>>> 
>>>> Am I getting the general idea right?
>>>> 
>>>> 
>>>> Ari
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> -------------------------->
>>>> Aristedes Maniatis
>>>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>>>> 
>>> 
>> 
>> -- 
>> -------------------------->
>> Aristedes Maniatis
>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>> 
> 
> 

