cayenne-dev mailing list archives

From "Musall, Maik" <>
Subject Re: Fetching lots of objects
Date Wed, 08 Mar 2017 21:12:35 GMT
Hi Andrus,

I'm continuing this on the dev@ list if you don't mind?

> On 08.03.2017 at 20:13, Andrus Adamchik <> wrote:
>> It would be nice if Cayenne would internally parallelize things like ObjectResolver.objectsFromDataRows() and use lock-free strategies to deal with the caching.
> This is probably the last (and consequently the worst) place in Cayenne where locking still occurs. After I encountered this problem in a high-concurrency system, I've done some analysis of it (see [1] and also [2]), and this has been my "Cayenne 5.0" plan for a long time. With 4.0 making the progress it is now, we may actually start contemplating it.
> Andrus
> [1]
> [2]

Interesting read!

Regarding the array-based DataObject concept: wouldn't name-based attribute lookups still require a map somewhere that translates names to indexes? That map would only be needed once per entity, though.
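To make sure I understand it, here is a minimal sketch of that shared per-entity index (all class and method names below are mine, not Cayenne API):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Cayenne code: the name->index map is built once
// per entity and shared by every object of that entity, so each object only
// carries a plain Object[] of attribute values.
class EntityIndex {

    private final Map<String, Integer> slotByName;

    EntityIndex(String... attributeNames) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < attributeNames.length; i++) {
            m.put(attributeNames[i], i);
        }
        this.slotByName = Collections.unmodifiableMap(m);
    }

    int slot(String name) {
        return slotByName.get(name);
    }

    int size() {
        return slotByName.size();
    }
}

// Each object holds a reference to the shared index plus its own value array.
class ArrayBackedObject {

    private final EntityIndex index;
    private final Object[] values;

    ArrayBackedObject(EntityIndex index) {
        this.index = index;
        this.values = new Object[index.size()];
    }

    Object readProperty(String name) {
        return values[index.slot(name)];
    }

    void writeProperty(String name, Object value) {
        values[index.slot(name)] = value;
    }
}
```

Since the index map is immutable and shared, it needs no locking; only per-object state lives in the array.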

Instead of the array-based approach, did you also consider ConcurrentHashMap and the other classes in java.util.concurrent? That wouldn't bring the other advantages beyond concurrency, but it could serve as an easy intermediate step to get rid of the locking, and could perhaps even be implemented in 4.0 already.
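Something along these lines (illustrative only, not actual Cayenne internals): computeIfAbsent on a ConcurrentHashMap resolves each id at most once, with per-key atomicity instead of a global lock:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of an id -> object cache. computeIfAbsent is atomic
// per key, so two threads resolving the same id both get the one canonical
// object, and no explicit synchronized block guards the whole map.
class ObjectCache<I, O> {

    private final Map<I, O> byId = new ConcurrentHashMap<>();

    O resolve(I id, Function<I, O> fromRow) {
        return byId.computeIfAbsent(id, fromRow);
    }
}
```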

And on the [1] discussion, I'd like to mention my use case again: big queries with lots of prefetches that suck in gigabytes of data for aggregate computations using DataObject business logic. During those fetches, other users expect to be able to continue their regular workload concurrently (which they mostly cannot with EOF; that was my main reason to switch). So however this [1] concept turns out, I'd also like to be able to parallelize the fetches themselves. A useful first step would be to execute disjoint prefetches in separate threads.
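Just to sketch the shape of it with plain java.util.concurrent (the per-prefetch queries are stubbed out as Callables here; none of this is real Cayenne code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run each disjoint prefetch as its own task and
// block until all of them have fetched their rows.
class ParallelPrefetch {

    static List<List<String>> fetchAll(List<Callable<List<String>>> prefetches)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(prefetches.size());
        try {
            // invokeAll preserves the order of the submitted prefetches
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(prefetches)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The interesting part would of course be merging the rows from each prefetch back into one result graph without reintroducing the lock.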

A second step could be to parallelize even a single big table scan query by partitioning. Databases have long been able to organize large tables into partitions that can be scanned independently of each other. Back in the day, with Oracle and slow spinning disks, you would spread partitions across independent disks; today, with SSDs and zero seek time, partitioning can still increase throughput when the CPU is the limiting factor (databases also tend to generate high CPU load during full table scans, but only on one core per scan). One idea would be to include a partitioning criterion in the model that matches the database's criterion for the table in question.

In the meantime I could try partitioning the queries at the application level, which can also work, but then I'm back at the Graph Manager locking problem when merging the results into one context for processing.
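Roughly what I mean, with the actual query stubbed out as a function over a key range (all names are hypothetical, not Cayenne API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: split a big scan into primary-key ranges and fetch
// the ranges in parallel. "fetchRange" stands in for the real per-partition
// query; it receives {low, high} bounds, both inclusive.
class PartitionedScan {

    static <T> List<T> scan(long minId, long maxId, int partitions,
                            Function<long[], List<T>> fetchRange)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            // size each range so that the partitions cover [minId, maxId]
            long span = (maxId - minId + partitions) / partitions;
            List<Future<List<T>>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                long lo = minId + p * span;
                long hi = Math.min(maxId, lo + span - 1);
                if (lo > maxId) {
                    break;
                }
                futures.add(pool.submit(() -> fetchRange.apply(new long[] { lo, hi })));
            }
            // the merge step is still single-threaded
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> f : futures) {
                merged.addAll(f.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the merge loop at the end is still sequential, which is exactly where the Graph Manager locking bites today.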

Today's hardware, with databases on SSDs that can deliver 3 GByte/s or more and 16+ cores for processing, calls for parallelization on every level.

