cayenne-user mailing list archives

From Andrus Adamchik <and...@objectstyle.org>
Subject Re: Partitioning a query result..
Date Thu, 15 Dec 2016 08:18:33 GMT
Here is another idea:

* read all data in one thread using iterated query and DataRows
* append received rows to an in-memory queue (individually or in small batches)
* run a thread pool of processors that read from the queue and do the work.

As with all things performance, this needs to be measured and compared against a single-threaded
baseline. It will not help with an IO bottleneck, but the processing part will happen in parallel.
If you see any Cayenne bottlenecks during the last step, you can start multiple ServerRuntimes,
one per thread.
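
A minimal sketch of the three steps above, in plain Java. It assumes the rows arrive through an ordinary Iterator of Map<String, Object>, which stands in here for the iterated DataRow query. The class name QueuedRowPipeline, the batch size, and the queue capacity are illustrative choices, not Cayenne API. No Cayenne calls are made, so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class QueuedRowPipeline {

    // Sentinel batch: a worker that takes this exact instance stops.
    private static final List<Map<String, Object>> END = new ArrayList<>();

    // Reads rows on the calling thread, queues them in small batches,
    // and lets a fixed pool of workers process the batches in parallel.
    // Note: this sketch does not handle processor exceptions; a failed
    // worker simply stops taking batches.
    public static void run(Iterator<Map<String, Object>> rows, int workers, int batchSize,
                           Consumer<Map<String, Object>> processor) throws InterruptedException {
        // Bounded queue, so a fast reader cannot fill memory ahead of slow workers.
        BlockingQueue<List<Map<String, Object>>> queue = new ArrayBlockingQueue<>(workers * 2);
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        List<Map<String, Object>> batch = queue.take();
                        if (batch == END) {
                            return;
                        }
                        batch.forEach(processor);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Single reader: in a real job this loop would walk the iterated query.
        List<Map<String, Object>> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                queue.put(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            queue.put(batch);
        }
        // One sentinel per worker so every worker terminates.
        for (int i = 0; i < workers; i++) {
            queue.put(END);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake data standing in for 10,000 fetched rows.
        List<Map<String, Object>> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(Collections.<String, Object>singletonMap("ID", i));
        }
        AtomicLong processed = new AtomicLong();
        run(data.iterator(), 4, 100, row -> processed.incrementAndGet());
        System.out.println("processed " + processed.get() + " rows");
    }
}
```

The bounded queue is a deliberate choice: when the workers fall behind, the reader blocks on put() instead of buffering the whole result set in memory.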

Andrus

> On Dec 15, 2016, at 3:06 AM, John Huss <johnthuss@gmail.com> wrote:
> 
> Unless your DB disk is striped into at least four parts this won't be
> faster.
> On Wed, Dec 14, 2016 at 5:46 PM Tony Giaccone <tgiaccone@gmail.com> wrote:
> 
>> I want to speed things up by running multiple instances of a job that
>> fetches data from a table, so that, for example, if I need to process
>> 10,000 rows, the query runs on each instance and returns 4 sets of 2,500
>> rows, one for each instance, with no duplication.
>> 
>> My first thought in SQL was to add something like this to the where
>> clause..
>> 
>> and MOD(ID, INSTANCE_COUNT) = INSTANCE_ID;
>> 
>> so that if the instance count was 4 then the instance IDs would run
>> 0,1,2,3.
>> 
>> I'm not quite sure how you would structure that using the query API. Any
>> suggestions about that?
>> 
>> And there are some problems with this idea, as you have to be certain your
>> IDs increase in a manner that aligns with your math so that the
>> partitioning is equal in size.
>> For example if your sequence increments by 20, then you would have to futz
>> around with your math to get the right partitioning and that is the problem
>> with this technique.
>> It's brittle; it depends on getting a bunch of things in "sync".
>> 
>> Does anyone have another idea of how to segment out rows that would yield a
>> solution that's not quite so brittle?
>> 
>> 
>> 
>> Tony Giaccone
>> 
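
The brittleness described in the quoted message can be checked with a little arithmetic. The sketch below counts how many IDs land on each instance under MOD(ID, INSTANCE_COUNT) = INSTANCE_ID, first for IDs that increase by 1 and then for a sequence that increments by 20. The class and method names are hypothetical, and the starting IDs (1 and 20) are assumptions chosen for illustration.

```java
import java.util.Arrays;

public class ModPartitionDemo {

    // Returns, for each instance ID 0..instanceCount-1, how many of the
    // generated IDs satisfy MOD(ID, instanceCount) = instanceId.
    static long[] partitionSizes(long firstId, long step, int rows, int instanceCount) {
        long[] sizes = new long[instanceCount];
        for (int i = 0; i < rows; i++) {
            long id = firstId + i * step;
            sizes[(int) Math.floorMod(id, (long) instanceCount)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // IDs 1, 2, 3, ...: the four partitions come out equal.
        System.out.println(Arrays.toString(partitionSizes(1, 1, 10_000, 4)));
        // prints [2500, 2500, 2500, 2500]

        // IDs 20, 40, 60, ...: every ID is a multiple of 4, so one
        // instance gets all the rows and the other three get none.
        System.out.println(Arrays.toString(partitionSizes(20, 20, 10_000, 4)));
        // prints [10000, 0, 0, 0]
    }
}
```

Whenever the sequence increment shares a factor with the instance count, the partitions skew like this, which is why the MOD approach depends on keeping the ID generation and the partition math in sync.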

