cayenne-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Paul Le Fèvre <jean-paul.lefe...@cea.fr>
Subject Re: Problem concerning Prefetching
Date Mon, 28 Jan 2008 13:31:07 GMT
On Thursday 24 January 2008, Aristedes Maniatis wrote:
> On 25/01/2008, at 3:10 AM, Jean-Paul Le Fèvre wrote:
> > I'm currently trying to improve the performances of my queries.
> > I've added a call to addPrefetch() in my code but the result was not
> > as
> > expected : the time to fetch the 42597 rows increased from 35 s
> > without
> > addPrefetch() to 56 s with addPrefetch()
>
> That is expected. The initial query will take longer with the prefetch
> since you are getting more data. What will be faster is following
> relations later since these other records are now already in memory.
> It is unlikely you'll want to prefetch relations to all 42000 records.


Actually it is *very likely* that I want to prefetch relations to all
42000 records (or more) hundreds of times ! So it's worth spending some time
to optimize my implementation.

I've made 3 tests so far : one using plain sql, one using cayenne without
prefetching and the third one using prefetching.
The first one is done in less that 4 s, the second one in 35 s and the last
one in more that 50 s.

To find out how cayenne works I've extracted from the MySQL log file the
statements sent to the server.
(I've simplified the actual queries to make things clearer)

Manually written SQL statement :
----------------------

select galaxies.number, galaxy_component_properties.mass_of_baryons
       from galaxy_component_properties
       join galaxies on galaxy_component_properties.galaxy_id = galaxies.id
       where some_clause
       order by galaxies.number;


Cayenne generated (without prefetching) SQL statements :
-------------------------

SELECT t0.* FROM galaxies t0 WHERE some_clause ORDER BY t0.number;

then ~40000 queries :
SELECT t0.* FROM galaxy_component_properties t0
       WHERE t0.galaxy_id = nnnnnn;

It is clear that issuing more than 40000 times a SELECT to get the related
properties leads to an execution time of 35 s.

Cayenne generated (with prefetching) SQL statements :
-------------------------

SELECT t0.* FROM galaxies t0 WHERE some_clause ORDER BY t0.number;

SELECT t0.* FROM galaxy_component_properties t0, galaxies t1
       WHERE t0.galaxy_id = t1.id AND some_clause;

then ~40000 queries :
SELECT DISTINCT t0.*
       FROM galaxies t0, galaxy_component_properties t1
       WHERE t0.id = t1.galaxy_id AND (t1.id = nnnnn)

The first 2 SELECT are expected and should have been sufficient to build my
collection of galaxies. But they are followed by 40000 queries which make
performance worse than without prefetching.

Comments, advice are appreciated ...
>
> Investigate paging as possibly a more appropriate optimisation for you.

I'm gonna try ...

>
> Ari Maniatis
>
-- 
___________________________________________________________________

Jean-Paul Le Fèvre  * Mail : LeFevre@fonteny.org


Mime
View raw message