cayenne-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrus Adamchik <and...@objectstyle.org>
Subject Re: paged query slow when fetching big lists
Date Thu, 05 Jul 2007 12:21:39 GMT
3.0 (SVN trunk), which will hopefully be released fairly soon as 3.0M1

Andrus


On Jul 5, 2007, at 3:06 PM, Borut BolĨina wrote:
> Hello,
>
> which version includes this corrections? 2.0.3 or 3.0?
>
> Thanks,
> Borut
>
> On 26.6.2007 20:32, Andrus Adamchik wrote:
>> Hi Marcin,
>>
>> I have good news (I think). Fetching just id columns inside the  
>> IncrementalFaultList indeed speeds things up significantly. I just  
>> committed the change to optimize SelectQueries to do just that.  
>> Please let me know how does it work for you.
>>
>> Now the profiling details...
>>
>> * I profiled on Derby and MySQL. In both cases fetching a table  
>> with 25 columns and 100000 rows took between 3-4 seconds (not as  
>> long as in your case, but what's important is relative times I guess)
>> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'  
>> method is spent in the driver code, rewinding the result set (i.e.  
>> brining the data from db to the client).
>> * After my change the query time went down to 0.2-0.5 second (0.8  
>> if you consider the second query needed to fault the first page).  
>> Not bad!
>> * ResultSet reading still remained a bottleneck, but it became  
>> faster in absolute terms. And now finally Cayenne-related code  
>> (such as DataRow creation) started to show up on the radar (e.g.  
>> DataRow constructor taking 3% of the 'fillIn' method time).
>>
>> Andrus
>>
>>
>>
>> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>>
>>> Hi Andrus !
>>> Many thanks for that !
>>> Marcin
>>>
>>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>>
>>>> Hi Marcin,
>>>>
>>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50  
>>>>> seconds.
>>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") without paging takes 100 sec.
>>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") with page size 10 takes 5 sec.
>>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") with page size 10 and fetching row data rows takes  
>>>>> 4.5 sec.
>>>>
>>>> I see you still didn't use profiler, but the data you provided  
>>>> seems to finally confirm that at least on Derby not fetching all  
>>>> columns does result in significant speedup (#1 vs. #3). So now  
>>>> it is a question of implementing the right algorithm for the  
>>>> IncrementalFaultList.
>>>>
>>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only  
>>>>> part of the data. I tried to use addCustomDbAttribute("id") on  
>>>>> client, it resulted in returning the raw dataRows, is there  
>>>>> something I can do to fetch faulted objects ?
>>>>
>>>> We should encapsulate this logic inside IncrementalFaultList on  
>>>> the server.
>>>>
>>>>> Our application was designed to use the SelectQuery. If we have  
>>>>> to change that and use the SQLTemplate instead, there is a lot  
>>>>> of work for us, including:
>>>>
>>>> Same thing - the right thing to do is to fix it on the server.
>>>>
>>>> Let me try to find a spare minute later tonight and implement id- 
>>>> only fetch. I have some large tables in a MySQL5 so I can test  
>>>> the performance in a slightly different environment.
>>>>
>>>> Andrus
>>>>
>>>>
>>>>
>>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>>> Hi
>>>>>
>>>>> I have done some more profiling and testing.
>>>>>
>>>>> executing queries on table with >100000 records, directly on  
>>>>> server (not on client) gave results as listed below:
>>>>>
>>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50  
>>>>> seconds.
>>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") without paging takes 100 sec.
>>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") with page size 10 takes 5 sec.
>>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>>> STUDENT") with page size 10 and fetching row data rows takes  
>>>>> 4.5 sec.
>>>>>
>>>>> what more, I found that executing the SQLTemplate does allow to  
>>>>> fault the objects (I sometimes discover the simplest things  
>>>>> last), so I did try to check how long it takes for the objects  
>>>>> to be faulted:
>>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>>> 2) objects is faulted in 20ms (average)
>>>>> 3) on first page first object faulted in 200ms, rest  ~20 ms,
>>>>>      on any following page first object faulted in 30-200ms,  
>>>>> rest 0ms (interesting that the first page does not seem to be  
>>>>> faulted at all)
>>>>> 4) no point testing.
>>>>>
>>>>> Also I did check if the resizing of the ArrayList which is  
>>>>> keeping the results does affect the speed, and it does not.  
>>>>> (Tried to make the ArrayList initial size = 150,000).
>>>>>
>>>>> My conclusion is that SelectQuery with paging is usable only  
>>>>> for fetching less than, say 10,000 records, otherwise the  
>>>>> performance is to low. With SQLTemplate the performance is much  
>>>>> greater. It applies to both ROP and 'normal' cayenne, since I  
>>>>> made those tests on server.
>>>>>
>>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only  
>>>>> part of the data. I tried to use addCustomDbAttribute("id") on  
>>>>> client, it resulted in returning the raw dataRows, is there  
>>>>> something I can do to fetch faulted objects ?
>>>>>
>>>>> Our application was designed to use the SelectQuery. If we have  
>>>>> to change that and use the SQLTemplate instead, there is a lot  
>>>>> of work for us, including:
>>>>> - dealing with adding and concatenating Expressions to the  
>>>>> SQLTemplate (is there an easy way ?)
>>>>> - dealing with declared qualifier (the one set in modeller)
>>>>> - possibly more...
>>>>> i would really like to avoid all of that, so if you have any  
>>>>> ideas on how to improve the performance without too much hassle  
>>>>> I would really appreciate.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>>
>>>>>> Hi Andrus
>>>>>> I had not much time to check, but with the fix the 100k  
>>>>>> records load in 30 instead of 50 seconds. It is some  
>>>>>> improvement, but not enough. I'll do some more profiling  
>>>>>> tomorrow and let you know.
>>>>>>
>>>>>> By the way,  we are using netbeans for profiling, the  
>>>>>> benefit : it is free. I will evaluate the yourkit as we are  
>>>>>> moving away from netbeans as a development platform.
>>>>>>
>>>>>> Marcin
>>>>>>
>>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>>
>>>>>>> Ari, Marcin --
>>>>>>>
>>>>>>> going through the code I noticed one inefficiency - the  
>>>>>>> elements array access is synchronized in 'fillIn' method.  
>>>>>>> Since 'fillIn' is called from constructor, such  
>>>>>>> synchronization is unneeded and only slows things down. I  
>>>>>>> just checked a fixed version to trunk. Could you try it out?
>>>>>>>
>>>>>>> Andrus
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>>
>>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>>
>>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/

>>>>>>>>> 200705.mbox/browser
>>>>>>>>>
>>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is the same project we are working on. I started
 
>>>>>>>> some performance profiling and Marcin has been able now to
 
>>>>>>>> take it much further. What is it about:
>>>>>>>>
>>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>>
>>>>>>>> which is so slow? The code gets a little complex at that
 
>>>>>>>> point and we are having difficulty tracing it through to
the  
>>>>>>>> exact performance problem in the underlying code. Is it the
 
>>>>>>>> speed of adding the object id to the Collection or the speed
 
>>>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,
 
>>>>>>>> but it doesn't scale well.
>>>>>>>>
>>>>>>>> Andrus, I got the impression from the previous thread that
 
>>>>>>>> you suspected something slightly different. That the  
>>>>>>>> performance problem might be in the fat query itself, but
 
>>>>>>>> from our tests this isn't the case. If I've got this right,
 
>>>>>>>> the way it works is:
>>>>>>>>
>>>>>>>> 1. perform regular query to get all columns but return  
>>>>>>>> result in iterator
>>>>>>>> 2. iterate through first page and build full objects
>>>>>>>> 3. iterate through other pages and build just objectids 

>>>>>>>> (this is the slow part for us)
>>>>>>>> 4. when another page is fetched perform another query and
 
>>>>>>>> fetch those objects from the DB
>>>>>>>>
>>>>>>>> So, getting just primary keys from the DB may not be any
 
>>>>>>>> faster if the performance problem is simply in the  
>>>>>>>> construction of objectIds. If the performance problem is
in  
>>>>>>>> the numerous resizings of the Collection (each time it runs
 
>>>>>>>> out of space, then it is increased by 50% or 100% in size),
 
>>>>>>>> then the solution could be as simple as figuring out the
 
>>>>>>>> size of the iterator and sizing the collection appropriately
 
>>>>>>>> from the beginning.
>>>>>>>>
>>>>>>>> Any ideas on how to discover the exact cause of the  
>>>>>>>> performance hit?
>>>>>>>>
>>>>>>>>
>>>>>>>> Ari Maniatis
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
> -- 
> <blog-logo.gif>


Mime
View raw message