cayenne-user mailing list archives

From Andrus Adamchik <and...@objectstyle.org>
Subject Re: paged query slow when fetching big lists
Date Tue, 26 Jun 2007 18:32:27 GMT
Hi Marcin,

I have good news (I think). Fetching just the id columns inside the
IncrementalFaultList indeed speeds things up significantly. I just
committed the change that optimizes SelectQueries to do just that.
Please let me know how it works for you.

Now the profiling details...

* I profiled on Derby and MySQL. In both cases fetching a table with
25 columns and 100000 rows took 3-4 seconds (not as long as in your
case, but I guess the relative times are what matter).
* YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method
is spent in the driver code, rewinding the result set (i.e. bringing
the data from the DB to the client).
* After my change the query time went down to 0.2-0.5 seconds (0.8 if
you count the second query needed to fault the first page). Not bad!
* ResultSet reading still remained a bottleneck, but it became faster
in absolute terms. And now Cayenne-related code (such as DataRow
creation) finally started to show up on the radar (e.g. the DataRow
constructor taking 3% of the 'fillIn' method time).
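For anyone curious how the id-only approach hangs together, here is a minimal self-contained sketch of the general technique. This is NOT Cayenne's actual IncrementalFaultList code; the class and method names below (PagedIdList, pageResolver, resolvePage) are invented for illustration. The idea: the initial query stores only ids, and the first access to any element on a page resolves that whole page with a single resolver call (in the real thing, a query by primary key):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of id-only paging (not Cayenne code; all names invented).
// The list initially holds only Integer ids ("faults"); a page is
// resolved lazily, with one resolver call per page.
class PagedIdList<T> {

    private final List<Object> elements; // Integer id = fault, T = resolved
    private final int pageSize;
    private final Function<List<Integer>, Map<Integer, T>> pageResolver;

    PagedIdList(List<Integer> ids, int pageSize,
            Function<List<Integer>, Map<Integer, T>> pageResolver) {
        // pre-sized copy: the backing list never resizes after construction
        this.elements = new ArrayList<Object>(ids);
        this.pageSize = pageSize;
        this.pageResolver = pageResolver;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (elements.get(index) instanceof Integer) {
            resolvePage(index / pageSize);
        }
        return (T) elements.get(index);
    }

    private void resolvePage(int page) {
        int start = page * pageSize;
        int end = Math.min(start + pageSize, elements.size());
        List<Integer> ids = new ArrayList<Integer>();
        for (int i = start; i < end; i++) {
            Object e = elements.get(i);
            if (e instanceof Integer) {
                ids.add((Integer) e);
            }
        }
        // equivalent of one "SELECT ... WHERE id IN (...)" per page
        Map<Integer, T> resolved = pageResolver.apply(ids);
        for (int i = start; i < end; i++) {
            Object e = elements.get(i);
            if (e instanceof Integer) {
                elements.set(i, resolved.get((Integer) e));
            }
        }
    }
}
```

The access pattern this produces matches the timings quoted below: the first object touched on a page pays for the resolving query, and the remaining objects on that page are plain list reads.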

Andrus



On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:

> Hi Andrus !
> Many thanks for that !
> Marcin
>
> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>
>> Hi Marcin,
>>
>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") without paging takes 100 sec.
>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") with page size 10 takes 5 sec.
>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") with page size 10 and fetching raw data rows takes 4.5
>>> sec.
>>
>> I see you still didn't use a profiler, but the data you provided
>> seems to finally confirm that, at least on Derby, not fetching all
>> columns results in a significant speedup (#1 vs. #3). So now it is
>> a question of implementing the right algorithm for the
>> IncrementalFaultList.
>>
>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>> part of the data. I tried to use addCustomDbAttribute("id") on the
>>> client; it resulted in returning raw DataRows. Is there
>>> something I can do to fetch faulted objects?
>>
>> We should encapsulate this logic inside IncrementalFaultList on  
>> the server.
>>
>>> Our application was designed to use the SelectQuery. If we have  
>>> to change that and use the SQLTemplate instead, there is a lot of  
>>> work for us, including:
>>
>> Same thing - the right thing to do is to fix it on the server.
>>
>> Let me try to find a spare minute later tonight and implement
>> id-only fetch. I have some large tables in MySQL5, so I can test
>> the performance in a slightly different environment.
>>
>> Andrus
>>
>>
>>
>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>> Hi
>>>
>>> I have done some more profiling and testing.
>>>
>>> executing queries on a table with >100000 records, directly on the
>>> server (not on the client), gave the results listed below:
>>>
>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") without paging takes 100 sec.
>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") with page size 10 takes 5 sec.
>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
>>> STUDENT") with page size 10 and fetching raw data rows takes 4.5
>>> sec.
>>>
>>> What's more, I found that executing the SQLTemplate does allow
>>> faulting the objects (I sometimes discover the simplest things
>>> last), so I tried to check how long it takes for the objects to
>>> be faulted:
>>> 1) first object on every page (except the first) 30-200ms, rest = 0ms
>>> 2) objects are faulted in 20ms (average)
>>> 3) on the first page the first object faulted in 200ms, rest ~20ms;
>>>      on any following page the first object faulted in 30-200ms, rest
>>> 0ms (interesting that the first page does not seem to be faulted
>>> at all)
>>> 4) no point testing.
>>>
>>> I also checked whether the resizing of the ArrayList that keeps
>>> the results affects the speed, and it does not (I tried making
>>> the ArrayList's initial size 150,000).
>>>
>>> My conclusion is that SelectQuery with paging is usable only for
>>> fetching fewer than, say, 10,000 records; otherwise the performance
>>> is too low. With SQLTemplate the performance is much better. This
>>> applies to both ROP and 'normal' Cayenne, since I ran those
>>> tests on the server.
>>>
>>> Andrus, you mentioned using addCustomDbAttribute to fetch only
>>> part of the data. I tried to use addCustomDbAttribute("id") on the
>>> client; it resulted in returning raw DataRows. Is there
>>> something I can do to fetch faulted objects?
>>>
>>> Our application was designed to use the SelectQuery. If we have  
>>> to change that and use the SQLTemplate instead, there is a lot of  
>>> work for us, including:
>>> - dealing with adding and concatenating Expressions to the
>>> SQLTemplate (is there an easy way?)
>>> - dealing with the declared qualifier (the one set in the Modeler)
>>> - possibly more...
>>> I would really like to avoid all of that, so if you have any
>>> ideas on how to improve the performance without too much hassle I
>>> would really appreciate it.
>>>
>>> Marcin
>>>
>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>
>>>> Hi Andrus
>>>> I haven't had much time to check, but with the fix the 100k
>>>> records load in 30 seconds instead of 50. It is some improvement,
>>>> but not enough. I'll do some more profiling tomorrow and let you know.
>>>>
>>>> By the way, we are using NetBeans for profiling; the benefit is
>>>> that it is free. I will evaluate YourKit as we are moving away
>>>> from NetBeans as a development platform.
>>>>
>>>> Marcin
>>>>
>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>
>>>>> Ari, Marcin --
>>>>>
>>>>> going through the code I noticed one inefficiency - the
>>>>> elements array access is synchronized in the 'fillIn' method. Since
>>>>> 'fillIn' is called from the constructor, such synchronization is
>>>>> unneeded and only slows things down. I just checked a fixed
>>>>> version into trunk. Could you try it out?
>>>>>
>>>>> Andrus
>>>>>
>>>>>
>>>>>
>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>
>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>
>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
>>>>>>>
>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>
>>>>>>
>>>>>> Yes, this is the same project we are working on. I started
>>>>>> some performance profiling and Marcin has now been able to
>>>>>> take it much further. What is it about:
>>>>>>
>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>
>>>>>> which is so slow? The code gets a little complex at that point  
>>>>>> and we are having difficulty tracing it through to the exact  
>>>>>> performance problem in the underlying code. Is it the speed of  
>>>>>> adding the object id to the Collection or the speed of  
>>>>>> creating an object id itself? 0.5ms doesn't sound slow, but it  
>>>>>> doesn't scale well.
>>>>>>
>>>>>> Andrus, I got the impression from the previous thread that you
>>>>>> suspected something slightly different: that the performance
>>>>>> problem might be in the fat query itself. But from our tests
>>>>>> this isn't the case. If I've got this right, the way it works is:
>>>>>>
>>>>>> 1. perform a regular query to get all columns but return the
>>>>>> result in an iterator
>>>>>> 2. iterate through the first page and build full objects
>>>>>> 3. iterate through the other pages and build just ObjectIds
>>>>>> (this is the slow part for us)
>>>>>> 4. when another page is fetched, perform another query and
>>>>>> fetch those objects from the DB
>>>>>>
>>>>>> So, getting just primary keys from the DB may not be any
>>>>>> faster if the performance problem is simply in the
>>>>>> construction of ObjectIds. If the performance problem is in
>>>>>> the numerous resizings of the Collection (each time it runs
>>>>>> out of space it is increased by 50% or 100% in size), then
>>>>>> the solution could be as simple as figuring out the size
>>>>>> of the iterator and sizing the collection appropriately from
>>>>>> the beginning.
>>>>>>
>>>>>> Any ideas on how to discover the exact cause of the  
>>>>>> performance hit?
>>>>>>
>>>>>>
>>>>>> Ari Maniatis
>>>>>>
>>>>>>
>>>
>>>
>>>
>>
>
>
>
>

