cayenne-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Borut BolĨina <...@najdi.si>
Subject Re: paged query slow when fetching big lists
Date Thu, 05 Jul 2007 12:06:53 GMT
Hello,

which version includes this corrections? 2.0.3 or 3.0?

Thanks,
Borut

On 26.6.2007 20:32, Andrus Adamchik wrote:
> Hi Marcin,
>
> I have good news (I think). Fetching just id columns inside the 
> IncrementalFaultList indeed speeds things up significantly. I just 
> committed the change to optimize SelectQueries to do just that. Please 
> let me know how does it work for you.
>
> Now the profiling details...
>
> * I profiled on Derby and MySQL. In both cases fetching a table with 
> 25 columns and 100000 rows took between 3-4 seconds (not as long as in 
> your case, but what's important is relative times I guess)
> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method 
> is spent in the driver code, rewinding the result set (i.e. brining 
> the data from db to the client).
> * After my change the query time went down to 0.2-0.5 second (0.8 if 
> you consider the second query needed to fault the first page). Not bad!
> * ResultSet reading still remained a bottleneck, but it became faster 
> in absolute terms. And now finally Cayenne-related code (such as 
> DataRow creation) started to show up on the radar (e.g. DataRow 
> constructor taking 3% of the 'fillIn' method time).
>
> Andrus
>
>
>
> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>
>> Hi Andrus !
>> Many thanks for that !
>> Marcin
>>
>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>
>>> Hi Marcin,
>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>>>
>>> I see you still didn't use profiler, but the data you provided seems 
>>> to finally confirm that at least on Derby not fetching all columns 
>>> does result in significant speedup (#1 vs. #3). So now it is a 
>>> question of implementing the right algorithm for the 
>>> IncrementalFaultList.
>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part 
>>>> of the data. I tried to use addCustomDbAttribute("id") on client, 
>>>> it resulted in returning the raw dataRows, is there something I can 
>>>> do to fetch faulted objects ?
>>>
>>> We should encapsulate this logic inside IncrementalFaultList on the 
>>> server.
>>>
>>>> Our application was designed to use the SelectQuery. If we have to 
>>>> change that and use the SQLTemplate instead, there is a lot of work 
>>>> for us, including:
>>>
>>> Same thing - the right thing to do is to fix it on the server.
>>>
>>> Let me try to find a spare minute later tonight and implement 
>>> id-only fetch. I have some large tables in a MySQL5 so I can test 
>>> the performance in a slightly different environment.
>>>
>>> Andrus
>>>
>>>
>>>
>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>> Hi
>>>>
>>>> I have done some more profiling and testing.
>>>>
>>>> executing queries on table with >100000 records, directly on server 
>>>> (not on client) gave results as listed below:
>>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from 
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
>>>>
>>>> what more, I found that executing the SQLTemplate does allow to 
>>>> fault the objects (I sometimes discover the simplest things last), 
>>>> so I did try to check how long it takes for the objects to be faulted:
>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>> 2) objects is faulted in 20ms (average)
>>>> 3) on first page first object faulted in 200ms, rest  ~20 ms,
>>>>      on any following page first object faulted in 30-200ms, rest 
>>>> 0ms (interesting that the first page does not seem to be faulted at 
>>>> all)
>>>> 4) no point testing.
>>>>
>>>> Also I did check if the resizing of the ArrayList which is keeping 
>>>> the results does affect the speed, and it does not. (Tried to make 
>>>> the ArrayList initial size = 150,000).
>>>>
>>>> My conclusion is that SelectQuery with paging is usable only for 
>>>> fetching less than, say 10,000 records, otherwise the performance 
>>>> is to low. With SQLTemplate the performance is much greater. It 
>>>> applies to both ROP and 'normal' cayenne, since I made those tests 
>>>> on server.
>>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only part 
>>>> of the data. I tried to use addCustomDbAttribute("id") on client, 
>>>> it resulted in returning the raw dataRows, is there something I can 
>>>> do to fetch faulted objects ?
>>>>
>>>> Our application was designed to use the SelectQuery. If we have to 
>>>> change that and use the SQLTemplate instead, there is a lot of work 
>>>> for us, including:
>>>> - dealing with adding and concatenating Expressions to the 
>>>> SQLTemplate (is there an easy way ?)
>>>> - dealing with declared qualifier (the one set in modeller)
>>>> - possibly more...
>>>> i would really like to avoid all of that, so if you have any ideas 
>>>> on how to improve the performance without too much hassle I would 
>>>> really appreciate.
>>>>
>>>> Marcin
>>>>
>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>
>>>>> Hi Andrus
>>>>> I had not much time to check, but with the fix the 100k records 
>>>>> load in 30 instead of 50 seconds. It is some improvement, but not 
>>>>> enough. I'll do some more profiling tomorrow and let you know.
>>>>>
>>>>> By the way,  we are using netbeans for profiling, the benefit : it 
>>>>> is free. I will evaluate the yourkit as we are moving away from 
>>>>> netbeans as a development platform.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>
>>>>>> Ari, Marcin --
>>>>>>
>>>>>> going through the code I noticed one inefficiency - the elements

>>>>>> array access is synchronized in 'fillIn' method. Since 'fillIn' 
>>>>>> is called from constructor, such synchronization is unneeded and

>>>>>> only slows things down. I just checked a fixed version to trunk.

>>>>>> Could you try it out?
>>>>>>
>>>>>> Andrus
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>
>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser

>>>>>>>>
>>>>>>>>
>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the same project we are working on. I started some

>>>>>>> performance profiling and Marcin has been able now to take it

>>>>>>> much further. What is it about:
>>>>>>>
>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>
>>>>>>> which is so slow? The code gets a little complex at that point

>>>>>>> and we are having difficulty tracing it through to the exact

>>>>>>> performance problem in the underlying code. Is it the speed of

>>>>>>> adding the object id to the Collection or the speed of creating

>>>>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't

>>>>>>> scale well.
>>>>>>>
>>>>>>> Andrus, I got the impression from the previous thread that you

>>>>>>> suspected something slightly different. That the performance

>>>>>>> problem might be in the fat query itself, but from our tests

>>>>>>> this isn't the case. If I've got this right, the way it works
is:
>>>>>>>
>>>>>>> 1. perform regular query to get all columns but return result
in 
>>>>>>> iterator
>>>>>>> 2. iterate through first page and build full objects
>>>>>>> 3. iterate through other pages and build just objectids (this
is 
>>>>>>> the slow part for us)
>>>>>>> 4. when another page is fetched perform another query and fetch

>>>>>>> those objects from the DB
>>>>>>>
>>>>>>> So, getting just primary keys from the DB may not be any faster

>>>>>>> if the performance problem is simply in the construction of 
>>>>>>> objectIds. If the performance problem is in the numerous 
>>>>>>> resizings of the Collection (each time it runs out of space,

>>>>>>> then it is increased by 50% or 100% in size), then the solution

>>>>>>> could be as simple as figuring out the size of the iterator and

>>>>>>> sizing the collection appropriately from the beginning.
>>>>>>>
>>>>>>> Any ideas on how to discover the exact cause of the performance

>>>>>>> hit?
>>>>>>>
>>>>>>>
>>>>>>> Ari Maniatis
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

-- 
bbLOG <http://borutb.tuditi.delo.si/>

Mime
View raw message