db-torque-user mailing list archives

From Scott Eade <se...@backstagetech.com.au>
Subject Re: LargeSelect example ?
Date Tue, 30 Oct 2007 21:56:06 GMT
LargeSelect exists specifically so that you can read through a large 
number of records in chunks of size memoryPageLimit.

If you want to read all of the records into memory at once, you would not 
use LargeSelect.  If the records will not all fit into memory, then you 
use LargeSelect to break up the query result automatically using 
offset/limit queries (which Torque does not support for all RDBMSs).

If you are presenting data to users you use pageSize to determine the 
number of records at a time that you want to pull from those available 
(memoryPageLimit).  If you are not presenting the data to a user you 
should set pageSize to the same value as memoryPageLimit.

LargeSelect is essentially a buffering mechanism:
- memoryPageLimit is a buffer of records from the overall query result - 
LargeSelect pulls in more data using offset and limit whenever data 
beyond what it has in memory is requested.
- pageSize determines how much of the buffered data you want to access 
in each request.
When data within the memory page buffer is requested, it can be provided 
without executing a further query.  If data outside of the buffer is 
requested, the existing data is discarded and a new buffer containing the 
requested data is filled by querying for a subset of the full query result.
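The buffering described above can be modelled in a few lines of Java. This is a hypothetical sketch, not Torque's LargeSelect source: `BlockBufferSketch`, `BlockSource` and `inMemory` are illustrative names, an in-memory list stands in for the offset/limit query, and the buffer is assumed to hold pageSize * memoryPageLimit records (the interpretation used in Yannick's reply below).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffering described above, NOT Torque's
// LargeSelect source. BlockSource stands in for an offset/limit query.
final class BlockBufferSketch {

    /** Stand-in for a query returning `limit` rows starting at `offset`. */
    interface BlockSource {
        List<Integer> fetch(int offset, int limit);
    }

    private final BlockSource source;
    private final int pageSize;
    private final int blockSize; // assumed: pageSize * memoryPageLimit records
    private List<Integer> block = new ArrayList<>();
    private int blockStart = -1; // offset of the buffered block; -1 means empty
    private int queries = 0;

    BlockBufferSketch(BlockSource source, int pageSize, int memoryPageLimit) {
        this.source = source;
        this.pageSize = pageSize;
        this.blockSize = pageSize * memoryPageLimit;
    }

    /** Returns 0-based page n, querying only when it lies outside the buffer. */
    List<Integer> getPage(int n) {
        int first = n * pageSize;
        int wantedStart = (first / blockSize) * blockSize;
        if (wantedStart != blockStart) {
            // Discard the old buffer and refill it with the block holding page n.
            block = source.fetch(wantedStart, blockSize);
            blockStart = wantedStart;
            queries++;
        }
        int from = Math.min(first - blockStart, block.size());
        int to = Math.min(from + pageSize, block.size());
        return block.subList(from, to);
    }

    /** Walks the whole result a page at a time; returns the records seen. */
    int readAllPages() {
        int seen = 0;
        for (int n = 0; ; n++) {
            int rows = getPage(n).size();
            seen += rows;
            if (rows < pageSize) {
                return seen; // a short or empty page marks the end of the result
            }
        }
    }

    int queryCount() {
        return queries;
    }

    /** In-memory stand-in for a table whose rows are numbered 0..total-1. */
    static BlockSource inMemory(int total) {
        return (offset, limit) -> {
            List<Integer> rows = new ArrayList<>();
            for (int i = offset; i < Math.min(total, offset + limit); i++) {
                rows.add(i);
            }
            return rows;
        };
    }
}
```

For example, `new BlockBufferSketch(BlockBufferSketch.inMemory(185000), 100, 5).readAllPages()` visits all 185 000 simulated rows while never buffering more than 500 of them, issuing 371 block queries (the last one returns nothing and ends the walk). Pages 0 to 4 share one block, so requesting them repeatedly costs a single query.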

So to run through your 185K records you simply keep retrieving them a 
page at a time using getNextResults() with a memoryPageLimit well below 
185K.  If this is not working for you then I would suggest that the 
offset/limit processing for the RDBMS you are using may not be fully 
implemented, may have a bug, or may be falling back to processing that 
actually retrieves all of the records and discards those outside of the 
offset/limit (this would totally defeat the purpose of using LargeSelect).
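The fallback described here, and the "pseudo" MS SQL support Greg describes further down (reading more data than requested and trimming it), can be compared with native offset/limit using a rough cost model. The sketch is hypothetical: `OffsetLimitCost` is an illustrative name, it assumes the emulation fetches the first offset + limit rows of the ordered result and discards the first offset of them, and it ignores the final empty query.

```java
// Hypothetical cost model (not Torque code): rows the database must return to
// read `total` rows in blocks of `blockSize` via offset/limit windows.
final class OffsetLimitCost {

    private OffsetLimitCost() {
    }

    /** Native offset/limit: each query returns just its own window. */
    static long nativeRowsRead(long total, long blockSize) {
        long rows = 0;
        for (long offset = 0; offset < total; offset += blockSize) {
            rows += Math.min(blockSize, total - offset);
        }
        return rows;
    }

    /** Emulated: fetch the first offset + blockSize rows, then skip `offset` of them. */
    static long emulatedRowsRead(long total, long blockSize) {
        long rows = 0;
        for (long offset = 0; offset < total; offset += blockSize) {
            rows += Math.min(total, offset + blockSize);
        }
        return rows;
    }

    public static void main(String[] args) {
        long total = 185_000, blockSize = 500;
        System.out.println("native:   " + nativeRowsRead(total, blockSize));
        System.out.println("emulated: " + emulatedRowsRead(total, blockSize));
    }
}
```

For 185 000 rows in blocks of 500, `main` prints 185000 for native offset/limit and 34317500 for the emulation, roughly 185 times as many rows. The emulated total grows quadratically with the number of blocks, which is why such a fallback largely defeats the purpose of LargeSelect on big tables.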

Have you run the LargeSelect tests against your RDBMS?  You should do so 
and ensure that the offset/limit is fully implemented.
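Alongside the LargeSelect tests, the property they guard can be checked with a standalone sketch: walk the result in offset/limit windows and confirm that every row comes back exactly once and in order. `WindowCoverageCheck`, `coversExactlyOnce` and `range` are hypothetical names, and `fetch` stands in for the real query. Against a real RDBMS the windows would have to come from a query with a deterministic ORDER BY, since offset/limit windows over an unordered result are not guaranteed to be consistent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical check in the spirit of the LargeSelect tests (not their code):
// walk a result with offset/limit windows and confirm that every row comes
// back exactly once, in order. `fetch` stands in for the RDBMS query.
final class WindowCoverageCheck {

    private WindowCoverageCheck() {
    }

    static boolean coversExactlyOnce(BiFunction<Integer, Integer, List<Integer>> fetch,
                                     int expectedTotal, int blockSize) {
        List<Integer> seen = new ArrayList<>();
        for (int offset = 0; ; offset += blockSize) {
            List<Integer> rows = fetch.apply(offset, blockSize);
            seen.addAll(rows);
            if (seen.size() > expectedTotal) {
                return false; // duplicates or windows that never end; stop instead of looping
            }
            if (rows.size() < blockSize) {
                break; // a short window marks the end of the result
            }
        }
        if (seen.size() != expectedTotal) {
            return false; // rows were skipped
        }
        for (int i = 0; i < expectedTotal; i++) {
            if (seen.get(i) != i) {
                return false; // rows out of order or repeated
            }
        }
        return true;
    }

    /** Rows numbered from..to-1 of a table with `total` rows, ordered by key. */
    static List<Integer> range(int total, int from, int to) {
        List<Integer> rows = new ArrayList<>();
        for (int i = from; i < Math.min(total, to); i++) {
            rows.add(i);
        }
        return rows;
    }
}
```

A fetch that ignores the offset keeps returning the same full block; the size guard reports it instead of looping forever, which resembles the endless getNextResultsAvailable() loop reported later in the thread with the Torque-84 patch. A fetch that stops after two blocks, similar to the shortfall Yannick reports, fails the row-count check.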


YannickR wrote:
> Thank you, Scott, for your great explanation of LargeSelect.
> I understand that its goal is to view a portion of the data at a time:
> (PageSize * MemoryPageLimit) records. We already have a program that uses
> Torque objects for synchronization on a 10-second thread basis and maps
> objects from one side to the other. I made some modifications to it so
> we can launch it in batch mode, because too much data had accumulated on one
> database side (around 185 000 records).
> Now I need to read them all with LargeSelect. As I said, it does not read
> all records as expected unless I specify PageSize and MemoryPageLimit
> values high enough to cover the total number of records. Following your
> suggestion, I would use 435*435 (189 225) in order to read them all
> successfully... This would surely cause a heap memory exception. What should
> I do if I do not know the exact volume of data to process? Do I need to
> calculate it dynamically from the total record count?
> With your example, if I set a PageSize of 100, MemoryPageLimit will equal 5:
> (100 * 5 < 250) ? (250 / 100) : 5
> I will then miss most of the data; it will only read 1000 of the 185 000
> records...
> You said that it is possible to configure LargeSelect to pull in just one
> page of data at a time.
> How can I do that? That would resolve my issue, since it would stop reading
> all records in one shot while instantiating LargeSelect.
> Thanks for your help! It is appreciated.
> Regards,
> Yannick
> seade wrote:
>> I hadn't been paying close attention to this thread, but it seems that 
>> a couple of points are somehow being missed:
>> 1. If you have a large amount of data, how much of it is the user 
>> actually going to be able to view?  It is not such a good 
>> idea to provide the user with a means of browsing through a million 
>> records - they will never do so.  You need to provide the ability to 
>> filter the data down to a practical number of records that the user can 
>> then view.
>> 2. If you run a query that pulls in one million records you are more 
>> than likely going to run out of memory.  This is in fact the problem 
>> that LargeSelect seeks to address.  Instead of pulling in all of the 
>> records, it instead pulls in a subset of these that can then be 
>> presented a page at a time.  While you can configure LargeSelect to pull 
>> in just one page of data at a time this may be at odds with the 
>> complexity of the query and the amount of time it takes to execute.  To 
>> counter this, LargeSelect provides the ability to cache a configurable 
>> number of pages worth of data - this way the user can at least browse 
>> through a few pages of data without triggering an expensive query for 
>> every hit.  It is up to you to determine how much data will be presented 
>> on any given page and how many pages of data to read ahead - make the 
>> values too large and you will still run out of memory.
>> I am a heavy user of LargeSelect.  I use a pageSize of between 10 and 
>> 100 (as selected by the user) and a memoryPageLimit of:
>> 	(pageSize * 5 < 250) ? (250 / pageSize ) : 5
>> And everything works nicely.
>> I have no idea whether or not Torque-84 works, but it is unlikely to be 
>> committed without the addition of test cases and even then it will 
>> require a committer to take the time to ensure it behaves correctly. 
>> From what Greg is saying, for MS SQL Torque-84 should not be required, 
>> but other changes at svn trunk are.
>> LargeSelect is about presenting data to users - as I said above, a user 
>> is never going to look at 1 million records.  You on the other hand are 
>> working on database synchronization, so I assume you are working through 
>> a large number of records (8000 was mentioned somewhere) that are not 
>> actually being presented to users.  The first question I would ask is 
>> whether or not you need to instantiate the data as Torque objects - i.e. 
>> could you get by with using native SQL (most likely quicker when dealing 
>> with bulk data like this).  That said, there should be no reason why you 
>> cannot use LargeSelect for your purposes - i.e. to limit the number of 
>> records in memory at any given time.  To do this I would set pageSize 
>> and memoryPageLimit to the same value, a value that maximises throughput 
>> by balancing the trade-off between memory use and query execution time.
>> HTH,
>> Scott
>> YannickR wrote:
>>> I checked out the current CVS head (without the Torque-84 patch) and did
>>> some tests in order to better explain what is happening. It seems that
>>> PageSize * MemoryPageLimit needs to cover the number of records. For
>>> example, if you have 8000 records to read and MemoryPageLimit is set to
>>> the default of 5, PageSize must be at least 8000/5/2 = 800. If a value
>>> lower than 800 is used, some records won't be read... With as many as
>>> 185 000 records to read, memory becomes the limit: 185 000/5/2 = 18 500
>>> minimum. That means 92 500 records in memory at one time...
>>> To reproduce the situation, the LargeSelect unit tests should not derive
>>> the number of Authors they create from PageSize and MemoryPageLimit;
>>> doing so means all records are always covered, so the behavior I just
>>> described cannot be reproduced.
>>> Anyway, is 9*9 records a "Large" Select test?
>>> As I already said, when I use the Torque-84 patch,
>>> LargeSelect.getNextResultsAvailable() always returns true, so reading
>>> runs in an infinite loop ;-(
>>> Could someone clarify, please?
>>> Greg Monroe wrote:
>>>> As a quick aside, it would be much easier to follow your 
>>>> messages if your embedded comments were not prefixed
>>>> with one or more >'s.  That makes it really hard to see which 
>>>> comments are new and which are old.
>>>> That said, I tested the current CVS head (which is 99.9% of 
>>>> the final 3.3 release) against MS SQL 2000 just last 
>>>> week.  In order for it to pass all the Limit / 
>>>> LargeSelect tests in the test project, I committed some 
>>>> changes to the DBSybase class (which MS SQL extends).
>>>> So, try checking out the latest from CVS and using it. 
>>>> This should work with MS SQL 2005.  The support is generic 
>>>> across all MS SQL versions, so it is "pseudo" support that
>>>> requires more data than requested to be read and "trimmed"
>>>> down.
>>>>> -----Original Message-----
>>>>> From: YannickR [mailto:Yannick.Richard@matricis.com] 
>>>>> Sent: Friday, October 26, 2007 12:26 PM
>>>>> To: torque-user@db.apache.org
>>>>> Subject: Re: LargeSelect example ?
>>>>>> Is the patch working or not? The status on
>>>>>> https://issues.apache.org/jira/browse/TORQUE-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>>> seems to be unresolved...
>>>>>> Could someone help me with that one?
>>>>>> Can I still use LargeSelect with MSSQL 2005?
>>>>>> Regards,
>>>>>> Yannick Richard
>>>> DukeCE Privacy Statement:
>>>> Please be advised that this e-mail and any files transmitted with
>>>> it are confidential communication or may otherwise be privileged or
>>>> confidential and are intended solely for the individual or entity
>>>> to whom they are addressed. If you are not the intended recipient
>>>> you may not rely on the contents of this email or any attachments,
>>>> and we ask that you please not read, copy or retransmit this
>>>> communication, but reply to the sender and destroy the email, its
>>>> contents, and all copies thereof immediately. Any unauthorized
>>>> dissemination, distribution or copying of this communication is
>>>> strictly prohibited.
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: torque-user-unsubscribe@db.apache.org
>>>> For additional commands, e-mail: torque-user-help@db.apache.org
