db-torque-user mailing list archives

From YannickR <Yannick.Rich...@matricis.com>
Subject Re: LargeSelect example ?
Date Tue, 30 Oct 2007 13:46:17 GMT

Thank you Scott for your great explanation of LargeSelect.

I understand that its goal is to expose only a portion of the data at a
time: (PageSize * MemoryPageLimit) records. We already have a program that
uses Torque objects for synchronization, running in a thread every 10
seconds and mapping objects from one side to the other. I modified it so
that we can launch it in batch mode, because too much data had accumulated
on one database side (around 185 000 records).

Now I need to read them all with LargeSelect. As I said, it does not read
all records as expected unless I specify PageSize and MemoryPageLimit
values high enough to cover the total number of records. Following your
suggestion to set both to the same value, I would need 435 * 435 (189 225)
in order to read them all... That would surely cause a heap memory error.
What should I do if I do not know exactly the volume of data to process ?
Do I need to calculate the values dynamically from the total record count ?

With your example, if I set a PageSize of 100, MemoryPageLimit will equal 5:
(100 * 5 < 250) ? (250 / 100) : 5
I will then miss most of the data: it will only read 1000 records out of
185 000...
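
To make the arithmetic concrete, here is a minimal sketch in plain Java that only evaluates your formula for a few page sizes. The class and method names are my own and nothing in it is part of Torque:

```java
/** Sketch of the page arithmetic discussed above; not part of Torque. */
final class PageLimitSketch {

    /** Scott's rule: cache about 250 records for small pages, otherwise 5 pages. */
    static int memoryPageLimit(int pageSize) {
        return (pageSize * 5 < 250) ? (250 / pageSize) : 5;
    }

    /** Number of records held in memory at once for a given page size. */
    static int recordsInMemory(int pageSize) {
        return pageSize * memoryPageLimit(pageSize);
    }

    public static void main(String[] args) {
        for (int pageSize : new int[] {10, 25, 50, 100}) {
            System.out.println("pageSize=" + pageSize
                    + " memoryPageLimit=" + memoryPageLimit(pageSize)
                    + " recordsInMemory=" + recordsInMemory(pageSize));
        }
    }
}
```

For page sizes of 10, 25, 50 and 100 this gives limits of 25, 10, 5 and 5, that is 250, 250, 250 and 500 records held in memory at once, far from 185 000.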

You said that it is possible to configure LargeSelect to pull in just one
page of data at a time.
How can I do that ? That would resolve my issue, since it would stop reading
all records in one shot while instantiating LargeSelect.
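
What I have in mind for the batch job is roughly the loop below. It is only a sketch under assumptions: PageSource, RecordHandler, processAll and rangeSource are hypothetical names of mine standing in for a paged query, not LargeSelect or any other Torque API, and it assumes the result set has a stable order and does not change while the loop runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of batch processing one page at a time; none of these types exist in Torque. */
final class BatchPagingSketch {

    /** Hypothetical paged query: returns at most limit rows starting at offset. */
    interface PageSource<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Hypothetical callback for the per-record synchronization work. */
    interface RecordHandler<T> {
        void handle(T record);
    }

    /**
     * Walks the whole result set while holding only one page in memory.
     * The total record count is never needed: a short or empty page ends the loop.
     */
    static <T> int processAll(PageSource<T> source, int pageSize, RecordHandler<T> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int processed = 0;
        while (true) {
            List<T> page = source.fetch(processed, pageSize);
            for (T record : page) {
                handler.handle(record);
            }
            processed += page.size();
            if (page.size() < pageSize) {
                return processed; // last (possibly empty) page reached
            }
        }
    }

    /** In-memory stand-in for a table holding the integers 0 .. total-1. */
    static PageSource<Integer> rangeSource(int total) {
        return (offset, limit) -> {
            List<Integer> page = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + limit, total); i++) {
                page.add(i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        // Simulate the 185 000 records with pages of 100 and a no-op handler.
        int processed = processAll(rangeSource(185000), 100, record -> { });
        System.out.println("processed=" + processed);
    }
}
```

With something like this the page size only bounds memory use; it would not have to be derived from the total number of records.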

Thanks for your help ! It is appreciated.

Regards,
Yannick


seade wrote:
> 
> I hadn't been paying close attention to this thread, but it seems that
> a couple of points are somehow being missed:
> 
> 1. If you have a large amount of data, how much of it is the user
> actually going to be able to view in practice?  It is not such a good
> idea to provide the user with a means of browsing through a million 
> records - they will never do so.  You need to provide the ability to 
> filter the data down to a practical number of records that the user can 
> then view.
> 2. If you run a query that pulls in one million records you are more 
> than likely going to run out of memory.  This is in fact the problem 
> that LargeSelect seeks to address.  Instead of pulling in all of the
> records, it instead pulls in a subset of these that can then be
> presented a page at a time.  While you can configure LargeSelect to pull 
> in just one page of data at a time this may be at odds with the 
> complexity of the query and the amount of time it takes to execute.  To 
> counter this, LargeSelect provides the ability to cache a configurable 
> number of pages worth of data - this way the user can at least browse 
> through a few pages of data without triggering an expensive query for 
> every hit.  It is up to you to determine how much data will be presented 
> on any given page and how many pages of data to read ahead - make the 
> values too large and you will still run out of memory.
> 
> I am a heavy user of LargeSelect.  I use a pageSize of between 10 and 
> 100 (as selected by the user) and a memoryPageLimit of:
> 
> 	(pageSize * 5 < 250) ? (250 / pageSize ) : 5
> 
> And everything works nicely.
> 
> I have no idea whether or not Torque-84 works, but it is unlikely to be 
> committed without the addition of test cases and even then it will 
> require a committer to take the time to ensure it behaves correctly. 
>  From what Greg is saying, for MS SQL Torque-84 should not be required, 
> but other changes at svn trunk are.
> 
> LargeSelect is about presenting data to users - as I said above, a user 
> is never going to look at 1 million records.  You on the other hand are 
> working on database synchronization, so I assume you are working through 
> a large number of records (8000 was mentioned somewhere) that are not 
> actually being presented to users.  The first question I would ask is 
> whether or not you need to instantiate the data as Torque objects - i.e. 
> could you get by with using native SQL (most likely quicker when dealing 
> with bulk data like this).  That said, there should be no reason why you 
> cannot use LargeSelect for your purposes - i.e. to limit the number of 
> records in memory at any given time.  To do this I would set pageSize 
> and memoryPageLimit to the same value, a value that maximises throughput 
> by balancing the trade-off between memory use and query execution time.
> 
> HTH,
> 
> Scott
> 
> YannickR wrote:
>> I checked out the current CVS head (without Torque-84 patch) and did some
>> tests in order to better explain what is happening. It seems that
>> PageSize * MemoryPageLimit needs to cover the number of records. For
>> example, if you have 8000 records to read and MemoryPageLimit is set to
>> the default of 5, PageSize needs a minimum value of 8000/5/2 = 800. If a
>> value lower than 800 is used, some records won't be read... When you have
>> as many as 185 000 records to read, the limit will be memory:
>> 185 000/5/2 = 18 500 minimum. That means 92 500 records in memory at one
>> time...
>> 
>> To reproduce the situation, the LargeSelect unit tests should not use
>> PageSize and MemoryPageLimit in order to fill Authors. By doing so, all
>> records are covered and the behavior that I just described won't be
>> reproducible. Anyway, is 9*9 records a "Large" Select test ?
>> 
>> As I already said, when I use the Torque-84 patch,
>> LargeSelect.getNextResultsAvailable() will always return true, so it
>> keeps reading in an infinite loop ;-(
>> 
>> Could someone clarify, please ?
>> 
>> 
>> Greg Monroe wrote:
>>> As a quick aside, it would be much easier to follow your 
>>> messages, if your embedded comments were not prefixed
>>> with one or more >'s.  Makes it real hard to see what 
>>> are new comments and what are old.
>>>
>>> That said, I tested the current CVS head (which is 99.9% 
>>> final release for 3.3) against MS SQL 2000 just last 
>>> week.  In order for this to pass all the Limit / 
>>> LargeSelect tests in the test project, I committed some 
>>> changes to the DBSybase class (which MS SQL extends).
>>>
>>> So, try checking out the latest from CVS and using this. 
>>> This should work with MS 2005.  The support is generic 
>>> across all MS SQL versions, so it is "pseudo" support that
>>> requires more data than requested to be read and "trimmed"
>>> down.
>>>
>>>
>>>> -----Original Message-----
>>>> From: YannickR [mailto:Yannick.Richard@matricis.com] 
>>>> Sent: Friday, October 26, 2007 12:26 PM
>>>> To: torque-user@db.apache.org
>>>> Subject: Re: LargeSelect example ?
>>>>
>>>>
>>>>> Is the patch working or not ? The status on
>>>>> https://issues.apache.org/jira/browse/TORQUE-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>> seems to be unresolved...
>>>>>
>>>>> Could someone help me on that one ?
>>>>> Can I still use LargeSelect with MSSQL 2005 ?
>>>>>
>>>>> Regards,
>>>>> Yannick Richard
>>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: torque-user-unsubscribe@db.apache.org
>>> For additional commands, e-mail: torque-user-help@db.apache.org
>>>
>>>
>>>
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/LargeSelect-example---tf4605414.html#a13488717
Sent from the Apache DB - Torque Users mailing list archive at Nabble.com.

