db-derby-user mailing list archives

From Paul J DeCoursey <p...@decoursey.net>
Subject Re: maximum file size
Date Fri, 10 Nov 2006 17:19:01 GMT
Michael Segel wrote:
> Let me clarify.
> My suggestion was that
> You are essentially creating a poor man's MPP, and the local temp table is on a "control" node. The temp table holds the result set returned from each query. This assumes that while the table may exceed the 4 GB limit, a query should not. True, if this assumption is wrong, then clearly Derby is not the right tool in the first place.
> Using multiple copies of the same table doesn't really solve the problem.

> What you end up having is a set of tables foo_n, where n is the table number. Then to do your query, you would have to create a view foo_v that you query from, to hide the detail from the app and user. This would be an implementation of a poor man's table partitioning.
> It could be done, however you now have to write a pre-insert trigger to determine which table to store the data in. Then what happens when you overload the table? (Suppose you partition on a time period and the amount of data exceeds 4 GB?)
> (Talk about a set of complex stored procedures.)
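A rough sketch of the view side of this idea, with hypothetical table names and columns (Derby supports UNION ALL in views; the insert-routing trigger and stored procedures mentioned above are not shown, since they would be the complex part):

```sql
-- Hypothetical partition tables; names and columns are illustrative only.
CREATE TABLE foo_1 (id BIGINT PRIMARY KEY, payload VARCHAR(200));
CREATE TABLE foo_2 (id BIGINT PRIMARY KEY, payload VARCHAR(200));

-- The view hides the partitioning detail from the app and user.
CREATE VIEW foo_v AS
    SELECT id, payload FROM foo_1
    UNION ALL
    SELECT id, payload FROM foo_2;
```

Note that such a view is read-only, so inserts would still have to be routed to the right foo_n table by application logic or a procedure, as described above.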
> Yes it could be done using derby today.
> So too could you do a poor man's MPP version as well. (Although it would be a bit more work on the controlling app's side.)
> The point of my post is that while these solutions are possible, it would be better to use a database more suited to the task, or actually enhance Derby to improve its storage options. However, this would mean that you would have to rethink Derby, and the design may change radically enough that you would end up with a new product, aka "stormcloud".
> Note: while this isn't rocket science, it would require a concerted effort that goes beyond what you could do under Apache.
> Hint: in theory, Sun could start with Derby and create their own database, aka "stormcloud", and then run with it.
> (Look at Apache's license.)
> Actually anyone could do this, but it would take some deep pockets to support a multiple-man-year effort.
> The original poster is trying to use a Yugo to pull a loaded semi up a steep hill.
> Sent via BlackBerry.
> -Mike Segel
> Principal
> 312 952 8175
> -----Original Message-----
> From: dmclean62@comcast.net
> Date: Fri, 10 Nov 2006 16:16:55 
> To:"Derby Discussion" <derby-user@db.apache.org>, msegel@segel.com
> Subject: Re: maximum file size
> If the OS won't allow a large enough DB file then a local temp file won't work either.
> I am curious about one thing though: why would it not be a good idea to use multiple tables? Other than the fact that it would be a practical solution that could actually be implemented with the current version of Derby, that is.
> It doesn't seem like it would be very helpful to dismiss a possible solution out of hand without at least some sort of vague and reasonable explanation.
> Donald
>  -------------- Original message ----------------------
> From: "Michael Segel" <msegel@mycingular.blackberry.net>
>> Not a good idea to use multiple files.
>> Why not go all the way and make Derby into an MPP db? All you would have to do
>> is to preprocess the inbound query, then send it off to all of the nodes, taking
>> the result set(s) into a local temp file, and then post-process and return the
>> results...  :-)
>> -----Original Message-----
>> From: dmclean62@comcast.net
>> Date: Fri, 10 Nov 2006 14:12:35 
>> To:"Derby Discussion" <derby-user@db.apache.org>
>> Subject: Re: maximum file size
>> You could use multiple tables to get around the file size limit.
>> Decide how many rows would go in each table and then use some mechanism for 
>> assigning a unique ID to each row. You would then be able to determine which 
>> table a specific row is in with an integer division.
>> table # = <global row #> / <rows per table>
>> local row # = <global row #> % <rows per table>
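The arithmetic above can be sketched directly; the capacity of 15,000,000 rows per table is an assumed value for illustration, chosen to keep each table safely under the 4 GB file limit:

```java
public class PartitionMath {
    // Assumed capacity per table; any fixed row count under the limit works.
    static final long ROWS_PER_TABLE = 15_000_000L;

    // Which foo_n table holds the row.
    static long tableNumber(long globalRow) {
        return globalRow / ROWS_PER_TABLE;
    }

    // Position of the row within that table.
    static long localRow(long globalRow) {
        return globalRow % ROWS_PER_TABLE;
    }

    public static void main(String[] args) {
        long globalRow = 37_000_000L;
        System.out.println(tableNumber(globalRow)); // 2
        System.out.println(localRow(globalRow));    // 7000000
    }
}
```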
>> The Telemetry Data Warehouse for the Hubble Space Telescope divides the data up 
>> by time - each data table contains one calendar year's worth of telemetry data.
>> Just a couple of ideas.
>>  -------------- Original message ----------------------
>> From: Suresh Thalamati <suresh.thalamati@gmail.com>
>>> redcloud wrote:
>>>> Hi! I need to build a SQL table containing 1000000000 (!!!) rows. But I
>>>> filled up a table with 20000000 rows (file size of table 4GB) and my
>>>> filesystem refused to go on filling up the table. My question is: can
>>>> Derby build an "infinite"-size table by chunking it into multiple files?
>>> No. Currently a table maps to a single file in Derby. Table size is 
>>> limited by the size of the file that can be created on a file system.
I agree with Michael on this one. It's not a good idea to duplicate 
table structures. I think the original poster needs to analyze the 
data: can it be split up in other ways? Are there particular columns 
that take up more space than others that could reside in separate 
tables? Are there columns that contain duplicate data that could be 
normalized out in other ways? But looking at the figures, it appears 
that each row occupies less than 250 bytes (if my math is right), which 
leads us back to the question, "Is Derby the right solution?" I don't 
think it is.
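To check that math, the per-row figure comes from dividing the reported 4 GB file size by the 20,000,000 rows loaded before the filesystem gave up:

```java
public class BytesPerRow {
    public static void main(String[] args) {
        long fileBytes = 4L * 1024 * 1024 * 1024; // 4 GB table file
        long rows = 20_000_000L;                  // rows loaded before the limit
        long bytesPerRow = fileBytes / rows;
        System.out.println(bytesPerRow);          // 214, i.e. under 250 bytes/row
    }
}
```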

On a side note, I am very curious about what the data is.  Maybe a 
database is just not the right place for it altogether.

