db-derby-dev mailing list archives

From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: Increased disk space allocation when populating database with multiple connections
Date Mon, 12 Feb 2007 12:23:13 GMT
Suresh Thalamati wrote:
> Kristian Waagan wrote:
>> Hello,
>>
>> For a database population program I run, I have observed that the disk 
>> space allocation is larger when populating the database with multiple 
>> concurrent connections.
>> For a specific configuration, the database ends up at 642 MB with a 
>> single connection, whereas it ends up at 1.3 GB when using multiple 
>> connections. The raw data volume is at about 215 MB, there are 20 
>> tables and between 20 and 30 indexes (didn't take the time to figure 
>> out which indexes are composite, 'show indexes' doesn't give this 
>> information but returns 31 rows).
>>
>> Is this to be expected?
> 
> 
> No. The difference in size seems too high for just inserts. Are 
> you doing inserts into the same table in parallel threads? Is it one 
> particular table/index that is becoming too big, or all the 
> tables/indexes?

Thanks Suresh and Bryan for your replies.

You both ask similar questions, so here's a common reply.

The "big" database is the original one, created with 25 insert threads.
The "small" database is the original one after running compress on it.
I also tried inserting with only one thread.

Database size on disk, allocated space:
25 threads			1282 MB
25 threads after compress	 642 MB    
1 thread			 643 MB

As can be seen, running compress on the original database and inserting 
with only one thread gave almost identical results.

The database in question has a total of 103 conglomerates, including 
system tables. Out of these, 34 have changed size.
The increase/decrease from the compressed database to the original one 
varied between -50% and 196%. If we ignore conglomerates smaller than 10 
MB, the variation is from -8% to 196%, but there is only one 
conglomerate where the size decreased; excluding it, the smallest 
change is +2%.
I have attached the sizes below.

So the answer to your question is that some tables become too big, some 
do not. I am not able to see a pattern based on the current data, but 
maybe you are?


I have also attached the program I used to output space diag 
information. It is far from perfect, but it does get the information out.



thanks,
-- 
Kristian

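The second column (Compres) is the compressed database and the third 
column (Orig) is the original one; both are sizes of the conglomerate 
files on disk in KB. The relative change in the number of allocated 
pages is the same as the change in file size.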

Type			Compres Orig	Change
User table		115404	217848	 89%
User index (backing)	8324	16036	 93%
User table		220664	653696	196%
User index (backing)	63848	115148	 80%
User index		63848	115500	 81%
User table		12868	14180	 10%
User index (backing)	3972	3204	-19%
User table		10180	10660	  5%
User index (backing)	6084	6628	  9%
User table		5924	6148	  4%
User index (backing)	3972	3364	-15%
User table		11908	12612	  6%
User index (backing)	3972	3396	-15%
User table		580	676	 17%
User index (backing)	132	196	 48%
User table		868	2148	147%
User index (backing)	228	324	 42%
User index		132	196	 48%
User table		11108	11300	  2%
User index (backing)	3108	5892	 90%
User table		3204	3236	  1%
User index (backing)	836	1444	 73%
System table		68	36	-47%
System index		16	20	 25%
System index		16	8	-50%
System index		16	8	-50%
System table		16	12	-25%
System table		328	296	-10%
User table		10052	11652	 16%
User index (backing)	3844	3076	-20%
User table		836	868	  4%
User index (backing)	164	292	 78%
User table		42344	47624	 12%
User index (backing)	45992	42184	 -8%
Total			656760	1311812	100%

Note that for the ones with negative change, the conglomerates grew when 
running compress.
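
The Change column can be reproduced from the two size columns. The sketch below assumes the change is computed relative to the compressed size and rounded to the nearest whole percent, which matches the rows above; the class and method names are made up for illustration.

```java
public class SizeChange {
    /**
     * Relative change from the compressed size to the original size,
     * rounded to the nearest whole percent (assumed rounding rule).
     */
    public static long percentChange(long compressedKb, long originalKb) {
        return Math.round(100.0 * (originalKb - compressedKb) / compressedKb);
    }

    public static void main(String[] args) {
        // A few {compressed, original} pairs copied from the table, in KB.
        long[][] rows = {
            {220664, 653696},   // largest user table
            {115404, 217848},   // second largest user table
            {45992, 42184},     // the one large conglomerate that shrank
            {68, 36},           // a system table
        };
        for (long[] r : rows) {
            System.out.printf("%8d %8d %4d%%%n", r[0], r[1], percentChange(r[0], r[1]));
        }
    }
}
```

A negative result means the conglomerate is larger in the compressed database than in the original one.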


Program used to extract space diagnostics information.
Invoke like this:
java -classpath .:path-to-derby-stuff DerbyDiskSpaceDiag /tmp/myDatabase [myschema]
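
The attached program is not reproduced in this archive. As a rough illustration only, and not the attached DerbyDiskSpaceDiag, the following sketch shows one way such numbers could be pulled out, assuming a Derby release that provides the SYSCS_DIAG.SPACE_TABLE table function and an embedded Derby driver on the classpath. The class name SpaceDiagSketch and the helper pagesToKb are hypothetical, and the sketch only looks at tables in the connection's current schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SpaceDiagSketch {
    /** Converts a page count to KB, given the page size in bytes. */
    public static long pagesToKb(long pages, int pageSizeBytes) {
        return pages * pageSizeBytes / 1024;
    }

    public static void main(String[] args) throws SQLException {
        // args[0] is the path to the database directory, e.g. /tmp/myDatabase.
        String url = "jdbc:derby:" + args[0];
        // One row per conglomerate (table or index) of each user table
        // in the current schema.
        String query =
            "SELECT T2.CONGLOMERATENAME, T2.ISINDEX, T2.NUMALLOCATEDPAGES, "
            + "T2.NUMFREEPAGES, T2.PAGESIZE "
            + "FROM SYS.SYSTABLES systabs, "
            + "TABLE (SYSCS_DIAG.SPACE_TABLE(systabs.TABLENAME)) AS T2 "
            + "WHERE systabs.TABLETYPE = 'T'";
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                int pageSize = rs.getInt("PAGESIZE");
                System.out.printf("%-40s %-5s allocated=%d KB free=%d KB%n",
                    rs.getString("CONGLOMERATENAME"),
                    rs.getInt("ISINDEX") != 0 ? "index" : "table",
                    pagesToKb(rs.getLong("NUMALLOCATEDPAGES"), pageSize),
                    pagesToKb(rs.getLong("NUMFREEPAGES"), pageSize));
            }
        }
    }
}
```

Running this once against the original database and once against the compressed copy would give per-conglomerate figures that can be compared in the same way as the table above.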


> 
> 
> Thanks
> -suresh
> 
> 

