db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Newsham" <jnews...@referentia.com>
Subject RE: excessive disk space allocation
Date Thu, 23 Oct 2008 20:59:03 GMT


> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM]
> Sent: Thursday, October 23, 2008 5:26 AM
> To: Derby Discussion
> Subject: Re: excessive disk space allocation
> 
> On 23.10.08 02:26, Jim Newsham wrote:
> > Hasn't been a lot of response to this thread.  I have a 23gb database
> > holding only 500mb of data, all created with just inserts (no deletes).
> For
> > our app, this is a serious problem.
> 
> Hi Jim,
> 
> I would say this is a serious problem for any app...
> Do you have a way to reproduce this behavior?
> If so, it would be very helpful if you could attach the repro to a
> Jira[1] issue.

The behavior is exhibited in our app, which is unfortunately rather complex.
I tried writing a stand-alone test case which behaves similarly to our app,
but haven't been able to reproduce the problem yet.  The problem only
becomes apparent with a sufficiently large database, which takes hours to
generate... so trying to reproduce the problem is a bit time-consuming.
I'll try again later, but time is short for now, so I've got to move on to
"mitigation".

It's also hard to know which factors contribute to the problem, in creating
the test case.  I have numerous threads; each thread inserts into 2 common
tables, as well as 4 private tables.  Auto-commit is disabled, and
transactions are largish (on the order of 100 or perhaps 1000).  Each thread
inserts every 30 seconds, so I'm not really hammering the database.

> 
> 
> >
> > Someone suggested the problem is caused by multi-threaded inserts, but
> the
> > tables which exhibit the problem were only inserted into by a single
> thread,
> > each.
> >
> > Any suggestions?
> >
> > Is there a way to tell, before compacting, how much space would be saved
> by
> > compacting a table?  With this information, at least I would be able to
> > periodically compact just those tables which merit being compacted, as a
> > workaround to the real problem.
> 
> I don't remember the details, but if you have a look at the source for
> "DerbyDiskSpaceDiag.java" attached to DERBY-2549[2] you should be able
> to figure out the SQL required. I'm sure it's in the manual somewhere too.
> You can also just compile and run the program (preferably on a copy /
> non-production db) and see what it says. With 23 GB of data, I think it
> will chew for a while before you see any output.

Yeah I've already found that "script".   It calls the SYSCS_DIAG.SPACE_TABLE
table function described here:

http://db.apache.org/derby/docs/10.4/ref/rrefsyscsdiagtables.html

Unfortunately this table function doesn't really provide enough information
to know in advance, how much wasted space would be freed by compacting a
table.  The NUMUNFILLEDPAGES and ESTIMSPACESAVING fields sound promising,
but the first field is always quite low (apparently derby thinks the pages
are "filled" when in fact they're very sparse), and the second field is
always zero.  

Here's an example (connection0 is the database before compaction;
connection1 is the database after compaction).  Notice how much
NUMALLOCATEDPAGES decreases:

ij(CONNECTION1)> set connection connection0;

0 rows selected
ij(CONNECTION0)> select * from table (syscs_diag.space_table('APP',
'ROUTER_FHK0908F44U_0_SAMPLE')) as t;
CONGLOMERATENAME
|ISIND&|NUMALLOCATEDPAGES   |NUMFREEPAGES        |NUMUNFILLEDPAGES
|PAGESIZE   |ESTIMSPACESAVING
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------
ROUTER_FHK0908F44U_0_SAMPLE
|0     |206                 |0                   |1                   |32768
|0
SQL081017094932450
|1     |22                  |0                   |0                   |4096
|0
SQL081017094932451
|1     |13                  |0                   |0                   |4096
|0

3 rows selected
ij(CONNECTION0)> SET CONNECTION CONNECTION1;
ij(CONNECTION1)> select * from table (syscs_diag.space_table('APP',
'ROUTER_FHK0908F44U_0_SAMPLE')) as t;
CONGLOMERATENAME
|ISIND&|NUMALLOCATEDPAGES   |NUMFREEPAGES        |NUMUNFILLEDPAGES
|PAGESIZE   |ESTIMSPACESAVING
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------
ROUTER_FHK0908F44U_0_SAMPLE
|0     |3                   |0                   |1                   |32768
|0
SQL081017094932450
|1     |12                  |0                   |0                   |4096
|0
SQL081017094932451
|1     |8                   |0                   |0                   |4096
|0
3 rows selected

I guess I can try to estimate based on the average record size, but that'll
probably be a little messy and rough.

Regards,
Jim

> 
> 
> 
> --
> Kristian
> 
> [1] https://issues.apache.org/jira/browse/DERBY
> [2] https://issues.apache.org/jira/browse/DERBY-2549
> 
> 
> >
> > Thanks,
> > Jim
> >
> >> -----Original Message-----
> >> From: Jim Newsham [mailto:jnewsham@referentia.com]
> >> Sent: Tuesday, October 21, 2008 11:21 AM
> >> To: 'Derby Discussion'
> >> Subject: RE: excessive disk space allocation
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Knut.Hatlen@Sun.COM [mailto:Knut.Hatlen@Sun.COM]
> >>> Sent: Monday, October 20, 2008 9:27 PM
> >>> To: Derby Discussion
> >>> Subject: Re: excessive disk space allocation
> >>>
> >>> Jim Newsham <jnewsham@referentia.com> writes:
> >>>
> >>>> Hi,
> >>>>
> >>>> I'm doing some benchmarking of our application which stores data in
> >>> derby.
> >>>> The parts of the application which I am exercising only perform
> >> inserts,
> >>> not
> >>>> deletes.  The results suggest that derby disk space allocation is
> >>> excessive,
> >>>> particularly because compressing the tables reduces the size of the
> >>> database *
> >>>> substantially*.  For example, here are the results of several
> >> databases,
> >>> both
> >>>> before and after compression.
> >>>>
> >>>> Application running time.  original -> compressed
> >>>>
> >>>> 0.5 days.  178.2mb -> 63.1mb
> >>>>
> >>>> 1 day.  559.3mb -> 82.8mb
> >>>>
> >>>> 2 days.  1,879.1mb -> 120.8mb
> >>>>
> >>>> 4 days.  5,154.4mb -> 190.5mb
> >>>>
> >>>> 8 days. 11,443.7mb -> 291.6mb
> >>>>
> >>>> 16 days.  23,706.7mb -> 519.3mb
> >>>>
> >>>> Plotting the data, I observe that both uncompressed and compressed
> >> sizes
> >>>> appear to grow linearly, but the growth factor (slope of the linear
> >>> equation)
> >>>> is 53 times as large for the uncompressed database.  Needless to say.
> >>> this is
> >>>> huge.
> >>>>
> >>>> I expected that with only inserts and no deletes, there should be
> >> little
> >>> or no
> >>>> wasted space (and no need for table compression).  Is this assumption
> >>>> incorrect?
> >>> Hi Jim,
> >>>
> >>> You may have come across a known issue with multi-threaded inserts to
> >>> the same table:
> >>>
> >>> http://thread.gmane.org/gmane.comp.apache.db.derby.devel/36430
> >>> https://issues.apache.org/jira/browse/DERBY-2337
> >>> https://issues.apache.org/jira/browse/DERBY-2338
> >> Thanks for those links.  I used the diagnostic dump program from the
> >> mentioned discussion thread to see how much the individual tables in my
> >> database are compacting.
> >>
> >> The "multi-threaded inserts to the same table" theory doesn't quite
> jive
> >> here.  In my case, I have multiple threads inserting into the database,
> >> but
> >> most of the data goes into tables which are only inserted into by a
> single
> >> thread for the duration of the application.
> >>
> >> There are only two tables inserted into by more than one thread, and
> the
> >> data they contain is relatively small (a few percent).  For a test
> >> database
> >> I'm looking at right now, these two tables compress to 50% and 90% of
> >> original size, respectively... not much at all.
> >>
> >> By contrast, I am seeing most of the other tables (which aren't
> inserted
> >> into by more than one thread) compress to between 0.5% and 3.8% of
> >> original
> >> size.  For example, I see one table go from 783 pages to 4 pages.
> >>
> >> Jim
> >>
> >>
> >>
> >
> >
> >
> 




Mime
View raw message