db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Increased disk space allocation when populating database with multiple connections
Date Thu, 15 Feb 2007 00:07:03 GMT
I think you are seeing the effect of two problems with derby space
management (maybe one or both).  I've logged DERBY-2337 and DERBY-2338
to start a discussion on these.  I am just guessing what is going
on.:
1) hot single point of contention, with limited attention to multi-cpu 
optimization.
    o Derby keeps track of the single "last" page it inserted to, and
      then tends to insert the next row on that page if possible.
2) limited tracking of non-full pages.
    o Derby currently only allocates a single bit of information about
      how full a page is.  Basically it sets the bit if it is "unfilled".
      If this bit is turned off then we won't consider the page for
      inserts in the future.

Derby at insert time trys to keep rows on one page rather that split
across pages, and tries to fill up pages one at a time rather than
multi-thread inserts across multiple pages.  So at insert time into
a base table the store is presented with a stream of bytes of unknown
length, it tries the following:
1) First try inserting into last page inserted (single page nubmer 
tracked for #1 above).
2) If that fails then try ONE unfilled page, if that succeeds it becomes
the newest "last page".
3) if all else fails then allocate a brand new page and do the insert
    to it, and if it has to overflow just go ahead.  This then becomes
    the "last page".

One guess may be that in multi-user many threads may be failing at step
1 and then each go on to get a different page in step 2 and 3 and in the
process leave really unfilled pages behind.

In your db it would be interesting to see if the mostly empty pages are 
marked "unfilled" or not.  Depending on the answer the problem may be
that we aren't marking unfilled correctly, or we could do a better job
of finding unfilled pages.



Kristian Waagan wrote:
> Suresh Thalamati wrote:
> 
>> Kristian Waagan wrote:
>>
>>> Hello,
>>>
>>> For a database population program I run, I have observed that the 
>>> disk space allocation is larger when populating the database with 
>>> multiple concurrent connections.
>>> For a specific configuration, the database ends up at 642 MB with a 
>>> single connection, whereas it ends up at 1.3 GB when using multiple 
>>> connections. The raw data volume is at about 215 MB, there are 20 
>>> tables and between 20 and 30 indexes (didn't take the time to figure 
>>> out which indexes are composite, 'show indexes' doesn't give this 
>>> information but returns 31 rows).
>>>
>>> Is this to be expected?
>>
>>
>>
>> No. The Difference in size seems to be too high with just inserts. Are 
>> you doing inserts into the same table in parallel threads ?. Is it one 
>> particular tables/index that is becoming too big or all the 
>> tables/indexes ?
> 
> 
> Thanks Suresh and Bryan for your replies.
> 
> You both ask similar questions, so here's a common reply.
> 
> The "big" database is the original one, created with 25 insert threads.
> The "small" database is the original one after running compress on it.
> I also tried inserting with only one thread.
> 
> Database size on disk, allocated space:
> 25 threads            1282 MB
> 25 threads after compress     642 MB    1 thread             643 MB
> 
> As can be seen, running compress on the original database and inserting 
> with only one thread gave almost identical results.
> 
> The database in question has a total of 103 conglomerates, including 
> system tables. Out of these, 34 has changed size.
> The increase/decrease from the compressed database to the original one 
> varied between -50% to 196%. If we ignore conglomerates smaller then 10 
> MB, we have variation from -8%/+2 to 196%, but there is only one 
> conglomerate where the size decreased.
> I have attached the sizes below.
> 
> So the answer to your question is that some tables become to big, some 
> do not. I am not able to see a pattern based on the current data, but 
> maybe you are?
> 
> 
> I have also attached the program I used to output space diag 
> information. It is far from perfect, but it does get the information out.
> 
> 
> 
> thanks,
> 
> 
> ------------------------------------------------------------------------
> 
> import java.io.File;
> import java.sql.*;
> import java.util.Arrays;
> 
> /**
>  * Extract disk space diagnostics for a Derby database.
>  * This program *WILL NOT WORK* for other databases!
>  */
> public class DerbyDiskSpaceDiag {
> 
>     /** Newline character(s). */
>     private static final String SEP = System.getProperty("line.separator");
> 
>     /** Name of the diagnostics table. */
>     private static final String DIAGTABLE = 
>         "new org.apache.derby.diag.SpaceTable";
> 
>     /**
>      * Pads string with spaces on the left side.
>      */
>     private static String padLeft(String text, int length) {
>         StringBuffer str = new StringBuffer(text);
>         str.ensureCapacity(length);
>         while (str.length() < length) {
>             str.insert(0, ' ');
>         }
>         return str.toString();
>     }
> 
>     /**
>      * Pads string with spaces on the right side.
>      */
>     private static String padRight(String text, int length) {
>         StringBuffer str = new StringBuffer(text);
>         str.ensureCapacity(length);
>         while (str.length() < length) {
>             str.append(' ');
>         }
>         return str.toString();
>     }
> 
>     /**
>      * Load Derby driver.
>      * Will load the client driver if the database name starts with '//',
>      * the embedded driver otherwise.
>      */
>     private static void loadDriver(String dbName) {
>         String driver = "org.apache.derby.jdbc.EmbeddedDriver";
>         if (dbName.startsWith("//")) {
>             driver = "org.apache.derby.jdbc.ClientDriver";
>         }
>         try {
>             Class.forName(driver);
>         } catch (ClassNotFoundException cnfe) {
>             cnfe.printStackTrace();
>             System.err.println("!! Unable to load driver class for Derby: " +
>                     driver + SEP + "!! Please verify your classpath.");
>             System.exit(1);
>         }
>     }
> 
>     /**
>      * Print usage message/help.
>      */
>     private static void usage() {
>         System.out.println("Print disk usage statistics for a Derby database.");
>         System.out.println(SEP + "Usage:");
>         System.out.println("\tDerbyDiskSpaceDiag DBNAME [SCHEMA]");
>         System.out.println();
>     }
> 
>     /** SQL used to extract conglomerate number from conglomerate name. */
>     private static final String CONGLOM_NUMBER =
>         "SELECT CONGLOMERATENUMBER FROM SYS.SYSCONGLOMERATES c " +
>         "WHERE c.CONGLOMERATENAME = ?";
> 
>     /** SQL used to extract table id from table name. */
>     private static final String TABLEID =
>         "SELECT TABLEID FROM SYS.SYSTABLES t " +
>         "WHERE t.TABLENAME = ?";
> 
>     private static final String HORIZONTAL_LINE =
>         "- - - - - - - - - - - - - - - - - - - - - - - " +
>                 "- - - - - - - - - - - - - - - -";
> 
>     /**
>      * Name of the database to connect to.
>      * Note that the JDBC protocol prefix is not supposed to be included.
>      */
>     private final String dbName;
>     /** Tell if we are connect through the embedded driver. */
>     private final boolean embedded;
>     /** SQL to extract space diagnostcs. */
>     private final String diagQuery;
>     /**
>      * Total counts for various space diagnostics.
>      * Currently: allocated, free, save, filesizes.
>      */
>     private long[] totalCounts = new long[4];
> 
>     /**
>      * Get space diagnostics for the specified database.
>      */
>     public DerbyDiskSpaceDiag(String dbName) {
>         this.dbName = dbName;
>         this.embedded = !dbName.startsWith("//");
>         this.diagQuery = 
>             "SELECT spaceinfo.* FROM " +
>             "SYS.SYSSCHEMAS s, SYS.SYSTABLES t, " +
>             DIAGTABLE + "(SCHEMANAME,TABLENAME) spaceinfo " +
>             "WHERE s.SCHEMAID = t.SCHEMAID";
>     }
> 
>     /**
>      * Get space diagnostics for the specified schema in the given database.
>      */
>     public DerbyDiskSpaceDiag(String dbName, String schema) {
>         this.dbName = dbName;
>         this.embedded = !dbName.startsWith("//");
>         this.diagQuery = 
>             "SELECT spaceinfo.* FROM " +
>             "SYS.SYSSCHEMAS s, SYS.SYSTABLES t, " +
>             DIAGTABLE + "('" + schema + "',TABLENAME) spaceinfo " +
>             "WHERE s.SCHEMAID = t.SCHEMAID AND " +
>             "s.SCHEMANAME = '" + schema + "'";
>     }
> 
>     /**
>      * Print disk space statistics.
>      */
>     public void printInfo()
>             throws SQLException {
>         Arrays.fill(this.totalCounts, 0L);
>         Connection con = DriverManager.getConnection(
>                 "jdbc:derby:" + this.dbName);
>         PreparedStatement conglomNrStmt = con.prepareStatement(CONGLOM_NUMBER);
>         PreparedStatement tableIdStmt = con.prepareStatement(TABLEID); 
>         Statement diagStmt = con.createStatement();
>         ResultSet diagRs = diagStmt.executeQuery(diagQuery);
>         ResultSet fileRs = null;
>         System.out.println(
>                 padRight("Tablename", 25) + " " +
>                 padRight("Type", 5) + " " +
>                 padLeft("ALLOC", 6) + " " +
>                 padLeft("FREE", 6) + " " +
>                 padLeft("PSIZE", 5) + " " +
>                 padLeft("SAVE", 6) + " " +
>                 padRight("File", 8) + " " +
>                 padLeft("Size KB", 8) + " ");
>         System.out.println(HORIZONTAL_LINE);
>         String conglomName = null;
>         boolean index = false;
>         Long tmpLong;
>         // Let's hope the order in the resultset doesn't change.
>         while (diagRs.next()) {
>             conglomName = diagRs.getString(1);
>             index = diagRs.getInt(2) > 0;
>             if (index) {
>                 System.out.print("  " + padRight(conglomName, 23) + " ");
>             } else {
>                 System.out.print(padRight(conglomName, 25) + " ");
>             }
>             System.out.print((index == true ? "INDEX" : "TABLE") + " ");
>             tmpLong = diagRs.getLong(3);
>             this.totalCounts[0] += tmpLong.longValue();
>             // Allocated
>             System.out.print(padLeft(tmpLong.toString(), 6) + " ");
>             tmpLong = diagRs.getLong(4);
>             this.totalCounts[1] += tmpLong.longValue();
>             // Free
>             System.out.print(padLeft(tmpLong.toString(), 6) + " ");
>             tmpLong = diagRs.getLong(6);
>             // Page size
>             System.out.print(padLeft(tmpLong.toString(), 5) + " ");
>             tmpLong = diagRs.getLong(7);
>             this.totalCounts[2] += tmpLong.longValue();
>             // Save
>             System.out.print(padLeft(tmpLong.toString(), 6) + " ");
>             if (embedded) {
>                 if (index) {
>                     conglomNrStmt.setString(1, conglomName);
>                     fileRs = conglomNrStmt.executeQuery();
>                 } else {
>                     // Must get table id fidiagRst.
>                     tableIdStmt.setString(1, conglomName);
>                     fileRs = tableIdStmt.executeQuery();
>                     fileRs.next();
>                     String tableId = fileRs.getString(1);
>                     fileRs.close(); 
>                     conglomNrStmt.setString(1, tableId);
>                     fileRs = conglomNrStmt.executeQuery();
>                 }
>                 if (fileRs.next()) {
>                     printFileInfo(dbName, fileRs.getInt(1));
>                     fileRs.close();
>                 } else {
>                     // Assume system table, which have '_HEAP' as postfix.
>                     conglomNrStmt.setString(1, conglomName + "_HEAP");
>                     fileRs = conglomNrStmt.executeQuery();
>                     if (fileRs.next()) {
>                         printFileInfo(dbName, fileRs.getInt(1));
>                         fileRs.close();
>                     } else {
>                         System.out.print("unable to retrieve");
>                     }
>                 }
>             } else {
>                 System.out.print("use embedded driver!");
> 
>             }
>             // End line
>             System.out.println();
>         }
>         System.out.println(HORIZONTAL_LINE);
>         System.out.println(
>                 padRight("Total", 25) + " " +
>                 padRight("", 5) + " " +
>                 padLeft(new Long(totalCounts[0]).toString(), 6) + " " +
>                 padLeft(new Long(totalCounts[1]).toString(), 6) + " " +
>                 padLeft("n/a", 5) + " " +
>                 padLeft(new Long(totalCounts[2]).toString(), 6) + " " +
>                 padRight("KB n/a", 8) + " " +
>                 padLeft(new Long(totalCounts[3]).toString(), 8) + " ");
>         System.out.println(HORIZONTAL_LINE);
>         con.close();
>     }
> 
>     /**
>      * Print file information.
>      * This will only work if connection is embedded.
>      */
>     private void printFileInfo(String dbName, int conglomNr)
>             throws SQLException {
>         File f = new File(
>                     new File(dbName, "seg0"),
>                     "c" + Integer.toHexString(conglomNr) + ".dat");
>         System.out.print(padRight(f.getName(), 8) + " ");
>         System.out.print(padLeft(new Long(f.length() / 1024).toString(), 8));
>         this.totalCounts[3] += (f.length() / 1024);
>     }
> 
>     /**
>      * Get disk space diagnostics table for a database.
>      *
>      * Arguments:
>      *  DBNAME : name/path of the database  /required/
>      *  SCHEMA : name of the schema         /optional/
>      */
>     public static void main(String[] args)
>             throws SQLException {
>         // If no arguments are given, print usage and exit.
>         if (args.length < 1) {
>             usage();
>             System.exit(1);
>         }
>         final String dbName = args[0];
>         loadDriver(dbName);
>         DerbyDiskSpaceDiag diag = null;
>         if (args.length > 1) {
>             // Limit extraction to the specified schema.
>             diag = new DerbyDiskSpaceDiag(dbName, args[1]);
>         } else {
>             diag = new DerbyDiskSpaceDiag(dbName);
>         }
>         diag.printInfo();
>     }
> } // End class DerbyDIskSpaceDiag


Mime
View raw message