Date: Fri, 10 Jun 2005 10:32:02 -0700
From: Mike Matrigali
To: Derby Development <derby-dev@db.apache.org>
Subject: Re: [PATCH] DERBY-302 Insert on clob of 500k of data takes long time

All the code looks right to me, but I have some questions that may be
nit performance issues.

1) Should most of the new routines you added be "final" - except for
the one in SQLChar, which gets overridden?

2) There are a lot of literal numbers rather than named constants. I
realize you are following what was already there, but it makes the code
a little hard to read, and if any of the numbers relate to each other
it is hard to tell.

3) Sort of part of 2 - should the grow-by number relate to the number
that decides when to return a buffer back to the VM rather than cache
it in the datatype? That code exists because, on a select of 1 million
rows, derby will only allocate one SQLChar per CHAR column - not 1
million. The case I am wondering about is a VARCHAR bigger than 64
bytes but less than 4k. Does growing by 4k mean the varchar buffer will
start at 64 and then go to 64+4k? If so, it seems like we should cache
the buffer at 64+4k, not 4k.

Sunitha Kambhampati wrote:
> DERBY-302 - Insert on a clob of 500k of data using streams takes a
> long time: 3 minutes on a sun jvm 1.4.2 and 20 seconds on an ibm jvm
> 1.4.2. The following fix improves the performance of an insert into a
> 500k clob from 20 seconds to around 1 second. Note that by changing
> just the test program, the time came down from 3 minutes to an average
> of around 20 seconds.
>
> Currently in derby, for an insert on a clob using setCharacterStream,
> the entire stream is materialized into a char array and sent to store
> for the insert. (We should not have to stream here; I will file
> another jira issue for this and put in all the information I learnt.)
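(Just to make sure we are talking about the same code path: I assume the
insert in question looks roughly like the sketch below. The table name,
file name and length are made up for illustration - they are not from
the jira test program.)

    // Hypothetical insert that drives the path described above: the
    // whole 500k character stream ends up materialized into a char
    // array inside SQLChar.readExternal before it is sent to store.
    PreparedStatement ps = conn.prepareStatement(
            "insert into clobtable (id, c) values (?, ?)");
    ps.setInt(1, 1);
    ps.setCharacterStream(2, new FileReader("clob500k.txt"), 500 * 1024);
    ps.executeUpdate();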
>
> Given this is how inserts for large clobs currently happen, the
> analysis of the performance issue is as follows:
>
> -- A profiler run shows that most of the time is spent in
> SQLChar.readExternal, which is where the user's input stream is
> materialized into a char array. The array is grown gradually until the
> entire stream has been materialized. The code snippet below shows by
> how much the array is grown each time readExternal realizes it has to
> read more bytes from the stream.
>
> The dvd hierarchy for clob is: SQLClob (a dvd) extends SQLVarchar
> extends SQLChar.
>
> So, in SQLChar.readExternal:
>
>     ........
>     int growby = in.available();
>     if (growby < 64)
>         growby = 64;
>
> and then an allocation and an arraycopy into the newly allocated array.
>
> -- In the code snippet, 'in' is the wrapper around the user's stream,
> which is ReaderToUTF8Stream. ReaderToUTF8Stream extends InputStream and
> does not override the available() method, and as per the spec the
> default InputStream.available() returns 0.
>
> -- Thus the array grows by only 64 bytes each time, which is obviously
> not performant. For a 500k clob insert this means the allocation and
> arraycopy steps happen ~8000 times.
>
> -- ReaderToUTF8Stream, which wraps the user's stream, reads from the
> stream, does the utf8 conversion and puts the result into a 4k array.
> I think it is reasonable to use a 32k buffer to store this information
> for clobs.
>
> Although I think there are more possible optimizations in this area, I
> prefer the incremental approach too :) so this patch is a first step
> towards fixing insert clob performance in the current system.
>
> Fix includes:
> -- Enhanced the way the array is grown: keep the original 64 bytes for
> char (which seems reasonable given the upper limit for char) but
> override it to 4k for varchar and clob.
> -- Override available() in ReaderToUTF8Stream to return a better
> estimate of how many bytes can be read.
>
> svn stat
> M java\engine\org\apache\derby\impl\jdbc\ReaderToUTF8Stream.java
> M java\engine\org\apache\derby\iapi\services\io\LimitReader.java
> M java\engine\org\apache\derby\iapi\types\SQLChar.java
> M java\engine\org\apache\derby\iapi\types\SQLVarchar.java
>
> -- Ran derbyall OK with the sun jvm.
>
> -- I could add a test that compares times, but I realize that is
> probably not the best solution here; it should ideally be part of a
> performance regression suite.
>
> Numbers for clob inserts with the fix, in seconds for one insert on my
> laptop (windows, 1G ram, 1.6GHz Intel Pentium(M)), as per the jira
> issue:
>
> FileSize    ibm jvm 1.4.2    sun jvm 1.4.2    sun jvm 1.5
> 500k        0.9s             1.6s             1.7s
> 1M          2.1s             4s               5s
> 2M          3s               9s               11s
> 4M          7s               18s              22s
>
> Without the fix, 500k with the sun jvm takes 3 minutes and with the
> ibm jvm takes 20 seconds. I will add the test program along with the
> input files to the jira issue.
> _________________________
> Without this fix: as I already mentioned in the jira comment for
> DERBY-302, changing the program attached to the jira entry to use a
> BufferedReader with the buffer size set to a bigger value than the
> default (to 64k) brought the sun jvm times down close to the ibm jvm
> times.
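(For what it is worth, I read that test-program change as roughly the
following sketch - the variable names and file name are made up, not
taken from the actual jira attachment:)

    // Wrap the user's Reader in a BufferedReader with a 64k buffer so
    // the underlying file is read in large chunks before being handed
    // to setCharacterStream.
    Reader reader = new BufferedReader(new FileReader("clob500k.txt"),
            64 * 1024);
    ps.setCharacterStream(2, reader, 500 * 1024);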
> I noticed in my test that if I ran the test multiple times and did
> multiple inserts, the performance of the sun jvm and the ibm jvm for a
> 500k clob was around 20 seconds - I guess the jit kicks in, plus the
> OS cache may also be a factor.
> ________________________
> Please review it, and if there are no comments, can a committer please
> commit it.
>
> Thanks,
> Sunitha.
>
>
> ------------------------------------------------------------------------
>
> Index: java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java
> ===================================================================
> --- java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java (revision 189773)
> +++ java/engine/org/apache/derby/impl/jdbc/ReaderToUTF8Stream.java (working copy)
> @@ -47,7 +47,7 @@
>      ReaderToUTF8Stream(LimitReader reader)
>      {
>          this.reader = reader;
> -        buffer = new byte[4096];
> +        buffer = new byte[32768]; // seems reasonable to do 32k reads
>          blen = -1;
>      }
> 
> @@ -196,5 +196,19 @@
>          reader.close();
>      }
> 
> +    /**
> +     * Return an optimized estimate of the bytes available to read
> +     * from the stream.
> +     */
> +    public int available()
> +    {
> +        int remainingBytes = reader.getLimit();
> +        // This object buffers (32k - 6) bytes that can be read,
> +        // and when that is finished it reads the next available
> +        // bytes from the reader object.
> +        // reader.getLimit() returns the remaining bytes available
> +        // on this stream.
> +        return (32768 > remainingBytes ? remainingBytes : 32768);
> +    }
>  }
> 
> Index: java/engine/org/apache/derby/iapi/services/io/LimitReader.java
> ===================================================================
> --- java/engine/org/apache/derby/iapi/services/io/LimitReader.java (revision 189773)
> +++ java/engine/org/apache/derby/iapi/services/io/LimitReader.java (working copy)
> @@ -119,6 +119,16 @@
>          limitInPlace = true;
>          return;
>      }
> +
> +    /**
> +     * Return the limit of the stream that can be read without
> +     * throwing an EOFException.
> +     * @return the remaining bytes left to be read from the stream
> +     */
> +    public int getLimit()
> +    {
> +        return remainingBytes;
> +    }
> 
>      /**
>          Clear any limit set by setLimit. After this call no limit checking
> Index: java/engine/org/apache/derby/iapi/types/SQLChar.java
> ===================================================================
> --- java/engine/org/apache/derby/iapi/types/SQLChar.java (revision 189773)
> +++ java/engine/org/apache/derby/iapi/types/SQLChar.java (working copy)
> @@ -641,6 +641,11 @@
>          int utflen = in.readUnsignedShort();
> 
>          int requiredLength;
> +        // Minimum amount by which it is reasonable to grow the array
> +        // when we know it needs to grow by at least one byte: we
> +        // don't want to grow by just one byte, as that is not
> +        // performant.
> +        int minGrowBy = growBy();
>          if (utflen != 0)
>          {
>              // the object was not stored as a streaming column
> @@ -654,8 +659,8 @@
>              // OR
>              // The original string was a 0 length string.
>              requiredLength = in.available();
> -            if (requiredLength < 64)
> -                requiredLength = 64;
> +            if (requiredLength < minGrowBy)
> +                requiredLength = minGrowBy;
>          }
> 
>          char str[];
> @@ -707,12 +712,26 @@
>          if (strlen >= arrayLength) // the char array needs to be grown
>          {
>              int growby = in.available();
> -
> -            // We know at the array needs to be grown by at least one.
> +            // We know that the array needs to be grown by at least one.
>              // However, even if the input stream wants to block on every
>              // byte, we don't want to grow by a byte at a time.
> -            if (growby < 64)
> -                growby = 64;
> +            // Note: for large data (clob > 32k) it is performant to
> +            // grow the array by at least 4k rather than by a small
> +            // amount. Growing by 32k might be even better, but that
> +            // may be a little excessive(?) for small data.
> +            // Hopefully in.available() will give a fair estimate of
> +            // how much data can be read, so that the array grows in
> +            // larger, necessary chunks.
> +            // This performance issue due to the slow growth of this
> +            // array was noticed because inserts on clobs were taking
> +            // a really long time: the array here previously grew by
> +            // 64 bytes each time until the stream was drained.
> +            // (DERBY-302)
> +            // For char, growing by 64 seems reasonable, but for
> +            // varchar and clob 4k or 32k is performant, and hence
> +            // growBy() is overridden to ensure this.
> +            if (growby < minGrowBy)
> +                growby = minGrowBy;
> 
>              int newstrlength = arrayLength + growby;
>              char oldstr[] = str;
> @@ -804,6 +823,18 @@
>          cKey = null;
>      }
> 
> +    /**
> +     * Returns the reasonable minimum amount by which the array can
> +     * grow. See readExternal. When we know that the array needs to
> +     * grow by at least one byte, it is not performant to grow by
> +     * just one byte; instead, this amount is used to provide a
> +     * reasonable grow-by size.
> +     * @return minimum reasonable grow-by size
> +     */
> +    protected int growBy()
> +    {
> +        return 64; // seems reasonable for a char
> +    }
>      /**
>       * @see Storable#restoreToNull
>       *
> Index: java/engine/org/apache/derby/iapi/types/SQLVarchar.java
> ===================================================================
> --- java/engine/org/apache/derby/iapi/types/SQLVarchar.java (revision 189773)
> +++ java/engine/org/apache/derby/iapi/types/SQLVarchar.java (working copy)
> @@ -182,4 +182,17 @@
>      {
>          return TypeId.VARCHAR_PRECEDENCE;
>      }
> +
> +    /**
> +     * Returns the reasonable minimum amount by which the array can
> +     * grow. See readExternal. When we know that the array needs to
> +     * grow by at least one byte, it is not performant to grow by
> +     * just one byte; instead, this amount is used to provide a
> +     * reasonable grow-by size.
> +     * @return minimum reasonable grow-by size
> +     */
> +    protected int growBy()
> +    {
> +        return 4096; // seems reasonable for a varchar or clob
> +    }
>  }
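To put some rough numbers behind why a 64-byte grow-by hurts so much
once available() returns 0, here is a small standalone sketch (not
derby code - it models only the reallocation/arraycopy cost for a 500k
value, ignoring the utf8 decoding). With a 64-byte increment it comes
out to the ~8000 allocations mentioned above; with a 4k increment it is
around 125.

    // Standalone sketch: count the reallocations and total chars
    // copied when a buffer for a 500k value is grown by a fixed
    // increment, the way SQLChar.readExternal grows its char array
    // when in.available() keeps returning 0.
    public class GrowthCost
    {
        static void cost(int total, int growBy)
        {
            int capacity = growBy;      // first allocation
            int allocations = 1;
            long copied = 0;
            while (capacity < total)
            {
                copied += capacity;     // arraycopy of the old contents
                capacity += growBy;
                allocations++;
            }
            System.out.println("growBy=" + growBy + ": " + allocations
                    + " allocations, " + copied + " chars copied");
        }

        public static void main(String[] args)
        {
            int total = 500 * 1024;     // roughly the 500k case
            cost(total, 64);            // old behaviour when available() == 0
            cost(total, 4096);          // with growBy() overridden for varchar/clob
        }
    }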